Not all of them make sense in every scenario. Think of them like a toolbox and use whatever tool seems to make the most sense. For instance: I successfully used the scientific method to figure out why my floor heating did not work correctly after the heating engineer was unable to identify the issue.
Strategy 1: Googling the error message
This approach is the most effective if you have a concrete error message or a behaviour that you can attribute to a certain component or framework.
This often yields good results. Most of the time, you’re not the first person to encounter an error message and often you’ll find the solution straight away. If not, you’ve most likely gained insight about what to look for next.
If you have a problem that is specific to your system or there’s no specific description of the problem you can google for, this approach won’t work well. If that’s the case, the divide and conquer strategy can help you.
Strategy 2: Divide and conquer
The idea behind this approach is to locate the issue quickly: by repeatedly halving the problem area, you narrow down the location where the problem occurs. It’s a very generic approach that can be used in many circumstances, and it’s often the first thing I do if I don’t already have a specific error message.
Let me explain the approach with an example from customer service:
“Customers are reporting that ordering a product doesn’t work on production.”
When applying the divide and conquer method, we do the following:
- Check the developer console to see if the request goes through.
- If the request is sent and the sent data is correct, it’s probably a backend problem. An error code might confirm this direction.
- If the request is not sent or the data is not correct, it’s probably a frontend problem.
At this point we have halved the area where the problem occurred. We keep repeating this process by checking the next place in the middle of our problem area.
If it’s a frontend problem: Check either the stage where the request is sent or the stage where the request data is built. To check, you can use console.log, the debugger or whatever else gives you an answer quickest.
If it’s a backend problem: Check the server logs for any error messages from the time the request was made. These can often tell you whether it’s an infrastructure issue or an issue in the code itself. Seeing no logs at all is also a good indicator that the issue might lie between the frontend and the backend, caused by a reverse proxy or a web application firewall.
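A minimal sketch of such a log check, assuming each log line starts with an ISO 8601 timestamp followed by a space. The log format, the sample lines and the helper name `lines_near` are made up for illustration; real server logs will need their own parsing.

```python
from datetime import datetime, timedelta

def lines_near(log_lines, request_time, window_seconds=5):
    """Yield log lines whose leading timestamp is within the window around request_time.

    Assumes naive (timezone-free) ISO timestamps at the start of each line.
    """
    window = timedelta(seconds=window_seconds)
    for line in log_lines:
        stamp, _, _ = line.partition(" ")
        try:
            logged_at = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if abs(logged_at - request_time) <= window:
            yield line

# Hypothetical log excerpt, not taken from a real system.
sample_log = [
    "2024-05-01T12:00:01 INFO GET /health 200",
    "2024-05-01T12:00:03 ERROR POST /orders 500 database connection refused",
    "2024-05-01T12:05:00 INFO GET /products 200",
]

request_time = datetime(2024, 5, 1, 12, 0, 2)
matches = list(lines_near(sample_log, request_time))
errors = [line for line in matches if " ERROR " in line]

if not matches:
    print("No log lines at all: the request may never have reached the backend.")
for line in errors:
    print(line)
```

If `matches` comes back empty for the moment the request was made, that points towards something in front of the backend rather than the backend code.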
It’s an effective strategy to test at these junction points as the test tells you which direction to go next. Since we’re halving the area where the error can occur with every test, we can find the culprit of the error very quickly.
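The halving idea can be sketched as a binary search over the stages a request passes through. This is only an illustration: the stage names and the `is_ok` check are hypothetical, and it assumes that once one stage is broken, every later stage looks broken too.

```python
from typing import Callable, Optional, Sequence

def first_failing_stage(
    stages: Sequence[str], is_ok: Callable[[str], bool]
) -> Optional[str]:
    """Return the first stage whose check fails, or None if all checks pass.

    Assumes the stages are ordered along the request path and that every
    stage after the first broken one also fails its check.
    """
    lo, hi = 0, len(stages)  # the culprit, if any, is in stages[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_ok(stages[mid]):
            lo = mid + 1  # everything up to mid works, so look further along
        else:
            hi = mid  # mid already fails, so look at mid or earlier
    return stages[lo] if lo < len(stages) else None

# Hypothetical stages of the ordering request, from browser to database.
stages = [
    "form input",
    "request payload",
    "reverse proxy",
    "backend handler",
    "database write",
]

checked = []

def is_ok(stage: str) -> bool:
    # Stand-in for a manual check (console, debugger, logs).
    # Here we pretend everything from the reverse proxy onwards is broken.
    checked.append(stage)
    return stages.index(stage) < stages.index("reverse proxy")

culprit = first_failing_stage(stages, is_ok)
print(culprit, "found after", len(checked), "checks")
```

With five stages, two checks are enough in this example; with a linear walk from the start it would have taken three.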
Strategy 3: Scientific method
Sometimes it’s very hard to pinpoint where a bug is coming from. This is when the scientific method can come in handy. I only use it when the other approaches don't work as it takes more time and is not as likely to succeed. It works the following way:
Step 1: Gather all the information you have about the occurring problem
- Characteristics of the problem: How often does it occur? Does it occur for everybody?
- Which systems are acting together in this scenario?
- Do we see anything in the logs?
Step 2: Theorise a potential cause
Think about potential issues that could cause such behaviour. Be creative and think a bit outside of the box, but don’t go overboard. It still needs to be realistic.
Think about the boundaries of the different systems. Which systems are involved? Which system could cause such an issue? Which ones can you already rule out?
When you come up with an idea for a potential cause, form a hypothesis.
As an example: Our monitoring often reported downtimes, but we could never experience the downtime ourselves. We were using third-party software to handle redirects in front of an application. Because our monitoring system was geographically distributed, we formed the following hypothesis:
The issue is only occurring in one region of the vendor's service.
The hypothesis explained why we didn’t experience the problem but the monitoring did. After we confronted the vendor with our hypothesis, they quickly checked the logs of their North American region and realised that we were indeed right.
Step 3: Test your hypothesis
As we did in our previous example, it’s important to test all the information you already have about the bug against your hypothesis. Does the theory explain all the behaviours that are occurring? If not, your theory is probably wrong and you need to come up with another one that explains it all. It’s also possible that you’re dealing with two problems at once, but it’s less likely. It’s generally a good idea to follow Occam’s razor here, which says to prefer the hypothesis with the fewest assumptions.
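For a hypothesis like the regional one, a check against existing monitoring data could look roughly like the sketch below. The probe results and region names are invented for illustration and are not the data from the actual incident; the idea is simply to group failures by region and see whether they cluster.

```python
from collections import Counter

# Hypothetical monitoring results: (probe region, did the check succeed?)
probe_results = [
    ("eu-west", True), ("eu-west", True), ("eu-west", True),
    ("north-america", False), ("north-america", True), ("north-america", False),
    ("asia", True), ("asia", True),
]

def failure_rate_by_region(results):
    """Return the share of failed checks per region as a dict."""
    totals, failures = Counter(), Counter()
    for region, ok in results:
        totals[region] += 1
        if not ok:
            failures[region] += 1
    return {region: failures[region] / totals[region] for region in totals}

rates = failure_rate_by_region(probe_results)
affected = [region for region, rate in rates.items() if rate > 0]

for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.0%} of checks failed")
```

If the failures show up in exactly one region, the data is consistent with the hypothesis. If they are spread evenly, the hypothesis does not explain what you see and you need a different one.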
Another effective approach to test your hypothesis is trying to disprove it.
As an example: if you think the storage is full, you shouldn’t be able to execute anything that is writing to the disk. If that still works, your hypothesis is wrong and you need to come up with another one.
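A minimal sketch of that disproof attempt: try to write a small temporary file and see whether the operating system refuses. The function name and the default size are arbitrary choices for illustration.

```python
import os
import tempfile

def can_write_to_disk(directory: str, size_bytes: int = 1024 * 1024) -> bool:
    """Try to write a temporary file in `directory`; return False on any OS error.

    Note: False can also mean a missing directory or missing permissions,
    not only a full disk, so inspect the error before drawing conclusions.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"\0" * size_bytes)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the data to disk
        return True  # the temporary file is deleted when the block exits
    except OSError:
        return False

if can_write_to_disk(tempfile.gettempdir()):
    print("Writing works, so the 'storage is full' hypothesis is disproved.")
else:
    print("Writing failed, so the hypothesis survives this test.")
```

The useful direction here is the successful write: it rules the hypothesis out for that disk. A failed write only means the hypothesis has not been disproved yet.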
When applying the scientific method, you will either run out of potential hypotheses or find one that passes all tests and potentially directs you to the root of the problem.
When you’re not able to find a solution that way, a good next step is the Rubber Duck Method.
Strategy 4: Rubber duck debugging
The idea of this approach is to explain your problem to someone else who is not familiar with it. This can either be a coworker or a rubber duck that you keep on your desk for exactly this purpose.
Explaining it to someone else forces you to take a step back and look at the problem from a greater distance.
This process often causes you to locate the bug while you’re still explaining it. Since this also works when you explain it to a rubber duck, it’s called the rubber duck method.
Strategy 5: Stepping away from the problem
Sometimes you’ll get stuck figuring out an issue and are not able to solve it.
In these cases it’s often best to step away from the problem and do something else. You might as well just sleep on it.
Often you’ll be able to solve it right away once you come back, because you look at it with a fresh perspective. This works because sometimes we are too deep into the problem and become blind to obvious errors. Sometimes your subconscious will even solve it for you and you’ll have a eureka moment where you just randomly figure out the problem while doing something else. In the past, this has worked so well for me that I sometimes delay investigating non-urgent issues on purpose so I can solve them in less time.
So what happened to my floor heating problem?
By now you’re equipped with 5 different strategies you can use to find and extinguish bugs in your code and the rest of your life.
Now for those curious about the story with my floor heating: My partner’s room always remained cold, even though we fully opened the dial in that room. The rest of the house seemed fine. The technician couldn’t find a fault in the system, and we optimised the flow rates together according to the plan, but we couldn’t get it to work properly.
After looking at the plan and stepping away from the problem (strategy 5), I formed a hypothesis (strategy 3). The plan showed that the lines in the floor took a rather weird route to the room. My hypothesis was that the person who laid the heating pipes did not follow the architect’s plan and went with a more direct route without telling anyone.
This in turn meant that the heating controllers were paired to the wrong rooms. This hypothesis made sense because the room that was being heated was not the one the sensor was monitoring. Therefore the system would just heat the wrong room all the time without the correct room getting any warmer.
I then tested my hypothesis by switching the heating controllers under the assumption that all the pipes were going to the rooms as directly as possible. Turned out I was right and the heating started working correctly.
Have you ever dealt with bug fixing and which strategy did you choose? I’m curious to hear your approach.