This paper explains the scientific approach to problem solving. Although it is written to address problems related to information technology, the concepts may also apply to other disciplines. The methods, concepts, and techniques described here are not new; what is shocking is the number of “problem solvers” who fail to use them. Along the way, I’ll include some real-world examples.
Why guess at a solution instead of taking a scientific approach to problem solving? Maybe because guessing looks faster. Perhaps it is a lack of experience in solving problems efficiently. Maybe doing it scientifically seems like hard work. Maybe, while you keep guessing and not really solving, you are generating more income and adding some job security. Or perhaps you are violating the first principle of problem solving: understand the real problem.
Principle #1. Understand the real problem.
Isn’t it obvious that you need to understand the problem before you can solve it? Maybe. But most of the time the analyst starts solving without knowing the real problem. What the customer or user describes as “the problem” is usually just the symptom! Example: “My computer does not want to turn on.” The real problem may be that the entire building is without electricity. Example: “Every time I try to add a new product, I get an error message.” Here the real problem could be “Only the last two products I tried to add gave a ‘Product already exists’ error.” Another classic example: “Nothing works”…
You begin the investigation by identifying the “real problem”. This entails asking questions (and sometimes verifying the answers) and doing some basic testing. Ask the user questions such as “When was the last time it worked successfully?”, “How long have you been using the system?”, “Does it work on another computer or for another user?”, and “What exactly is the error message?” Ask for a screenshot of the error if possible. Your basic testing should confirm that all of the equipment involved is up and running. Check the user’s computer, the network, the web server, the firewall, the file server, the back-end database, and so on. In the worst case you will at least have eliminated many areas as the cause of the problem.
A real-life example. Symptom, per the user: “The system hangs at random times when orders are placed.” Environment: the user enters order details into a form on a mainframe application. When all the details are completed, the user tabs off the form. The mainframe then sends these details via communication software to an Oracle client/server system at the factory. The Oracle system does the capacity planning and returns either an error or an expected date to the mainframe system. This problem is very serious, because you may lose customers if they try to place orders and the system does not accept them! To try to solve it, people began to: 1) investigate the load and capacity of the mainframe computers; 2) monitor the network load between the mainframe and the Oracle system; 3) hire consultants to debug the communication software; 4) debug the Oracle capacity planning system. After two months they still could not solve the problem.
A “scientific problem solver” was then called in. It took less than a day and the problem was resolved! How? The analyst spent a day with the user to see what the “real problem” was. It turned out that the problem only occurred with export orders. Checking the capture screen and the user’s actions showed that, for export orders, the last field on the form was always left blank and the user did not tab off that field. The system was not hanging; it was simply waiting for the user to press “tab” one more time. Problem solved. Note that the scientific problem solver had very limited knowledge of the mainframe, the order capture system, the communication software, and the Oracle capacity planning system. This brings us to Principle #2.
Principle #2. Don’t be afraid to start the solution process, even if you don’t understand the system.
How many times have you heard “I can’t touch this code, because it was developed by someone else!” or “I can’t help because I’m an HR consultant and this is a financial problem”? If your washing machine does not want to work, you don’t need to be an electrical engineer, a washing machine repair specialist, or a technician to do some basic troubleshooting. Make sure the plug is working. Check the trip switch, and so on. “I’ve never seen this error before” should not stop you from trying to fix the problem. With the error message and an internet search engine, you can get a lot of starting points.
In every complex system there are a few basic working principles. System A reading data from System B can be terribly complex (perhaps a laboratory spectrometer reading data from a programmable logic controller via an RS-232 port). But there are a few basics to test: Do both systems have power? Is there an error message in the event log of either system? Can you “ping” or trace a network packet from one system to the other? Try a different communication cable. Search the Internet for the error message.
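One of the basic checks above, whether one system can reach another over the network at all, can be scripted in a few lines. The following is a minimal sketch, not a complete diagnostic tool: it assumes the systems talk over TCP, and the function names `can_connect` and `check_all` are made up for illustration. It only tells you whether a port accepts connections, which is often enough to rule a whole area in or out.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(endpoints: dict) -> dict:
    """Check several named endpoints and return {name: reachable}.

    `endpoints` maps a label to a (host, port) pair, for example
    {"database": ("db-server", 1521)}. The labels, hosts, and ports
    are whatever applies to your own environment.
    """
    return {name: can_connect(host, port) for name, (host, port) in endpoints.items()}
```

Calling `check_all` with the web server, file server, and database endpoints of the system under investigation gives a quick picture of which links are up before any deeper analysis starts.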
Once you have identified the real problem, you need to start solving it. Sometimes the initial investigation will point you directly to the solution (turn on the power, replace the faulty cable, etc.). But sometimes the real problem is complex in itself, so the next principle is to simplify it.
Principle #3. Simplify it.
Let’s start this section with a real-life example. Under certain conditions, a stored procedure would hang. The stored procedure usually took about an hour to run (when it was not hanging). The developer therefore tried to debug it: make some changes, then wait another hour or so to see whether the problem was solved. After a few days the developer gave up and the “problem solver” took over. The “problem solver” had at his disposal the knowledge of which conditions made the stored procedure hang. So it was a simple exercise to make a copy of the procedure and then strip all the unnecessary code from that copy. All parameters were replaced with hard-coded values. Pieces of code were executed one at a time, and the result sets were then hard-coded back into the copy of the procedure. Within 3 hours the problem was solved: an infinite loop was discovered.
What the “problem solver” did was replicate the problem and at the same time isolate the code that caused it. In doing so, a complex (and time-consuming) stored procedure became something quick and simple to work with.
If the problem is inside an application, create a new application and try to simulate the problem in it as simply as possible. If the problem occurs when a specific method is called on a specific control, include only that control in the empty application and call that method with hard-coded values. If the problem is with SQL embedded in a C# application, try running the SQL in a database query tool (such as SQL*Plus for Oracle, Query Analyzer for SQL Server, or MS Excel connected to the database via ODBC).
The moment you can replicate a problem in a simple way, you are more than 80% of the way to solving it.
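As an illustration of pulling a statement out of the application and running it on its own with hard-coded values, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for the real back end (Oracle, SQL Server, and so on). The `products` table, the product codes, and the function name `reproduce_insert` are all hypothetical, loosely modelled on the “Product already exists” symptom mentioned earlier.

```python
import sqlite3


def reproduce_insert(product_code: str) -> str:
    """Run one INSERT in isolation and report what happened."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing else involved
    try:
        # Hypothetical schema: just enough structure to reproduce the symptom.
        conn.execute("CREATE TABLE products (code TEXT PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO products (code, name) VALUES ('P-100', 'Widget')")
        try:
            # The statement under investigation, with hard-coded test values.
            conn.execute(
                "INSERT INTO products (code, name) VALUES (?, ?)",
                (product_code, "Test item"),
            )
            return "ok"
        except sqlite3.IntegrityError as exc:
            return f"error: {exc}"
    finally:
        conn.close()
```

Running `reproduce_insert("P-200")` and `reproduce_insert("P-100")` side by side shows in seconds that the failure depends on the value being inserted rather than on the surrounding application.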
If you don’t know where in the program the problem is, debug.
Principle #4. Debug.
Most application development tools come standard with a debugger. Whether it is Macromedia Flash, Microsoft .NET, Delphi, or any other development environment, there will be some kind of debugger. If the tool does not come with a debugger, you can simulate one.
The first thing you want to do with the debugger is locate the problem. You do this by adding breakpoints in key areas. Then you run the program in debug mode and you will know between which breakpoints the problem occurred. Drill down and you will find the spot. Now that you know where the problem is, you can “simplify” it.
Another great feature of most debugging tools is the ability to view variables, values, parameters, and so on as you step through the program. Once you know these values at certain steps, you can hard-code them into your “simplified” version of the program.
If the development tool does not support debugging, you can simulate it. Put steps in the program that output variable values and “Hey, I’m here” messages to the screen, to a log file, or to a database table. Remember to take them out once the problem is solved… You don’t want your file system cluttered or filled up with log files!
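A simulated debugger of this kind might look like the sketch below, which uses Python's standard `logging` module. The routine `order_total`, the logger name, and the helpers `enable_tracing` and `disable_tracing` are invented for the example. The point is only that the “I’m here” messages and variable values go through one switch, so they are easy to turn off again once the problem is solved.

```python
import logging

# Hypothetical logger name; any name will do.
log = logging.getLogger("order_trace")


def order_total(quantity, unit_price_cents):
    """Example routine instrumented with trace messages."""
    log.debug("order_total: entered with quantity=%r unit_price_cents=%r",
              quantity, unit_price_cents)
    total = quantity * unit_price_cents
    log.debug("order_total: leaving with total=%r", total)
    return total


def enable_tracing(stream=None):
    """Send trace messages to `stream` (standard error when None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    return handler


def disable_tracing(handler):
    """Switch the trace messages off again once the problem is solved."""
    log.removeHandler(handler)
    log.setLevel(logging.WARNING)
```

To write to a log file instead of the screen, a `logging.FileHandler` could be attached in the same way. The same caution applies: remove it afterwards so the log file does not keep growing.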
Principle #5. There is a wealth of information on the back end of a database that can help solve a problem.
The problem solver was once called in to help with a very difficult problem. A project was migrating a system from a mainframe to client-server technology. All went well during testing, but when the system went live, there were suddenly quite a few “General Protection Faults”, occurring completely at random. (The GPF was a common error trap in Windows 95 and 98.) Attempts were made to simplify the code and to debug it, but the problem was impossible to replicate. In the lab environment it would not occur! Debug trace messages written to log files indicated that the problem occurred very randomly. Some users experienced it more than others, but eventually every user would get it! Interesting problem.
The problem solver cracked this once he started analyzing the back-end database. I’m not sure whether it was by chance or because he moved systematically in the right direction thanks to a scientific approach. By tracing what was happening at the back-end level, he found that all of these applications were creating more and more connections to the database. Every time a user started a new transaction, another connection to the database was established. The connections were only released when the application was closed. As the user navigated to new windows within the same application, more and more connections were opened, and after a certain number of connections the application would have had enough and crash. This was a programming error in a template used by all the developers. The solution was to first test whether the database cursor was already open before opening it again.
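The shape of that fix, checking for an existing open resource before opening a new one, can be sketched as follows. This is not the original template code, which is not shown in this paper. It is a small illustration that uses `sqlite3` as a stand-in database and a hypothetical `ConnectionHolder` class, and it guards a connection rather than a cursor for simplicity.

```python
import sqlite3


class ConnectionHolder:
    """Hand out one shared connection instead of opening a new one each time."""

    def __init__(self, database: str):
        self._database = database
        self._conn = None
        self.opened = 0  # how many real connections were created (for diagnosis)

    def get(self):
        # Only open a connection if none is currently open.
        if self._conn is None:
            self._conn = sqlite3.connect(self._database)
            self.opened += 1
        return self._conn

    def close(self):
        # Release the connection explicitly rather than waiting for exit.
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Every window or transaction in the application would call `get()` on the same holder, so the number of open connections stays at one no matter how far the user navigates.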
How do you trace what is happening in the back-end database? The major database vendors have graphical user interface (GUI) tools that help you trace or analyze the queries being fired against the database. They will also show you when people connect, disconnect, or are unable to connect because of security violations. Most databases also include system dictionary tables that can be queried for this information. These traces can sometimes tell the whole story of why something failed. The query code you retrieve from the trace can help you “simplify”. You can see from the trace whether the program connects to the database successfully. You can see how long a query takes to execute.
To add to Principle #2 (don’t be afraid to start…): you can analyze this trace information even though you may not know anything about the details of the application.
Remember, though, that these back-end traces can put a strain on back-end resources. Don’t leave them running for an unnecessarily long time.
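Vendor trace tools differ from product to product, so the following is only a small-scale illustration of the idea, using the trace hook that Python's built-in `sqlite3` module provides (`Connection.set_trace_callback`). The helper name `run_traced` is made up. It records the statements actually sent to the database, measures how long the query took, and switches the trace off again when it is done.

```python
import sqlite3
import time


def run_traced(conn, sql, params=()):
    """Execute one query with tracing on; return (rows, statements, seconds)."""
    statements = []
    conn.set_trace_callback(statements.append)  # record each statement as it runs
    start = time.perf_counter()
    try:
        rows = conn.execute(sql, params).fetchall()
    finally:
        elapsed = time.perf_counter() - start
        conn.set_trace_callback(None)  # do not leave the trace running
    return rows, statements, elapsed
```

The captured statements are exactly the kind of query text that can be pasted into a query tool and rerun in isolation, as described under Principle #3.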
Principle #6. Use fresh eyes.
This is the last principle. Don’t spend too much time trying to fix the problem before asking for help. The help doesn’t have to come from someone more senior than you. The principle is that you need a pair of fresh eyes for a new perspective, and sometimes a bit of fresh air from taking a break. The other person will look and then ask a question or two. Sometimes it is something very obvious that was missed. Sometimes just answering the question makes you think in new directions. Also, if you spend hours staring at the same piece of code, it becomes very easy to overlook a silly mistake. A lot of financial budget problems have been solved over a beer. A change of scenery and/or a relaxed atmosphere may be what brings the solution. Perhaps it was the fresh oxygen that went to the brain while walking to the pub. Maybe it was because the problem was discussed with someone else.
Conclusion
The author hopes that, after reading this paper, you will try these principles the next time you encounter a problem that needs to be solved. By applying all six, you should come to see the advantages they bring over “guessing” your way to a solution.