N10-007 Given a scenario, implement the following network troubleshooting methodology

Identify the problem

The first step in the troubleshooting process is to establish exactly what the problem is. This stage is all about gathering information: identifying symptoms, questioning users, and determining whether anything has changed. To do this, you need knowledge of the operating system in use, good communication skills, and a little patience. Gather as much information as possible about the problem. You can glean it from three key sources: the computer (in the form of logs and error messages), the user experiencing the problem, and your own observation.

Establish a theory of probable cause

After you have listed the symptoms, you can begin to identify some of the potential causes of those symptoms.
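Listing symptoms from the three sources named above (the computer, the user, and your own observation) can be sketched in Python. This is only an illustration: the Symptom class, the keyword filter, and the sample log lines are all invented for the example and do not come from any particular operating system or tool.

```python
from dataclasses import dataclass


@dataclass
class Symptom:
    source: str        # "computer", "user", or "observation"
    description: str


def error_lines(log_text):
    """Return log lines that mention an error or failure (case-insensitive)."""
    keywords = ("error", "fail")
    return [line.strip() for line in log_text.splitlines()
            if any(k in line.lower() for k in keywords)]


def collect_symptoms(log_text, user_reports, observations):
    """Merge findings from all three sources into one list of symptoms."""
    symptoms = [Symptom("computer", line) for line in error_lines(log_text)]
    symptoms += [Symptom("user", report) for report in user_reports]
    symptoms += [Symptom("observation", note) for note in observations]
    return symptoms


# Hypothetical sample data, invented for this sketch.
sample_log = """09:01 spooler started
09:05 ERROR driver load failed
09:06 user login ok"""

symptoms = collect_symptoms(
    sample_log,
    user_reports=["Cannot print since this morning"],
    observations=["Printer queue is paused"],
)
for s in symptoms:
    print(f"[{s.source}] {s.description}")
```

Keeping the source next to each symptom makes it easier to go back and ask follow-up questions when a theory needs more evidence.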

Test the theory to determine cause

After questioning the obvious, you need to establish a theory and then attempt to confirm it. For example, the theory might be that users can no longer print because they downloaded new software that changed the print drivers, or that a legacy application stopped running after the latest service pack was installed.

If the theory is confirmed, plot a course of action: a list of the next steps to take to resolve the problem.

If the theory cannot be confirmed (in the example given, no new software was downloaded and no service pack was applied), you must establish a new theory or consider escalating the problem.
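The confirm-or-move-on logic can be sketched as a small loop. This is a hedged illustration, not a prescribed procedure: the function name evaluate_theories, the max_attempts limit, and the facts dictionary are assumptions made up for the example, which reuses the printing and service pack theories from the text.

```python
def evaluate_theories(theories, max_attempts=3):
    """Check theories in order; return the first confirmed one, else escalate.

    `theories` is a list of (description, check) pairs, where `check` is a
    zero-argument callable that returns True when the evidence supports
    the theory.
    """
    for description, check in theories[:max_attempts]:
        if check():
            return ("confirmed", description)
    # No theory could be confirmed: establish new theories or escalate.
    return ("escalate", None)


# Hypothetical facts gathered while questioning the user.
facts = {"new_software_downloaded": False, "service_pack_applied": True}

theories = [
    ("New software changed the print drivers",
     lambda: facts["new_software_downloaded"]),
    ("Latest service pack broke the legacy application",
     lambda: facts["service_pack_applied"]),
]

outcome = evaluate_theories(theories)
print(outcome)
```

The attempt limit is a design choice for the sketch; in practice the point at which you escalate depends on your organization's support policy.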

Establish a plan of action to resolve the problem and identify potential effects

After identifying a cause, but before implementing a solution, you should establish a plan for the solution. This is particularly a concern for server systems in which taking the server offline is a difficult and undesirable prospect.

For a problem on a server, planning is essential. The plan must state when the server or network will be taken offline and for how long, what support services are in place, and who will be involved in correcting the problem.
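The details such a plan must contain can be captured in a simple record. The sketch below is illustrative only: the ActionPlan class, its field names, and the sample date and staff entry are assumptions, not part of any standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ActionPlan:
    target: str                      # server or network segment affected
    offline_start: datetime          # when it will be taken offline
    expected_downtime: timedelta     # for how long
    support_services: list = field(default_factory=list)
    staff: list = field(default_factory=list)

    @property
    def offline_end(self):
        """Time at which the system is expected to be back online."""
        return self.offline_start + self.expected_downtime

    def missing_details(self):
        """Names of the planning details that have not been filled in yet."""
        missing = []
        if not self.support_services:
            missing.append("support_services")
        if not self.staff:
            missing.append("staff")
        return missing


# Hypothetical plan for an after-hours fix.
plan = ActionPlan(
    target="file server",
    offline_start=datetime(2024, 1, 13, 22, 0),
    expected_downtime=timedelta(hours=2),
    staff=["on-call administrator"],
)
print(plan.offline_end)
print(plan.missing_details())
```

A check like missing_details gives a quick way to spot an incomplete plan before anything is taken offline.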

Planning is an important part of the whole troubleshooting process and can involve formal or informal written procedures. Those who do not have experience troubleshooting servers might wonder about all the formality, but this attention to detail ensures the least amount of network or server downtime and the maximum data availability.

Implement the solution or escalate as necessary

With the plan in place, you should be ready to implement a solution: apply the patch, replace the hardware, plug in a cable, or make whatever other change the plan calls for. In an ideal world, your first solution would fix the problem; unfortunately, this is not always the case. If your first solution does not fix the problem, you need to retrace your steps and start again.

You must attempt only one solution at a time. Trying several solutions at once can make it unclear which one corrected the problem.

After the corrective change has been made to the server, network, or workstation, you must test the results; never assume. This is when you find out whether the remedy you applied actually worked. First impressions can deceive, and a fix that seems to work on first inspection might not have corrected the problem.

The testing process is not always as easy as it sounds. If you are testing a connectivity problem, it is not difficult to ascertain whether your solution was successful. However, changes made to an application or to databases you are unfamiliar with are much more difficult to test. It might be necessary to have people who are familiar with the database or application run the tests with you in attendance.
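Applying one change at a time and testing after each can be sketched as follows. Treat it as a minimal illustration under stated assumptions: can_connect uses Python's standard socket.create_connection to test basic TCP reachability, while apply_one_at_a_time, the fix names, and the simulated state dictionary are invented for the example. The sketch does not roll back a change that failed to help, which a real procedure might need to do.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def apply_one_at_a_time(fixes, verify):
    """Apply fixes in order, testing after each one.

    Returns the name of the fix after which `verify()` first passed,
    or None if no fix resolved the problem.
    """
    for name, apply_fix in fixes:
        apply_fix()
        if verify():
            return name
    return None


# Simulated environment (hypothetical): the real fault is an unplugged cable.
state = {"patch_applied": False, "cable_plugged": False}
fixes = [
    ("apply patch", lambda: state.update(patch_applied=True)),
    ("plug in cable", lambda: state.update(cable_plugged=True)),
]

fixed_by = apply_one_at_a_time(fixes, verify=lambda: state["cable_plugged"])
print(fixed_by)
```

In a real connectivity test, the verify callable could wrap can_connect with the host and port of the affected service. Because the check runs after every individual change, the result shows which change corrected the problem.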

Verify full system functionality and if applicable implement preventative measures

Sometimes, you might apply a fix that corrects one problem but creates another. Many such side effects are hard to predict, but not all. For instance, you might add a new network application that requires more bandwidth than your current network infrastructure can support, and overall network performance suffers as a result.
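The bandwidth example lends itself to a quick back-of-the-envelope check. The figures below (60 Mbps of existing traffic, 50 Mbps for the new application, a 100 Mbps link) are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def projected_utilization(current_mbps, added_mbps, capacity_mbps):
    """Fraction of link capacity used once the new load is added."""
    return (current_mbps + added_mbps) / capacity_mbps


# Hypothetical figures: existing traffic, new application demand, link capacity.
utilization = projected_utilization(current_mbps=60, added_mbps=50, capacity_mbps=100)
over_capacity = utilization > 1.0

print(f"Projected utilization: {utilization:.0%}")
print("Exceeds capacity" if over_capacity else "Within capacity")
```

A projected utilization above 100 percent signals that the change would degrade performance for everything else sharing the link.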

Anything done to a network can have a ripple effect and negatively affect another area of the network. Actions such as adding clients, replacing hubs, and adding applications can all have unforeseen results. It is not always possible to know in advance how the changes you make might affect the network's functioning.

The safest thing to do is assume that the changes you make will affect the network in some way and realize that you have to figure out how. This is when you might need to think outside the box and try to predict possible outcomes.

It is imperative that you verify full system functionality before you are satisfied with the solution. After you obtain that level of satisfaction, you should look at the problem and ascertain if any preventative measures should be implemented to keep the same problem from occurring again.

Document findings, actions, and outcomes

Although it is often neglected in the troubleshooting process, documentation is as important as any of the other troubleshooting procedures. Documenting a solution involves keeping a record of all the steps taken during the fix, not just the final solution.

For the documentation to be of use to other network administrators in the future, it must include several key pieces of information. When documenting a procedure, you should include the following information:

  • When: When was the solution implemented? You must know the date because if problems occur after your changes, knowing the date of your fix makes it easier to determine whether your changes caused the problems.
  • Why: The reason for a fix is obvious while the problem is being worked on, but a few weeks later it might be less clear why that solution was needed. Documenting why the fix was made is important because if the same problem appears on another system, you can use this information to reduce the time needed to find the solution.
  • What: The successful fix should be detailed, along with information about any changes to the configuration of the system or network that were made to achieve the fix. Additional information should include version numbers for software patches or firmware, as appropriate.
  • Results: Many administrators choose to include information on both successes and failures. The documentation of failures might prevent you from going down the same road twice, and the documentation of successful solutions can reduce the time it takes to get a system or network up and running.
  • Who: Information might be left out of the documentation, or someone might simply want to ask a few questions about a solution. In both cases, if the name of the person who made the fix is in the documentation, that person can easily be tracked down. This is more of a concern in environments that have a large IT staff or in which contractors, rather than company employees, perform system repairs.
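The five items above map naturally onto a structured record. The following sketch is one possible layout, not a required format: the FixRecord class and every sample value in it (date, version number, names) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class FixRecord:
    when: str      # date the solution was implemented
    why: str       # the problem that made the fix necessary
    what: str      # the fix, including configuration changes and versions
    results: str   # what worked and what did not
    who: str       # person who made the fix


# Hypothetical entry; every value here is invented for illustration.
record = FixRecord(
    when=date(2024, 1, 13).isoformat(),
    why="Users could not print after a driver change",
    what="Reinstalled print driver version 2.1 on the print server",
    results="First attempt (restarting the spooler) failed; reinstall succeeded",
    who="J. Smith",
)

print(json.dumps(asdict(record), indent=2))
```

Storing records in a consistent shape makes them easy to search later when the same symptoms appear on another system.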