
Advanced Network Troubleshooting

Occasionally, you might run into problems that require some ingenuity to diagnose and resolve. Although we cannot anticipate every problem you might run into, this section covers some example problems, and shows you some techniques you might find useful in diagnosing other problems.

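This section includes:

  Diagnosing a Failing Connection to a Network Resource
  Determining Why You Cannot Connect to a System
  Determining Why You Cannot Connect Outside the Local Network
  Determining Why Network Connections Are Dropped
  Finding Out Who is Responsible for a Problem Router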

Diagnosing a Failing Connection to a Network Resource

You might find that a connection to a network resource such as a printer or workstation is failing even though it had been working fine. For example, you might suddenly find that you can no longer print on your network printer.

One possible cause is that another machine on the network has taken over the IP address of the machine you are trying to reach.

To check out a bad network connection:

  1. If you can, check the machine to ensure that it is turned on and working properly. If the machine is not working properly, fix it and retry your connection.
  2. Start MultiNet Tools and use Host Lookup to check the DNS listing for the machine's host name. Make note of the IP address. If there is no entry for the machine, it may no longer be on the network. Contact your network administrator.
  3. Ping the machine. If the machine does not respond, then there might be something wrong with the machine. Contact the person responsible for managing the machine.
  4. Use MultiNet Tools TraceRoute to determine where the failure occurs.
  5. If the machine does respond to Ping, TraceRoute does not help, and you are familiar with network interface cards and their hardware addresses (also called media access control, or MAC, addresses), start Monitor and look at the ARP table.

    Look at the entry for the problem machine. If the hardware address is outside the range used by the manufacturer of the machine you expect at that IP address, then the IP address is probably being used by a different machine.

    For example, the expected range of addresses for an HP printer is different from the range for a workstation's Ethernet card. Your hardware manufacturer can tell you which hardware addresses it uses.

    To restore the connection to the machine, the network administrator must find the machine that has taken over the IP address and change its configuration to use a new, unique IP address.

    If you have a network sniffer, or a UNIX or Cisco MultiNet for OpenVMS system that can run a TCPdump in promiscuous mode, you can also check for ARP problems by watching for ARP replies. Each ARP request should draw exactly one reply. Two or more replies from different hardware addresses mean that more than one system is claiming the IP address; only one of those systems has the correct hardware address.

  6. If you still cannot determine the problem, contact the network administrator, who might be able to examine a TCPdump or call Technical Support for help in isolating the problem.
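
The two ARP checks in step 5 (a hardware address outside the expected manufacturer range, and more than one hardware address answering for the same IP address) can be sketched in Python. This is only an illustration and is not part of MultiNet Tools. The function names, the sample addresses, and the vendor prefix 00:11:22 are hypothetical placeholders; substitute the prefixes your hardware manufacturer gives you and the ARP replies you actually capture.

```python
from collections import defaultdict


def normalize_mac(mac):
    """Return a hardware address as lowercase, colon-separated hex pairs."""
    digits = mac.lower().replace("-", "").replace(":", "").replace(".", "")
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit hardware address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def vendor_matches(mac, expected_prefixes):
    """True if the first three bytes of mac match an expected vendor prefix."""
    prefix = normalize_mac(mac)[:8]
    return prefix in {p.lower().replace("-", ":") for p in expected_prefixes}


def find_conflicts(arp_replies):
    """Return {ip: set of hardware addresses} for IPs claimed by more than one."""
    claims = defaultdict(set)
    for ip, mac in arp_replies:
        claims[ip].add(normalize_mac(mac))
    return {ip: macs for ip, macs in claims.items() if len(macs) > 1}


# Hypothetical observations: (IP address, hardware address that replied).
replies = [
    ("192.0.2.10", "00:11:22:aa:bb:cc"),
    ("192.0.2.10", "00-33-44-01-02-03"),
    ("192.0.2.20", "00:11:22:dd:ee:ff"),
]
conflicts = find_conflicts(replies)
```

Here 192.0.2.10 is reported as a conflict because two different hardware addresses replied for it. Passing each of those addresses to vendor_matches with the expected prefix would suggest which one belongs to the machine you expect.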

Determining Why You Cannot Connect to a System

Occasionally you might find that you cannot connect to a particular system. Often this is due to heavy network activity on the target machine (for example, a popular Web site). If you know the system is popular, you might simply try reconnecting later.

If you regularly have problems connecting to the system, or if the system is critical to you, follow these steps to determine whether there is a problem other than the system being too busy to respond:

  1. Start MultiNet Tools and use TraceRoute to trace the route between your machine and the unresponsive target machine. If the trace makes it to the unavailable system, the system may be too busy at this time to respond to your attempted connection, or may be configured improperly. Try your connection again, and if the problem persists, contact your network administrator or the system administrator of the remote system.
  2. If the trace stops before making it to the remote system, then the network connection is broken at or just beyond the last location shown in the trace. If the location is within your company, or is provided as a service for which your company is paying, call the owner of the resource. (Whois in MultiNet Tools can help you locate a contact if you do not know whom to call, or you can call your help desk or network administrator.)
  3. If the trace does not stop, but at some point circles back on itself, look at the trace information to find the point at which the route is circling back. The machine routing the packets back needs to be fixed.
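
The three outcomes described in these steps (the trace reaches the target, stops early, or circles back) can be told apart mechanically. The following Python sketch assumes you have already copied the hop addresses from a TraceRoute display into a list, with None standing for a hop that did not answer. The function name and the sample addresses are hypothetical, and the sketch treats any repeated address as a loop, which is a simplification.

```python
def classify_trace(hops, target):
    """Classify a trace as "reached", "loop", or "stopped".

    hops is the ordered list of addresses that answered, with None for a
    hop that did not answer. Returns (outcome, index), where index is the
    position in hops that the outcome refers to (None for an empty trace).
    """
    first_seen = {}
    for index, hop in enumerate(hops):
        if hop is None:
            continue
        if hop == target:
            return ("reached", index)
        if hop in first_seen:
            # The router at the preceding hop sent the packet back toward
            # a router that had already handled it.
            return ("loop", index)
        first_seen[hop] = index
    # The target never answered: report the last hop that did.
    answered = [i for i, hop in enumerate(hops) if hop is not None]
    return ("stopped", answered[-1] if answered else None)


# Hypothetical trace in which the third router sends packets back to the second.
outcome = classify_trace(
    ["10.0.0.1", "192.0.2.1", "192.0.2.2", "192.0.2.1"], "198.51.100.7"
)
```

For a "stopped" result the index identifies the last location that answered, which is where step 2 says to start looking. For a "loop" result the hop just before the index is the machine routing packets back.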

Determining Why You Cannot Connect Outside the Local Network

If you cannot connect outside your local network, you have a routing problem.

Use Monitor to look at the routing table. You should have at least these three destinations:

Determining Why Network Connections Are Dropped

If you find that, when using an application like Telnet or FTP, your connections to the remote system drop more often than you deem reasonable, there might be a hardware problem on the network.

Here are some things you can do to determine if there is a hardware problem in the network:

  1. Start Monitor and look at the Protocol Statistics. These statistics have accumulated since the last time TCP/IP was started on the machine (typically during boot). Some of these statistics cover "timeout" errors (for example, "dropped due to keepalive timeouts").

    In general, the ratio of timeout errors to TCP connections should be no more than 1 to 2 percent. Anything more than 10 percent indicates a problem, and ratios between 2 and 10 percent indicate a possible problem. (These ratios are rules of thumb; it is up to you to determine the ratios that indicate what is, and is not, a significant problem.)

    An excessive number of connections dropped due to timeouts might indicate a faulty bridge on the network.

  2. Ping the host that is timing out. To make the Ping useful, you must make it emulate the type of connections that are being dropped by the host.

    For example, if users typically Telnet to the host, set up Ping to resemble Telnet. Start Ping and click the Ping button. For data length, use 1000. For number of packets, pick a large number that will keep the Ping going long enough to resemble a normal Telnet session, or the Telnet sessions that are getting dropped. As an alternative, pick 0 to have the Ping continue until you stop it.

    If you are emulating FTP connections, use a larger data length (such as 1500 for Ethernet or 4352 for FDDI). Send enough packets so that the Ping resembles the FTP sessions that are being dropped. Sending large packets lets you determine whether a router is handling large packets incorrectly.

    Start Ping and look at the %Loss figure when Ping finishes. A high packet loss might indicate a hardware problem on the network, either in the line itself or in a bridge, router, or other machine. Use MultiNet Tools TraceRoute to help isolate which machines are used in the connection.
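
The rule-of-thumb thresholds from step 1 and the %Loss figure from step 2 are simple arithmetic. The following Python sketch shows both calculations. The function names and the labels "normal", "possible problem", and "problem" are hypothetical, and the 2 and 10 percent thresholds are the rules of thumb given above, not fixed limits; adjust them to suit your network.

```python
def classify_timeout_ratio(timeouts, connections, warn=2.0, fail=10.0):
    """Rate the percentage of TCP connections dropped due to timeouts."""
    if connections <= 0:
        raise ValueError("connections must be positive")
    percent = 100.0 * timeouts / connections
    if percent > fail:
        return "problem"
    if percent > warn:
        return "possible problem"
    return "normal"


def percent_loss(sent, received):
    """Return the percentage of Ping packets that were not answered."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


# Hypothetical figures read from Protocol Statistics and from a Ping run.
rating = classify_timeout_ratio(timeouts=5, connections=100)
loss = percent_loss(sent=200, received=190)
```

With these sample figures, 5 timeouts in 100 connections falls in the "possible problem" band, and 190 replies to 200 packets is a 5 percent loss.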

Finding Out Who is Responsible for a Problem Router

If you know which part of a network is failing, you also want to know who is responsible for fixing that part. If the machine's owner has registered with a "white pages" server to which you have access, you can use MultiNet Tools Whois to find out whom to notify about the failing machine. You can use Whois to look up contact information for full machine names, domain names, and IP addresses.

White pages contain information only about Internet hosts, not about hosts internal to your organization's network.
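
For reference, a Whois lookup is a very small exchange: the client opens a TCP connection to port 43 on the server, sends the query followed by a carriage return and line feed, and reads text until the server closes the connection. The Python sketch below illustrates that exchange. It is not how MultiNet Tools is implemented; the function names are hypothetical, and you must supply the name of a Whois server to which you have access.

```python
import socket


def build_whois_request(query):
    """Return the bytes a Whois client sends: the query plus CR LF."""
    return query.strip().encode("ascii") + b"\r\n"


def whois_lookup(query, server, port=43, timeout=10.0):
    """Send one query to a Whois server and return its text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_request(query))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

The query can be a full machine name, a domain name, or an IP address, matching the lookups described above.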




Copyright © 1995-1996 Cisco Systems, Inc. All Rights Reserved.

HTML file generated May 15, 1996.