May 10, 2011

Whose Fault is it When Your Internet Dies? Troubleshooting Networks with Linux

When you lose your Internet connection, is it the fault of your ISP? Or is it a problem on your side of the connection? Here is how to find out using standard Linux networking tools, and avoid embarrassing tech support calls that conclude with "Yes, dear customer, you broke it your own self."

ISPs: Why We Hates Them

I have old, slow 640/128k ADSL. If I lived on the east side of my telco instead of the west side I would have high-speed fiber. That's life, and it's better than dialup.

But my ISP requires authentication via Web browser. This kicks in anytime I do anything that requires my router to request a new DHCP lease. Of course there is no Web browser on my router, so I have to do this on one of the PCs on my LAN. Often it times out and doesn't authenticate, or I am testing a number of different things and have to restart it a lot, and then it becomes a major nuisance. This. Is. Stupid. They control the wires, so what's the point? I can't access my DSL from a different location. I'm mentioning this not only because it feels good to rant, but it may be something you have to deal with too.

Like a Good Scout, Be Prepared

When you can't access the Internet you can't install software (unless you have your own local repository), so you should have these commands available on your computers:

  • ping
  • ifconfig
  • dig
  • GNU screen

You could also keep a bootable rescue Linux handy, such as my favorite SystemRescue CD/USB. SystemRescue has everything you'll ever need for system and network administration. A cool alternative is to have a netbook or laptop outfitted as a super-duper network administration machine, with two Ethernet interfaces, Wi-Fi, and all the networking software you can find. Have your Internet account login, password, and configuration recorded somewhere handy.

You should also keep a list of ping-able IP addresses, starting with your ISP and whatever name servers you are using. Then add a few more random sites so you have a decent-sized testing pool. Using and gives you a good selection. Some server admins block ping (ICMP echo request), which is foolish, so that is why you need a batch of proven ping-able addresses for testing.

It is good to have a couple of extra Ethernet cables, because it's fast and easy to swap cables, though investing in a fancy cable tester is fun. Keep a spare hub or simple, not-fancy switch around for quick testing as well. If you have your own router, keep whatever interface cable you need to communicate directly with it, whether it's serial or USB. Don't rely on Ethernet alone, because a misconfiguration can lock you out.

Be Kind

Be kind and don't bombard sites with pings and other queries. ping will run forever if you don't stop it, so limit it with the -c option; for example ping -c4 [host] runs four times and then stops.
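Here is a minimal sketch of a polite pool tester, assuming you keep your proven ping-able addresses in a plain text file, one per line. The function name ping_pool and the PING_COUNT variable are made up for this example; only ping and its -c option come from the text above.

```shell
#!/bin/sh
# Ping every host listed in a file (one per line) a few times and report
# which ones answer. Blank lines and lines starting with '#' are skipped.
# PING_COUNT is a hypothetical knob; it defaults to 4, as in "ping -c4".
ping_pool() {
    pool_file=$1
    count=${PING_COUNT:-4}
    while read -r host; do
        case $host in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps ping from consuming the rest of the host list.
        if ping -c "$count" "$host" >/dev/null 2>&1 </dev/null; then
            echo "up   $host"
        else
            echo "DOWN $host"
        fi
    done < "$pool_file"
}
```

Run it as ping_pool ~/ping-pool.txt (a hypothetical file name). Because every host gets a fixed, small number of echo requests, nobody gets bombarded.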

Finding IP Addresses

How do you find the IP addresses of all these sites? Easy peasy with good old ping:

$ ping -c4
PING ( 56(84) bytes of data.
64 bytes from ( icmp_req=1 ttl=51 time=125 ms

I like because it uses a big pool of addresses — ping it again and you'll get different results. You can see all of them using the dig command:


By default dig looks for A records, so this is the same as dig A.

Single PC

The simplest scenario is a single computer connected directly to an Internet gateway: DSL, cable, fiber, wi-fi, or even dialup. (Yes, my friends, about half of Internet users in the US are still on dialup.) My favorite troubleshooting protocol is Close to Far: start with your PC, then move on up the chain from there. Let's run through some basic checks.

First ping localhost:

$ ping localhost
PING localhost ( 56(84) bytes of data.
64 bytes from localhost ( icmp_req=1 ttl=64 time=0.044 ms

If you don't see this, but instead get a ping: unknown host localhost message, try the loopback address:

$ ping

If that fails then the problem is on your PC itself: the loopback interface is pure software and never touches the wire, so suspect your system's network configuration rather than your cables. If it passes, then run ifconfig to see if you have a routable address assigned, as this snippet shows:

$ /sbin/ifconfig
   inet addr:  Bcast:  Mask:

That's a proper (fake) routable IP address, so that looks all right. If you see an address in the range then your interface has not been assigned a routable address, and has been assigned an IPv4 link-local address. (See IP address for an IPv4 refresher.) So it's not getting a DHCP lease, or your static configuration is wrong.
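The address check can be scripted. This is a rough sketch, assuming the link-local range in question is the standard IPv4 link-local block 169.254.0.0/16 from RFC 3927, and that your ifconfig prints either the older "inet addr:" layout or the newer "inet" layout. The function names inet_addrs and addr_kind are invented for the example.

```shell
#!/bin/sh
# Print every IPv4 address found in ifconfig-style output on stdin.
# The first expression handles "inet addr:A.B.C.D", the second "inet A.B.C.D".
inet_addrs() {
    sed -n -e 's/.*inet addr:\([0-9.]*\).*/\1/p' \
           -e 's/.*inet \([0-9][0-9.]*\).*/\1/p'
}

# Classify one IPv4 address. 169.254.x.x is assumed to be the link-local
# block (RFC 3927); anything else that is not loopback counts as "assigned".
addr_kind() {
    case $1 in
        127.*)     echo loopback ;;
        169.254.*) echo link-local ;;
        '')        echo none ;;
        *)         echo assigned ;;
    esac
}
```

Something like /sbin/ifconfig | inet_addrs | while read -r a; do echo "$a $(addr_kind "$a")"; done lists each address with its class. A link-local result on your Ethernet interface points at DHCP or your static configuration.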

If your address is good then ping the next link — if you have a DSL/cable/etc. modem with its own IP address, ping that and watch its status LEDs, if it has any, for network activity. If it fails, check power and network connections, and power-cycle it.

If that succeeds, or you have a "dumb" modem with no IP address, ping your ISP. First ping by its hostname, like ping If that fails with an "unknown host" error, then ping their IP address. If that succeeds then it is a DNS problem. You can have connectivity without DNS, which means you can Web-surf only via IP addresses.
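The name-then-address test is easy to wrap in a small function. This is only a sketch: name_or_wire is a made-up name, you supply both the hostname and its known IP address, and it does not distinguish an "unknown host" error from other ping failures, so treat the "dns" verdict as a hint rather than proof.

```shell
#!/bin/sh
# Ping a host by name first and by IP address second, then say which
# layer looks broken. Both arguments are supplied by the caller.
name_or_wire() {
    name=$1
    ip=$2
    if ping -c 2 "$name" >/dev/null 2>&1; then
        echo "ok: $name resolves and answers"
    elif ping -c 2 "$ip" >/dev/null 2>&1; then
        echo "dns: $ip answers but $name does not"
    else
        echo "down: neither $name nor $ip answers"
    fi
}
```

Call it with a pair from your list of proven addresses, for example name_or_wire [hostname] [its IP address].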

If you are running a local caching resolver, then you could try flushing the cache to see if that changes anything. Most distros do not enable one by default, so if you have one it's because you put it there. Use the wonderful and powerful dig command to see what your resolver, whether it's your own or your ISP's, is pointing to, like this:

$ dig @
; <<>> DiG 9.7.3 <<>> @
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40503
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 2

;			IN	A




;; Query time: 103 msec
;; WHEN: Fri May  6 12:26:35 2011
;; MSG SIZE  rcvd: 131

Do this using different DNS servers such as Google Public DNS, or OpenDNS, or anyone you want, and compare the results. If this works, or if your ISP resolver returns different results, then that's a pretty good sign that your ISP's DNS servers have a problem. Before you pick up the phone to politely request that they fix it, do one more thing: request a new DHCP lease. Assuming you have a dynamic account, of course. You can reboot, or you can figure out which DHCP client your distro uses and manually renew your lease like this, naming your correct Ethernet interface:

# dhclient eth0

Other Linux DHCP clients are dhcpcd and pump. If this doesn't change anything then chances are good that your ISP is at fault.
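The resolver comparison described above can be sketched as a loop over dig. The function name compare_resolvers is invented here, and the resolver addresses are whatever you pass in. It relies only on dig's +short option and the @server syntax.

```shell
#!/bin/sh
# Ask several resolvers for the A records of one name and print the
# sorted answers on one line per resolver.
compare_resolvers() {
    name=$1
    shift
    for resolver in "$@"; do
        answers=$(dig +short "@$resolver" "$name" A 2>/dev/null | sort | paste -sd ' ' -)
        echo "$resolver: ${answers:-no answer}"
    done
}
```

Run it as compare_resolvers [hostname] [ISP resolver] [another resolver]. Differing answers are not always a fault, since sites that use a big pool of addresses can legitimately hand out different ones. A resolver that returns nothing while the others answer is the more telling sign.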

If you have an account that gives you a static IP address, then check your configurations, and check to see if your service provider changed something without telling you. For example, my ISP changed name servers without notifying customers. That was fun.

You Have a Router

Suppose you have a router between your PC and your Internet gateway. Before doing anything to your router, make sure you can talk directly to it via serial or USB cable. With a serial connection use GNU screen because it's a lot easier than Minicom. All you need to know is the correct line speed, and then run it like this:

$ screen /dev/ttyS0 38400

This is your Plan B if your router configuration gets messed up.

After verifying that your PC is not the source of the problem, ping your router. If that succeeds, then log into your router and run the same ping tests as above — ping your ISP and other sites by hostname and IP address, and try using dig with different resolvers if necessary. If this doesn't work, check your router configurations, and check if your ISP changed anything without telling you.

If the configurations look all right, then connect a laptop or PC directly to your DSL/cable/etc. modem. Configure it for whatever type of account you have (static or dynamic) and see what happens. If you connect successfully then the problem is in either your router or your switch.
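The whole Close to Far routine can be sketched as one function that pings each hop in order and stops at the first one that stays silent. The name close_to_far and the label=address argument format are assumptions made for this example, and the addresses are yours to fill in.

```shell
#!/bin/sh
# Ping each hop in order, nearest first, and stop at the first one that
# does not answer. Arguments are "label=address" pairs from the caller.
close_to_far() {
    for hop in "$@"; do
        label=${hop%%=*}
        addr=${hop#*=}
        if ping -c 2 "$addr" >/dev/null 2>&1; then
            echo "ok    $label ($addr)"
        else
            echo "FAIL  $label ($addr): start looking here"
            return 1
        fi
    done
    echo "every hop answered"
}
```

For example, close_to_far pc=[loopback] router=[router IP] modem=[modem IP] isp=[ISP IP] reports the nearest hop that does not answer, which is where to start checking cables, power, and configurations.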
