Most modern networks use a star topology: each computer plugs into a separate port on a switch, either directly or via a patch panel, and larger networks will have multiple switches connected together. However, what happens if you plug both ends of a patch cable into the same switch? I’ve encountered this situation a couple of times.
Back in 2006, I investigated a problem where some desktop computers in a particular room didn’t have internet access. It turned out that none of them had a network cable plugged in. However, there was a big rats’ nest of network cables plugged into the switch under the desk, so I wasn’t sure where they were all connected; on closer inspection, it turned out that each cable had both ends plugged into the switch. Well, that ain’t gonna work! One swift tidying session later, and they were up and running again. Still, this loop didn’t cause any problems for the rest of the network. I think it’s significant that it was an unmanaged switch; I believe it was from 3Com’s SuperStack range.
More recently, I was called in to look at a network that had come to a complete halt: none of the client computers could access any of the servers. Again, someone had plugged both ends of a cable into a switch, but this time it was a “smarter” (managed) model: the Cisco Catalyst Express 500. More specifically, each PC was connected to a CE500 access switch, then each CE500 had a gigabit uplink port connecting it to a Catalyst 2970 core switch. These switches sometimes need to search the rest of the network when they’re looking for a new device, and I think the algorithm goes something like this:
- If you get a request from a “normal” port, try each of the other normal ports on this switch, in case the device is there. If it’s not, try the uplink port to query the rest of the network.
- If you get a request through the uplink port, try each of the normal ports, then stop.
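That two-rule flooding behaviour can be sketched in a few lines of Python. To be clear, this is my guess at the logic, not Cisco’s actual implementation; the `Switch` class and port numbers are made up for illustration.

```python
class Switch:
    """Toy model of the flooding rule described above (hypothetical)."""

    def __init__(self, name, normal_ports, uplink):
        self.name = name
        self.normal_ports = normal_ports  # ports with devices attached
        self.uplink = uplink              # port towards the core switch

    def flood(self, request, arrived_on):
        """Return the list of ports the request is forwarded out of."""
        if arrived_on == self.uplink:
            # Request came in from the rest of the network:
            # try the local ports only, then stop.
            return list(self.normal_ports)
        # Request came from a locally attached device:
        # try the other local ports, then escalate via the uplink.
        out = [p for p in self.normal_ports if p != arrived_on]
        out.append(self.uplink)
        return out


sw = Switch("CE500-1", normal_ports=[1, 2, 3, 4], uplink=24)
print(sw.flood("who-has 10.0.0.5?", arrived_on=2))   # [1, 3, 4, 24]
print(sw.flood("who-has 10.0.0.5?", arrived_on=24))  # [1, 2, 3, 4]
```

The key asymmetry is the second rule: a request arriving on the uplink is never sent back out of the uplink, which is what normally prevents a single flooded frame from circulating forever.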
In this case, when another CE500 switch looked for a new device, it would send the request to the core switch, then the core switch would echo it outwards to all the other CE500 switches. When that request hit the dodgy switch, it would try each of the ports, including one end of the looped cable. The request would then come back into the switch through the other end of that cable, and the dodgy CE500 would send it out to the core switch via the uplink port. The core switch would then echo it to every other CE500, including the one that started the process.
At this point, the original (good) CE500 switch detected an STP (Spanning Tree Protocol) loopback error coming from the uplink port, and blocked that port so that it wouldn’t keep sending data in circles. Unfortunately, that meant that the computers connected to that CE500 switch could only talk to each other, i.e. none of them could talk to the servers (which were plugged into the core switch). This happened for every other CE500 switch in a very short space of time (a few seconds?), paralysing the network.
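The failure mode at the dodgy switch can be traced step by step. This is a hypothetical model built from the flooding rules described in the bullet points above; the port numbers are invented, and real STP behaviour is more involved.

```python
# The looped patch cable: a frame sent out of port 3 comes back in on
# port 4, and vice versa (port numbers made up for illustration).
LOOPED = {3: 4, 4: 3}
UPLINK = 24
NORMAL = [1, 2, 3, 4]


def flood(arrived_on):
    """Ports a frame is forwarded out of, per the rules quoted above."""
    if arrived_on == UPLINK:
        return list(NORMAL)
    return [p for p in NORMAL if p != arrived_on] + [UPLINK]


# 1. A request arrives from the core switch via the uplink and is
#    flooded to all the normal ports, including port 3.
out = flood(UPLINK)
assert 3 in out

# 2. The copy sent out of port 3 re-enters on port 4, the other end of
#    the looped cable.
reentry = LOOPED[3]

# 3. Because it now arrived on a *normal* port, it is flooded again,
#    this time including the uplink, so it heads back to the core.
out2 = flood(reentry)
print(UPLINK in out2)  # True: the frame is reflected to the core switch
```

So the looped cable effectively converts "came from the uplink" traffic back into "came from a local device" traffic, defeating the rule that would otherwise stop the frame, and the core switch then re-broadcasts every reflected copy to all the other CE500s.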
This behaviour would be more useful if it had happened on the dodgy CE500 (i.e. recognising that two of its own ports were connected to each other) or on the core switch (blocking the dodgy CE500 and letting the rest carry on as normal). Aside from anything else, that would have made it easier to identify which switch actually had the problem! As it was, I diagnosed the problem by unplugging all the CE500 switches from the core, then plugging them back in one at a time until the problem recurred.
The Catalyst 2970 is a more advanced switch, so I’m surprised that it didn’t notice the problem; this may be a configuration issue. A friend tried to reproduce this on a Catalyst 2940 (on a separate network) and that basically ignored the loop, i.e. there was no effect on the rest of the network. So, maybe it’s the combination of the Catalyst and Catalyst Express that caused the issues.