RAIDers of the lost sleep

Early in my career, I rebooted a server and noticed an error message saying that one of the RAID drives had failed. The server was able to keep running, but the drive needed to be replaced, so one of my colleagues came over with a new one. The drive was hot-swappable, so he was quite cheerful about the fact that we wouldn’t need to shut the server down first. However, we disagreed about which drive had failed: the error message referred to drive 2, and there were five drives in total, but I thought that the numbering would start at 0 while he thought that it would start at 1. He outranked me, so he pulled out the second drive. Unfortunately, this turned out to be the wrong one (i.e. one of the working drives), so the entire server crashed, and we had to spend our Friday night re-installing Windows from scratch.
