There is a particular kind of frustration that comes with a network problem you can’t reproduce. These are the problems that occur only when you’re not looking.
Over time, I’ve seen my fair share of glitches, outages and misconfigurations. But there’s always that one issue that sticks with you because it is so strange and so unexpected, it almost feels like a prank. It is the kind of problem that does not show up in a textbook or a cert exam, and no amount of best practices could have prepared you for it.
A True Story
It started with a ticket from a small branch office: “Users getting kicked off the VPN randomly.” This wasn’t too concerning at first, as we have all seen our fair share of flaky Wi-Fi issues or misconfigured DHCP leases. But then it got stranger. It wasn’t just the VPN. Teams calls would freeze mid-meeting, files would not save to the shared drive, and sometimes the whole office would just quietly drop off the network for a minute or two, then come back like nothing happened.
We checked everything. The WAN circuit looked clean. Latency and jitter? Minimal. Packet loss? Zero. The switches were healthy. The firewall logs did not show anything weird. The site had a Meraki setup, so I could even check heatmaps, event logs and client histories, and I still found nothing.
One day of issues could be a fluke. But this kept happening. Not every day, but often enough that people started asking, “Is our internet haunted?” Eventually, I did what every engineer dreads: I booked a site visit.
As soon as I walked into the server room, I knew something was off. First, it was hot, uncomfortably hot. Not full data center meltdown levels, but definitely warmer than it should’ve been. I checked the room’s AC, and it was working, more or less.
Then I turned to look at the network gear, and I saw it. There, plugged into the same UPS as the core switch and firewall, was a mini fridge. Yes, a mini fridge. Apparently, someone in the office had decided the server room was a good place to keep their Red Bulls cold. Every time the fridge’s compressor kicked on, it drew just enough surge power to momentarily starve the other equipment on the circuit. It wasn’t enough to reboot anything, but it was enough to cause micro-brownouts that would drop connections or stall data flows, which was just enough chaos to produce “ghost” issues.
It made sense in hindsight. The problem was sporadic because the fridge wasn’t always cycling. That explained why our logs never showed clear failures and why the issue was so hard to pin down. The hardware never actually lost power; it just dipped into an unstable state for a few seconds. Needless to say, the fridge was evicted. We got a vendor to run a clean power line for the rack, and just like that, the “ghost issues” vanished.
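For anyone chasing a similar ghost, here is a rough sketch of how short voltage dips could be pulled out of a power log. This is not what we ran at the time. It assumes you can export readings from a UPS or power monitor as (timestamp in seconds, volts) pairs, and the 108 V threshold (about 90% of a nominal 120 V supply) and the `find_dips` name are my own illustrative choices, not anything a vendor prescribes.

```python
from dataclasses import dataclass


@dataclass
class Dip:
    start: float      # timestamp of the first low reading, in seconds
    end: float        # timestamp of the last low reading, in seconds
    min_volts: float  # lowest voltage seen during the dip


def find_dips(samples, threshold=108.0):
    """Group consecutive below-threshold readings into dips.

    `samples` is an iterable of (timestamp_seconds, volts) pairs sorted
    by time. The default threshold is an assumption (roughly 90% of a
    nominal 120 V supply); adjust it for your own mains voltage.
    """
    dips = []
    current = None
    for ts, volts in samples:
        if volts < threshold:
            if current is None:
                # First low reading: open a new dip.
                current = Dip(start=ts, end=ts, min_volts=volts)
            else:
                # Still low: extend the open dip.
                current.end = ts
                current.min_volts = min(current.min_volts, volts)
        elif current is not None:
            # Voltage recovered: close the open dip.
            dips.append(current)
            current = None
    if current is not None:
        # The log ended while the voltage was still low.
        dips.append(current)
    return dips


if __name__ == "__main__":
    # Made-up readings for illustration only.
    readings = [(0, 120.0), (1, 119.0), (2, 104.0), (3, 101.0), (4, 118.0)]
    for dip in find_dips(readings):
        print(dip)
```

If the dips it reports line up with the times users complained, that is a strong hint to go look at what else is plugged in.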
To this day, when someone complains about a network that “acts weird,” I always ask myself: What else is sharing that power source? Because sometimes, your biggest network enemy is not a misconfigured router, it is someone trying to keep their lunch cold next to your firewall.