November 05, 2015
Networking for Systems Engineers
Downtime at MVD 01 Location.
Today we suffered a network outage caused by a power grid failure on a 500 KVa line that brings power from the hydroelectric dams in the central part of the country.
The failure cut power to Antel's upstream network equipment, which left the datacenter without connectivity for ~30 minutes.
Regarding the equipment:
UPS: carried the load through the entire power outage, dropping only ~8% of its capacity, with an actual load of 17% on the redundant grid.
Router: Since Sunday, we’ve been running on our main backup router, which has left some services offline. This is due to a RAID failure on the primary router, which caused it to drop out of the cluster.
Resolution: Planned maintenance is scheduled from tonight through tomorrow morning to reconfigure the network and avoid future issues.
Current Status: UPS capacity has been restored to 100%, leaving us ready for another outage.
No maintenance was required, and no intervention was needed.
Temperature Warning: During the power outage, the central HVAC unit shut down to save power and stayed off until alarms went off 20 minutes after the power loss.
HVAC was restored after the alarm went off, resuming regular operation.
Las Flores remains offline until further notice; a primary power failure persists there, but it is not affecting any services. I will post an update once the situation normalizes.