Preventing data centre downtime

Nurdianah Md Nur | Nov. 11, 2014
Experts recommend regularly monitoring and testing UPS batteries as well as implementing DCIM systems.

Following the power outage that froze trading on the Singapore Exchange (SGX) last Wednesday (5 November 2014), data centre experts offered advice on preventing a recurrence of such incidents.

"Downtime can be disruptive and potentially disastrous for any financial activity," said Matthew Kong, Emerson Network Power's country manager for Singapore. He highlighted that his company's study last year found that data centre downtime in the US costs as much as US$7,900 per minute. "While the dollar amount may be lower in Asia, the impact on financial institutions and exchanges is no less great in terms of reliability," he added.
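To put the per-minute figure in perspective, the short sketch below multiplies it out for an outage of a given length. The function name and the 60-minute duration are illustrative assumptions, not figures from the study or from the SGX incident, and the result is an upper bound because the quoted number is an "as much as" estimate for the US.

```python
# Upper-end US estimate quoted by Emerson Network Power (US$ per minute).
COST_PER_MINUTE_USD = 7_900


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Return an upper-bound cost estimate for an outage of `minutes`.

    `downtime_cost` is a hypothetical helper written for illustration.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# Illustrative only: a hypothetical one-hour outage.
print(downtime_cost(60))  # 474000
```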

According to SGX's preliminary investigations announced on 6 November 2014, the outage was due to a malfunction in the uninterruptible power supply (UPS). The UPS, which is a backup power supply, should have kicked in when the primary power supply went down, but failed to do so last Wednesday afternoon.

Keith Murray, vice president for IT in Singapore and Brunei at Schneider Electric, said that UPS batteries are one area to examine in order to maintain UPS runtime. "UPS batteries have finite life spans, so they should be replaced every three to four years, depending on various infrastructure conditions such as temperature," he advised. He added that the batteries should be regularly tested, and highlighted the need for battery monitoring.
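As a rough illustration of the three-to-four-year replacement window Murray describes, the sketch below flags batteries by age alone. The `Battery` record, the `battery_status` function and the status labels are hypothetical; a real monitoring system would also weigh test results and conditions such as temperature, which this sketch deliberately leaves out.

```python
from dataclasses import dataclass
from datetime import date

# Assumed thresholds taken from the "three to four years" guidance above.
REVIEW_AFTER_YEARS = 3.0
REPLACE_AFTER_YEARS = 4.0


@dataclass
class Battery:
    """Hypothetical record for one UPS battery string."""
    label: str
    installed: date


def battery_status(battery, today):
    """Classify a battery by age only: 'ok', 'review' or 'replace'."""
    age_years = (today - battery.installed).days / 365.25
    if age_years >= REPLACE_AFTER_YEARS:
        return "replace"
    if age_years >= REVIEW_AFTER_YEARS:
        return "review"
    return "ok"


# Illustrative inventory with made-up installation dates.
today = date(2014, 11, 5)
for b in [Battery("A", date(2010, 1, 1)),
          Battery("B", date(2011, 6, 1)),
          Battery("C", date(2013, 1, 1))]:
    print(b.label, battery_status(b, today))
```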

Another way to prevent such an outage from recurring is to implement data centre infrastructure management (DCIM) systems, Kong recommended. Because DCIM provides real-time monitoring and feedback on power, space, cooling and other critical data centre components, it enables data centres to optimise availability and uptime. This in turn enables financial institutions to "remain resilient and reliable, as well as to be constantly running," said Kong.

In response to the recent downtime, SGX has launched a board-level inquiry committee and engaged independent experts to investigate the matter.  

 
