By Ian Hight
Here I examine the differences between a professional data centre and a computer room.
I will show that resiliency, in both infrastructure and in skills and resources, is the defining factor.
What is resiliency and why is it important?
According to searchcio.techtarget.com, data centre resiliency is defined as “the ability of a server, network, storage system or an entire data centre to continue operating even when there has been an equipment failure, power outage or other disruption.”
The resiliency of IT systems and mission-critical applications underpins business resiliency, also known as business continuity. Enabling the organisation to continue to function as effectively as possible in the face of unplanned downtime (primarily because of disasters) is a major responsibility of the IT department. As we know in New Zealand, recent natural disasters (and good old human error) have shone the spotlight firmly on business continuity. They have raised the issue across all levels of business to the point where even SMBs are developing resiliency strategies.
Infrastructure resiliency underpins mission-critical applications
Key questions for any IT department are: how long can the business function without access to mission-critical applications (the recovery time objective, or RTO) and how much data can it afford to lose (the recovery point objective, or RPO)? The answers to these questions determine how resilient the environment needs to be.
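As a rough illustration of how those two questions translate into a check, the sketch below compares an environment against the targets the business has set. The class and function names, and the figures used, are hypothetical and chosen only for illustration; it also assumes that worst-case data loss equals one full backup interval.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjectives:
    """Targets set by the business (hypothetical structure for illustration)."""
    rto_hours: float  # longest tolerable outage
    rpo_hours: float  # longest tolerable window of lost data


def meets_objectives(objectives, estimated_recovery_hours, backup_interval_hours):
    """Return (rto_ok, rpo_ok) for a given environment.

    Assumption: worst-case data loss is one full backup interval,
    i.e. a failure just before the next backup would have run.
    """
    rto_ok = estimated_recovery_hours <= objectives.rto_hours
    rpo_ok = backup_interval_hours <= objectives.rpo_hours
    return rto_ok, rpo_ok


# Illustrative figures only: a 4-hour RTO and 1-hour RPO, measured against
# an environment that takes 8 hours to restore and backs up once a day.
targets = RecoveryObjectives(rto_hours=4, rpo_hours=1)
print(meets_objectives(targets, estimated_recovery_hours=8, backup_interval_hours=24))
```

With these example figures both checks fail, which is the kind of gap that tells the business its environment is less resilient than it needs to be.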
So what is the advantage of a data centre over a computer room? A data centre, by definition, aims to reduce system recovery time to the minimum cost-effective duration. The systems that reside within the data centre or computer room can use disaster recovery systems located elsewhere, along with high-availability software, to meet the RTO and RPO goals that the business has set. The physical location and infrastructure, however, are what distinguish a data centre from a computer room.
Most computer rooms include a single UPS (uninterruptible power supply), which is also responsible for power conditioning, some office-style air-conditioning sufficient for cooling and some level of security (typically a key lock on the entry door). That is about where it ends.
This style of computer room setup is prevalent in New Zealand as it is a low-cost option for many businesses and has proven to be largely adequate to date. What has changed is the need for systems to be available 24 x 7, and with that need comes a greater reliance on all of the systems being maintained and operational all of the time.
While there are different levels of data centre, they generally share these minimum requirements:
- Racks of computer systems fed from two power distribution units (PDUs) that are each connected to two separate UPS systems running off separate banks of batteries
- UPS systems fed from mains power and from a generator with an automatic transfer switch (ATS); the moment mains power is lost, the ATS ensures that the generator starts, runs up to speed, balances the power levels and then feeds the UPS systems, recharging the batteries while supplying alternate power to the data centre
- A generator capable of powering the entire data centre, including racks of computers, telecommunication racks and air-conditioning
Another area of resilience needs to be the air-conditioning as, unlike in an office environment, the data centre air-conditioning runs non-stop. An N+1 requirement means the data centre must be able to run for an extended time with at least one air-conditioning unit out of action (whether for maintenance or because of a fault). Once the N+1 requirements have been met, additional resilience is a matter of degree.
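The N+1 idea can be expressed as a simple capacity check. The following is a minimal sketch, assuming each unit has a rated cooling capacity in kW and that the worst case is losing the largest single unit; the function name and the figures are hypothetical.

```python
def survives_one_failure(unit_capacities_kw, heat_load_kw):
    """Return True if the cooling load is still covered with one unit out.

    Assumption: the worst case is losing the largest single unit, so the
    remaining units must carry the full heat load on their own.
    """
    if len(unit_capacities_kw) < 2:
        # A single unit (or none) leaves nothing running when it is out.
        return False
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= heat_load_kw


# Illustrative figures only: a 60 kW heat load.
print(survives_one_failure([30, 30, 30], 60))  # three units: two can carry the load
print(survives_one_failure([30, 30], 60))      # two units: one cannot
```

Two units that exactly match the load look adequate on paper but fail the check, because taking either one out for maintenance leaves the room short of cooling.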
Some of these basic requirements include:
- Overhead cabling to ensure any moisture is kept well away from power and data cabling
- Multiple independent distribution paths serving the IT infrastructure
- Dual-powered IT infrastructure that is fully compatible with the topology of the site’s architecture
- Concurrently maintainable site infrastructure
- Independently dual-powered cooling equipment, including chillers and heating, ventilating and air-conditioning (HVAC) systems
- Fault-tolerant site infrastructure with electrical power storage and distribution facilities
- Diversity at an exchange or in telecommunications
Security also needs to be addressed, including two-factor access control and CCTV.
The other area of difference is often found in the ability to test an environment in an appropriate and professional manner. No one likes unnecessary risk and, as such, many lower-level data centres do not fully load test on a planned basis. In theory everything should be fine, but it is better to run a test at a convenient time than to find out there are issues when the power goes out.
In essence, businesses with a computer room are betting that this lower-cost but higher-risk policy will see them through and that unplanned downtime will be minimal. Such a business also accepts that periods of planned downtime (which in fact accounts for more than 90% of all downtime), where applications are unavailable, are of acceptable duration to the business.
Such an organisation is also likely to be using back-up to tape, generally once a day, with the tape sometimes entrusted to a staff member to take off-site at the end of their shift. This business accepts that it may lose up to a full day’s data (depending on when during the next day the system fails). Because of the high element of risk taken, a computer room is a low-resiliency environment.
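To make the daily back-up exposure concrete, the sketch below works out how much data is at risk for a given failure time. The function name, dates and times are hypothetical examples, and it assumes everything written since the last completed back-up is lost.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup, failure_time):
    """Return the window of work lost if the system fails at failure_time.

    Assumption: everything written after the last completed back-up is
    unrecoverable, so the loss is simply the time elapsed since then.
    """
    if failure_time < last_backup:
        raise ValueError("failure_time must not be earlier than last_backup")
    return failure_time - last_backup


# Illustrative times only: tape written at 6pm, failure at 3pm the next day.
last_backup = datetime(2024, 3, 4, 18, 0)
failure = datetime(2024, 3, 5, 15, 0)
print(data_at_risk(last_backup, failure))  # 21:00:00
```

In this example 21 hours of work is lost; a failure just before the next evening's back-up would push the loss to almost the full 24 hours.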
Resiliency depends on people too
How resilient are other IT resources, particularly the staff members? How do you find qualified cover for people prevented from fulfilling their roles? What happens when the cover for the systems administrator, who is out of town on a week-long training course, calls in sick too?
The biggest resourcing issue for an in-house department is coverage; it is not enough to have one person for each job function, as there are holidays to take, inevitable sick days and training courses to attend (failing to keep up skill levels also lowers resiliency).
How are these absences covered? The demands on the in-house team are increasing as budgets are held flat or are, worse, cut, so having back-up resources available at a moment’s notice to step up is not an option. There is no ‘people-tap’ to turn on and off at will if no in-house resources are available.
So organisations turn to contractors and, while this often provides some of the coverage needed, it is costly and organising this resource is time-consuming (and often frustrating) for IT management. Often, as with the infrastructure, there is excess capacity in one area while the department scrambles to cover gaps in others.
As with the resilience issues for infrastructure, businesses are turning to outsourced services to buy what they need, in the exact quantity they need, when they need it. This I will discuss in future articles.