Understanding RAID

Overview

Although RAID is usually the responsibility of the professionals who manage hardware and servers, it’s important that Web designers and developers know about it too.

What is RAID?

RAID (redundant array of independent disks, originally redundant array of inexpensive disks) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called “RAID levels”, depending on what level of redundancy and performance (via parallel communication) is required.

In October 1986, “checksum” was announced for the IBM S/38. Checksum was an implementation of RAID-5. It was implemented in the operating system, in software only, and had a minimum of 10% overhead. The S/38 “scatter loaded” all data across its disks for performance. The downside was that, without checksum, the loss of any single disk required a total system restore of all disks. Under checksum, when a disk failed, the system halted and was then shut down. During maintenance, the bad disk was replaced and a parity-bit disk recovery was run. The system was then restarted using a recovery procedure similar to the one run after a power failure. While difficult, recovery from a drive failure was much shorter and easier than it was without checksum.

RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. Marketers representing RAID manufacturers later redefined the acronym as “redundant array of independent disks” to dissociate RAID technology from an expectation of low cost.

RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be “in a RAID”, although the more common, redundant phrasing is to say that they are “in a RAID array”. The operating system can then access the array as a single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between three key goals: resiliency, performance, and capacity.

In a nutshell, RAID uses multiple hard drives as a single storage volume. Within this volume, certain techniques are used to create redundancy and, sometimes, increase read speed. Two such techniques are:

1. Striping: This increases the speed of access by splitting files across multiple hard drives and then reading different pieces of the same file from all of those drives simultaneously.

2. Mirroring: This increases redundancy by having the same data appear on multiple hard drives. Thus, if one fails, it’ll still be present on another, and the volume can be restored.
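
The two techniques above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each “disk” is an in-memory list or bytes object, the 4-byte chunk size is arbitrary, and the function names (stripe, unstripe, mirror) are made up for this example rather than taken from any real RAID tool.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them out to the disks in turn."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Rebuild the original data by reading the chunks back in the same order."""
    chunks = []
    for row in range(max(len(disk) for disk in disks)):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


def mirror(data: bytes, num_disks: int) -> list[bytes]:
    """Keep a full copy of the data on every disk."""
    return [bytes(data) for _ in range(num_disks)]


data = b"ABCDEFGHIJKLMNOP"

striped = stripe(data, num_disks=2)
print(striped[0])  # chunks held by the first disk
print(striped[1])  # chunks held by the second disk
print(unstripe(striped) == data)

mirrored = mirror(data, num_disks=2)
print(mirrored[1] == data)  # the second disk alone still holds everything
```

With striping, each disk holds only part of the file, so both disks are needed to read it back. With mirroring, any one surviving disk is enough.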

RAID 0 uses only striping (minimum 2 disks)
RAID 1 uses only mirroring (minimum 2 disks)
RAID 5 uses striping with distributed parity, which is extra redundancy information from which a failed disk’s contents can be recomputed (minimum 3 disks)
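
To make the capacity trade-off between these three levels concrete, here is a rough sketch of how much usable space each one leaves. It assumes all disks are the same size, and the function name usable_capacity_tb is invented for this example. RAID 0 uses every disk for data, RAID 1 stores one disk’s worth of data no matter how many copies exist, and RAID 5 gives up one disk’s worth of space to redundancy information.

```python
# Minimum number of disks for each level, as listed above.
MINIMUM_DISKS = {0: 2, 1: 2, 5: 3}


def usable_capacity_tb(level: int, num_disks: int, disk_size_tb: float) -> float:
    """Return usable capacity in TB, assuming every disk has the same size."""
    if level not in MINIMUM_DISKS:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if num_disks < MINIMUM_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MINIMUM_DISKS[level]} disks")
    if level == 0:
        return num_disks * disk_size_tb        # all space holds data
    if level == 1:
        return disk_size_tb                    # every disk holds the same copy
    return (num_disks - 1) * disk_size_tb      # one disk's worth goes to redundancy


for level in (0, 1, 5):
    print(f"RAID {level}: {usable_capacity_tb(level, 3, 2.0)} TB usable from 3 x 2 TB disks")
```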

RAID 0 is RAID in name only, in that there is no redundancy. Indeed, because files are striped across the disks, if any one disk goes down, the data for the entire volume is lost. This makes the volume more likely to fail than a single disk would be, so RAID 0 is a poor choice for any data you can’t afford to lose.
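
The increase in failure rate can be estimated with a little arithmetic. The sketch below assumes each disk fails independently with the same probability over some period and that nothing is replaced during that period; the 3% figure is purely illustrative, not a measured failure rate, and the function names are invented for this example.

```python
def raid0_loss_probability(disk_failure_probability: float, num_disks: int) -> float:
    """A striped volume is lost if ANY disk fails."""
    return 1 - (1 - disk_failure_probability) ** num_disks


def raid1_loss_probability(disk_failure_probability: float, num_disks: int) -> float:
    """A mirrored volume is lost only if ALL disks fail."""
    return disk_failure_probability ** num_disks


p = 0.03  # illustrative chance that one disk fails during the period

print(f"single disk:      {p:.4f}")
print(f"RAID 0, 2 disks:  {raid0_loss_probability(p, 2):.4f}")
print(f"RAID 1, 2 disks:  {raid1_loss_probability(p, 2):.4f}")
```

Under these assumptions, the two-disk striped volume is almost twice as likely to lose data as a single disk, while the two-disk mirror is far less likely to.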

Now say you’re using RAID 5 in a 3-disk array. If one of the disks goes down, then that’s not a problem. Just replace it, and the entire volume can be rebuilt from the data and parity held on the other two. No data is lost despite a hard drive having failed.
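
RAID 5 makes this kind of rebuild possible with parity, which is calculated using XOR. The sketch below shows the idea for a single stripe on a 3-disk array: two disks hold data blocks and the third holds their XOR. The block contents and the helper name xor_blocks are made up for this example, and a real RAID 5 implementation also rotates which disk holds the parity from stripe to stripe, which is left out here.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# One stripe on a 3-disk array: two data blocks plus one parity block.
disk_1 = b"WEB!"
disk_2 = b"RAID"
disk_3 = xor_blocks(disk_1, disk_2)  # parity

# Simulate losing disk 1, then rebuild it from the two survivors.
rebuilt_disk_1 = xor_blocks(disk_2, disk_3)
print(rebuilt_disk_1 == disk_1)

# The same trick works whichever disk is lost.
rebuilt_disk_2 = xor_blocks(disk_1, disk_3)
print(rebuilt_disk_2 == disk_2)
```

Because XOR-ing any two of the three blocks yields the third, the array survives the loss of any one disk, but not two.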

Learning Objectives

* What is RAID and how is it used?
* What are the various levels of RAID?
* How does RAID work?
* Why should Web designers and developers care about RAID?

Assignments

Read and watch the following, then answer the questions above.
