Where is the metadata of blocks in RAID stored?

2

I am very excited to learn about data centers, and that is how I came across RAID systems.

My questions may be very silly, so please bear with me.

In general, for any RAID level > 0 (for example, RAID 5), how does the operating system know which disk and which block the chunks of a file are located on?

What metadata will be stored for a given file?

If it really does store metadata about files, where is this metadata physically stored (and on which disk)?

They say RAID 5 can survive one disk failure, but if the disk that contains the metadata fails, then everything is lost, right?

I would like to know the life cycle of a file that is stored on RAID 5: how does updating a file work, and how does deleting a file work?

Does each block contain information from a single file or from multiple files?

Can I conclude that as the RAID level increases, read parallelism increases and write performance decreases?

Pawan Kumar

Posted 2018-01-31T15:36:31.003

Reputation: 121

1 RAID operates on the block layer. It is not concerned with files. – Daniel B – 2018-01-31T16:08:00.013

Answers

2

What metadata will be stored for a given file?

RAID has absolutely nothing to do with file metadata. It is purely a disk arrangement, and as such it only translates logical disk addresses into physical block locations.

File metadata is purely a filesystem concern, and a filesystem sits on top of a partition on a disk. There are several layers of translation between a file and a disk block. By the time a disk controller (here, the RAID controller) is given data pertaining to a "file", the filesystem and the operating system's logical disk drivers have already reduced it to "write this block of data at disk block number X".

RAID arranges disks in a precise logical fashion. The controller knows details such as the number of disks, the stripe size and the disk strip order, and given those details it can map any logical block address passed to it by the operating system to "disk 2, location Y" and so on.
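To make that address calculation concrete, here is a minimal Python sketch. It assumes a simple rotating-parity layout that I made up for illustration; real controllers and software RAID implementations use their own layouts, and the function name `locate_block` and its parameters are hypothetical.

```python
def locate_block(lba, num_disks, blocks_per_strip):
    """Map a logical block address to (data_disk, parity_disk, block_on_disk).

    Assumed layout (illustrative only): each stripe holds one strip per disk,
    one of which is parity, and the parity strip moves one disk to the left
    on every successive stripe.
    """
    data_disks = num_disks - 1  # one strip per stripe is used for parity

    # Which strip of the logical volume, and which block inside that strip?
    strip_index, offset_in_strip = divmod(lba, blocks_per_strip)

    # Which stripe (row across all disks), and which data strip in that row?
    stripe, strip_in_stripe = divmod(strip_index, data_disks)

    # Parity rotates: last disk for stripe 0, then one disk to the left, ...
    parity_disk = (num_disks - 1 - stripe) % num_disks

    # Data strips fill the remaining disks in order, skipping the parity disk.
    if strip_in_stripe < parity_disk:
        data_disk = strip_in_stripe
    else:
        data_disk = strip_in_stripe + 1

    block_on_disk = stripe * blocks_per_strip + offset_in_strip
    return data_disk, parity_disk, block_on_disk


if __name__ == "__main__":
    for lba in (0, 7, 11, 18):
        print(lba, locate_block(lba, num_disks=4, blocks_per_strip=2))
```

Note that nothing in this calculation refers to files: the only inputs are a block number and the geometry of the array.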

They say RAID 5 can survive one disk failure, but if the disk that contains the metadata fails, then everything is lost, right?

No. RAID 5 sets aside one disk's worth of capacity for redundancy, which is a logical sum of the other disks. In a four-disk array you always have one complete copy of all the data, plus redundant data that equates to a combined sum of the other three disks.

Disk1 block + disk2 block + disk3 block = redundant (disk4) block

Should one drive fail you can rearrange the sum to give

Disk1 block + disk2 block + disk4 (redundant) block = disk3 block

So you get some level of fault tolerance spread across your disks. Should any one random disk fail you still have access to enough data to replace that disk. Hopefully you replace that disk before another one fails and you can rebuild its missing data to fully restore fault tolerance.
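The two sums above can be tried out directly. RAID 5 parity is normally a bitwise XOR, so the same operation that creates the redundant block also recovers a missing one. The sketch below is only an illustration with made-up two-byte "blocks"; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Made-up contents of the same block number on three data disks.
disk1 = b"\x0f\xf0"
disk2 = b"\x33\x33"
disk3 = b"\x55\xaa"

# Disk1 block + disk2 block + disk3 block = redundant (disk4) block
parity = xor_blocks(disk1, disk2, disk3)

# Disk 3 fails: disk1 block + disk2 block + redundant block = disk3 block
rebuilt_disk3 = xor_blocks(disk1, disk2, parity)

if __name__ == "__main__":
    print("rebuilt matches original:", rebuilt_disk3 == disk3)
```

The same rearrangement works for whichever single disk is lost, which is why it does not matter what kind of data (file contents or filesystem metadata) was on the failed disk.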

The mathematics behind the redundant block is worth an article of its own and I would recommend you read the Wikipedia page on RAID arrays to get a better idea of it.

I would like to know the life cycle of a file that is stored on RAID 5: how does updating a file work, and how does deleting a file work?

Does each block contain information from a single file or from multiple files?

Files are a filesystem construct and are handled by your operating system. RAID is concerned entirely with disks and knows nothing about files.

A disk block could contain data from any number of files; it is entirely up to the operating system how it puts data in blocks on the disk.

The RAID controller simply presents a disk interface to the operating system, and the operating system writes blocks to it. What is in those blocks is up to the operating system.

Can I conclude that as the RAID level increases, read parallelism increases and write performance decreases?

Yes, to an extent. It depends on the RAID level used, and write performance doesn't necessarily decrease in line with read performance increasing. Write performance might be (say) 3/4 of the read performance, depending on the task.

From Wikipedia

In comparison to RAID 4, RAID 5's distributed parity evens out the stress of a dedicated parity disk among all RAID members. Additionally, write performance is increased since all RAID members participate in the serving of write requests. Although it won't be as efficient as a striping (RAID 0) setup, because parity must still be written, this is no longer a bottleneck.

Since parity calculation is performed on the full stripe, small changes to the array experience write amplification: in the worst case when a single, logical sector is to be written, the original sector and the according parity sector need to be read, the original data is removed from the parity, the new data calculated into the parity and both the new data sector and the new parity sector are written.
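The worst-case small write described in that quote can be sketched as follows. This is an illustration of the read-modify-write idea using XOR parity, not how any particular controller is implemented; `xor_bytes` and `small_write` are hypothetical names.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Update one sector and its parity without reading the rest of the stripe.

    Costs two reads (old data, old parity) and two writes (new data,
    new parity) for a single logical sector write.
    """
    # Remove the old data's contribution from the parity...
    parity_without_old = xor_bytes(old_parity, old_data)
    # ...then add the new data's contribution.
    new_parity = xor_bytes(parity_without_old, new_data)
    return new_data, new_parity


if __name__ == "__main__":
    # Made-up stripe with three data sectors.
    stripe = [b"\x01\x02", b"\x10\x20", b"\x0a\x0b"]
    parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])

    # Overwrite the middle sector only.
    stripe[1], parity = small_write(stripe[1], parity, b"\xff\x00")

    # The shortcut gives the same parity as recomputing over the full stripe.
    full = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
    print("parity consistent:", parity == full)
```

Those four disk operations for one logical write are the write amplification the quote refers to.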

Mokubai

Posted 2018-01-31T15:36:31.003

Reputation: 64 434