Stratos Gerakakis

RAID LVM2 setup

Tagged as: linux  raid  lvm 

Oct 20 2010

I’ve been using a software created RAID in Ubuntu for a couple of years now, and I am very happy with it. The system has gone through many iterations of system updates (and even some complete OS reinstalls) and it keeps working fine.

Recently, I was not happy with the way I had partitioned the array and wanted to re-arrange it, so I thought it would be a nice idea to document the process I went through. At the same time I took the opportunity to also document the initial array creation (in case I ever have to redo it).

RAID Setup

Creating the RAID sandbox

Caution: Do not play recklessly with your disk partitions. You can very easily mistype something, or run a command with the wrong parameters, and lose your data. Instead of blindly following what I propose here, create a safe sandbox environment to familiarize yourself with the RAID setup. Then, when you are comfortable setting everything up, do the same thing with your real disks.

The sandbox will consist of some loopback files that will simulate disk partitions. We will use these “fake” partitions to create our array, so no harm can come to your actual disks.

Let’s imitate the creation of the RAID5 partition. First we will create three loopback files of 100MB each:

stratos@yoda:/tmp$ cd ~

stratos@yoda:~$ cd sandbox/

stratos@yoda:~/sandbox$ dd if=/dev/zero of=raid1 bs=10240 count=10240
10240+0 records in
10240+0 records out
104857600 bytes (105 MB) copied, 0.315455 s, 332 MB/s

stratos@yoda:~/sandbox$ cp raid1 raid2

stratos@yoda:~/sandbox$ cp raid1 raid3

stratos@yoda:~/sandbox$ ls -al
total 307216
drwxr-xr-x  2 stratos stratos      4096 2010-10-01 23:56 .
drwxr-xr-x 11 stratos stratos      4096 2010-10-01 23:55 ..
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid1
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid2
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid3

With the loopback files created, we can attach them to loop devices so they behave like partitions:

stratos@yoda:~/sandbox$ sudo losetup /dev/loop1 raid1
stratos@yoda:~/sandbox$ sudo losetup /dev/loop2 raid2
stratos@yoda:~/sandbox$ sudo losetup /dev/loop3 raid3

Setup of RAID arrays

We have now created three partitions, /dev/loop1, /dev/loop2 and /dev/loop3, that will stand in for actual /dev/sdXX partitions.
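
If you want to double-check that the loop devices are in place before building anything on top of them, losetup can list them (a harmless, read-only check):

sudo losetup -a    # lists every active loop device and the backing file behind it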

Let’s combine them to create our RAID5 array:

stratos@yoda:~/sandbox$ sudo mdadm --create --verbose /dev/md3 -l5 -n3 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 102336K
mdadm: array /dev/md3 started.
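
Right after creation, md starts building the array in the background. A quick, read-only way to peek at all arrays and any sync progress at any time is:

cat /proc/mdstat    # shows every md array, its member devices and any resync/recovery progress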

LVM2 Setup

At this point we could format the /dev/md3 partition with some filesystem and start using it. But that’s the old way of doing it. What we want to do is run LVM2 on top of the RAID5 partition so we can better utilize the space provided by /dev/md3, allowing us to dynamically create, resize and remove volumes within it.

Create the LVM2 partitions

In order to use LVM2 we first have to create a LVM2 Physical Volume:

stratos@yoda:~/sandbox$ sudo pvcreate /dev/md3
  Physical volume "/dev/md3" successfully created

stratos@yoda:~/sandbox$ sudo pvdisplay
  "/dev/md3" is a new physical volume of "199.88 MB"
  --- NEW Physical volume ---
  PV Name               /dev/md3
  VG Name
  PV Size               199.88 MB
  Allocatable           NO
  PE Size (KByte)       0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               a00wd1-tPYt-zeuf-dybg-WbOX-3KDb-JYWlU6

So, we see here that the three 100MB loopback files have been combined into a 200MB RAID5 LVM2 Physical Volume (3 disks - 1 parity = 2 disks * 100MB = 200MB of usable space).

Next we will create a LVM2 Volume Group:

stratos@yoda:~/sandbox$ sudo vgcreate lvm-group /dev/md3
  Volume group "lvm-group" successfully created

stratos@yoda:~/sandbox$ sudo vgdisplay
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               196.00 MB
  PE Size               4.00 MB
  Total PE              49
  Alloc PE / Size       0 / 0
  Free  PE / Size       49 / 196.00 MB
  VG UUID               D6c2DE-YS5R-MM8M-rle5-J117-uF11-KORUYB

From the vgdisplay output we should note the Free PE / Size value. PE stands for Physical Extent and it is the smallest unit of storage we can use to build our final LVM2 volumes. Basically we have 49 PE to allocate as we wish to LVM2 volumes, and each PE is 196MB / 49 = 4MB in size.
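
If you only want the extent numbers without the full vgdisplay listing, the vgs reporting command can print them directly. A quick example (the -o field names are the ones used by recent LVM2 releases and may differ slightly on older versions):

sudo vgs -o vg_name,vg_extent_size,vg_extent_count,vg_free_count lvm-group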

Let’s create a LVM2 Logical Volume with 20 PE (20 * 4MB = 80MB):

stratos@yoda:~/sandbox$ sudo lvcreate -l 20 lvm-group -n myVolume
  Logical volume "myVolume" created

stratos@yoda:~/sandbox$ sudo lvdisplay
  --- Logical volume ---
  LV Name                /dev/lvm-group/myVolume
  VG Name                lvm-group
  LV UUID                qtWGgg-NIgm-l3qz-YYLP-tCG2-FApD-oVd8il
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                80.00 MB
  Current LE             20
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:5

Finally, the LVM2 volume has been created. Let’s see what’s left from the lvm-group Volume Group:

stratos@yoda:~/sandbox$ sudo vgdisplay
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               196.00 MB
  PE Size               4.00 MB
  Total PE              49
  Alloc PE / Size       20 / 80.00 MB
  Free  PE / Size       29 / 116.00 MB
  VG UUID               D6c2DE-YS5R-MM8M-rle5-J117-uF11-KORUYB

OK. The allocated PE are 20 and we have another 29 unallocated.

Create the mountable volume

The newly created LVM2 volume is unformatted, so let’s put an ext4 filesystem on it:

stratos@yoda:~/sandbox$ sudo mkfs.ext4 /dev/lvm-group/myVolume
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
20480 inodes, 81920 blocks
4096 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
10 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Now let’s mount it:

stratos@yoda:~/sandbox$ mkdir myVolume

stratos@yoda:~/sandbox$ sudo mount /dev/lvm-group/myVolume myVolume

stratos@yoda:~/sandbox$ ls -al
total 307217
drwxr-xr-x  3 stratos stratos      4096 2010-10-02 00:31 .
drwxr-xr-x 10 stratos stratos      4096 2010-10-02 00:06 ..
drwxr-xr-x  3 root    root         1024 2010-10-02 00:30 myVolume
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid1
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid2
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid3

stratos@yoda:~/sandbox$ cd myVolume/

stratos@yoda:~/sandbox/myVolume$ ls -al
total 17
drwxr-xr-x 3 root    root     1024 2010-10-02 00:30 .
drwxr-xr-x 3 stratos stratos  4096 2010-10-02 00:31 ..
drwx------ 2 root    root    12288 2010-10-02 00:30 lost+found

So, now we finally have an 80MB volume sitting atop a RAID5 array.

Let’s put a file in this volume so we can verify that the tests we run later on don’t make us lose our data:

stratos@yoda:~/sandbox/myVolume$ sudo sh -c "echo 'Hello LVM2 logical volume' > test.txt"

stratos@yoda:~/sandbox/myVolume$ cat test.txt
Hello LVM2 logical volume

As long as the test.txt file is there through all our following tests, then our data is safe.
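
If you want something a bit stronger than eyeballing the file, you could also record a checksum now and compare it after every test (just an extra sandbox precaution):

md5sum test.txt    # note the hash now; re-run after each test and make sure it still matches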

LVM2 Manipulation

Let’s play with the 49 available PE given to us from the lvm-group Volume Group.

Add more volumes

Create another LVM2 Logical volume:

stratos@yoda:~/sandbox$ sudo lvcreate -l 10 lvm-group -n myVolume2
  Logical volume "myVolume2" created

stratos@yoda:~/sandbox$ sudo mkfs.ext4 /dev/lvm-group/myVolume2
mke2fs 1.41.9 (22-Aug-2009)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
10240 inodes, 40960 blocks
2048 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=41943040
5 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577

Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 21 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

stratos@yoda:~/sandbox$ mkdir myVolume2

stratos@yoda:~/sandbox$ sudo mount /dev/lvm-group/myVolume2 myVolume2

stratos@yoda:~/sandbox$ ls -al
total 307218
drwxr-xr-x  4 stratos stratos      4096 2010-10-02 11:39 .
drwxr-xr-x 10 stratos stratos      4096 2010-10-02 00:06 ..
drwxr-xr-x  3 root    root         1024 2010-10-02 00:42 myVolume
drwxr-xr-x  3 root    root         1024 2010-10-02 11:39 myVolume2
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid1
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid2
-rw-r--r--  1 stratos stratos 104857600 2010-10-01 23:56 raid3

Now we have another volume with 10 PE (40MB).
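
A quick way to see both Logical Volumes side by side, without the verbose lvdisplay output, is lvs:

sudo lvs lvm-group    # one line per Logical Volume, with its size and attributes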

Resize volumes

We have myVolume with 20 PE and myVolume2 with 10 PE. We should have 19 PE still unallocated:

stratos@yoda:~/sandbox$ sudo vgdisplay
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               196.00 MB
  PE Size               4.00 MB
  Total PE              49
  Alloc PE / Size       30 / 120.00 MB
  Free  PE / Size       19 / 76.00 MB
  VG UUID               D6c2DE-YS5R-MM8M-rle5-J117-uF11-KORUYB

Let’s increase the size of myVolume by 10 PE:

stratos@yoda:~/sandbox$ sudo lvextend -l+10 /dev/lvm-group/myVolume
  Extending logical volume myVolume to 120.00 MB
  Logical volume myVolume successfully resized

stratos@yoda:~/sandbox$ sudo umount myVolume

stratos@yoda:~/sandbox$ sudo e2fsck -f /dev/lvm-group/myVolume
e2fsck 1.41.9 (22-Aug-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/lvm-group/myVolume: 12/20480 files (8.3% non-contiguous), 8240/81920 blocks

stratos@yoda:~/sandbox$ sudo resize2fs /dev/lvm-group/myVolume
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/lvm-group/myVolume to 122880 (1k) blocks.
The filesystem on /dev/lvm-group/myVolume is now 122880 blocks long.

Let’s check if we still have our files:

stratos@yoda:~/sandbox$ sudo mount /dev/lvm-group/myVolume myVolume

stratos@yoda:~/sandbox$ cd myVolume

stratos@yoda:~/sandbox/myVolume$ ls -al
total 18
drwxr-xr-x 3 root    root     1024 2010-10-02 00:42 .
drwxr-xr-x 4 stratos stratos  4096 2010-10-02 11:39 ..
drwx------ 2 root    root    12288 2010-10-02 00:30 lost+found
-rw-r--r-- 1 root    root       26 2010-10-02 00:42 test.txt

stratos@yoda:~/sandbox/myVolume$ cat test.txt
Hello LVM2 logical volume

So far so good.

Remove volumes

We have another volume, myVolume2, that we created for fun with 10 PE (40MB). Let’s remove it and reclaim the space:

stratos@yoda:~/sandbox/myVolume$ cd ..

stratos@yoda:~/sandbox$ sudo umount myVolume2

stratos@yoda:~/sandbox$ sudo lvremove /dev/lvm-group/myVolume2
Do you really want to remove active logical volume "myVolume2"? [y/n]: y
  Logical volume "myVolume2" successfully removed

stratos@yoda:~/sandbox$ sudo vgdisplay
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               196.00 MB
  PE Size               4.00 MB
  Total PE              49
  Alloc PE / Size       30 / 120.00 MB
  Free  PE / Size       19 / 76.00 MB
  VG UUID               D6c2DE-YS5R-MM8M-rle5-J117-uF11-KORUYB

So, now we have 19 free PE. The allocated 30 are from myVolume (20 initial + 10 after the resize).

RAID Manipulation

Playing with LVM2 is nice because you can carve up and reassign the available space any way you want. The unseen hero, though, is the underlying RAID5 array, which guarantees that even if one disk fails we will not lose our data. Let’s play with the RAID disks now.

Fail a RAID disk and recover

Hopefully you will never have to face a failed disk, but it is quite important to know what to do when one fails. For now, in our comfortable sandbox, we will simulate a failed disk and check whether we lose any data:

stratos@yoda:~/sandbox$ sudo mdadm --fail /dev/md3 /dev/loop3
mdadm: set /dev/loop3 faulty in /dev/md3

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sat Oct  2 23:48:23 2010
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.19

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       0        0        2      removed

       3       7        3        -      faulty spare   /dev/loop3

Notice that the state is clean, degraded, meaning that the data is OK but the array itself is, well, degraded.

Let’s check the status of our files:

stratos@yoda:~/sandbox$ cd myVolume

stratos@yoda:~/sandbox/myVolume$ ls -al
total 18
drwxr-xr-x 3 root    root     1024 2010-10-02 00:42 .
drwxr-xr-x 4 stratos stratos  4096 2010-10-02 11:39 ..
drwx------ 2 root    root    12288 2010-10-02 00:30 lost+found
-rw-r--r-- 1 root    root       26 2010-10-02 00:42 test.txt

stratos@yoda:~/sandbox/myVolume$ cat test.txt
Hello LVM2 logical volume

Yep, still here. Now, let’s remove the failed disk (as if you had opened the computer case and pulled out the failed drive):

stratos@yoda:~/sandbox/myVolume$ sudo mdadm --remove /dev/md3 /dev/loop3
mdadm: hot removed /dev/loop3

stratos@yoda:~/sandbox/myVolume$ sudo losetup -d /dev/loop3

stratos@yoda:~/sandbox/myVolume$ sudo rm raid3

stratos@yoda:~/sandbox/myVolume$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sat Oct  2 23:53:58 2010
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.22

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       0        0        2      removed

OK, the disk is removed. Now let’s assume that you have replaced the failed hard drive, recreated a partition similar to the one originally used, and are ready to fix the degraded RAID array:

stratos@yoda:~/sandbox/myVolume$ cd ..

stratos@yoda:~/sandbox$ dd if=/dev/zero of=raid3 bs=10240 count=10240
10240+0 records in
10240+0 records out
104857600 bytes (105 MB) copied, 0.430725 s, 243 MB/s

stratos@yoda:~/sandbox$ sudo losetup /dev/loop3 raid3

stratos@yoda:~/sandbox$ sudo mdadm --add /dev/md3 /dev/loop3
mdadm: added /dev/loop3

Now, immediately after adding the disk the array will start reconstructing itself. It should look something like:

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sat Oct  2 23:58:58 2010
          State : clean, degraded, recovering
 Active Devices : 2
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 32% complete

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.34

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       3       7        3        2      spare rebuilding   /dev/loop3

Notice how the state is clean, degraded, recovering and that the Rebuild Status is at 32%. Depending on the size of your array, the process might take anywhere from a few minutes (I doubt it) up to some hours or days. You can always check the progress with the above command. The array should still be usable and accessible while it is being reconstructed, but I really wouldn’t like to push my luck, so I wouldn’t suggest doing any work on the array until the reconstruction is done.
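
If you prefer a live view of the rebuild instead of re-running mdadm --detail, you can also watch /proc/mdstat (again, purely informational):

watch -d cat /proc/mdstat    # refreshes every two seconds and highlights the changing rebuild percentage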

In our case the reconstruction of our tiny 200MB array takes mere seconds, and if we check the state again we see:

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sat Oct  2 23:59:02 2010
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.48

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       7        3        2      active sync   /dev/loop3

Nice and clean. Let’s hope you never have to go through a procedure like this, EVER.

Increase the RAID array size


Up until now we have been playing with LVM2 volumes and the space allocated to the LVM2 Physical Volume. But what should we do if we want to create a Logical Volume larger than the 49 PE that the Volume Group contains? Not much, unless we increase the number of PE available.

This can be done in two ways:

  • Increase the number of disks in the RAID5 array or
  • Increase the size of the disks already in the array.

Add new disks to the RAID array

Adding a new disk to the array is easy (assuming you still have free SATA ports on your motherboard).

Let’s run an example:

stratos@yoda:~/sandbox$ dd if=/dev/zero of=raid4 bs=10240 count=15360
15360+0 records in
15360+0 records out
157286400 bytes (157 MB) copied, 0.675796 s, 233 MB/s

stratos@yoda:~/sandbox$ sudo losetup /dev/loop4 raid4

stratos@yoda:~/sandbox$ sudo mdadm --add /dev/md3 /dev/loop4
mdadm: added /dev/loop4

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sun Oct  3 00:37:16 2010
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.49

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       7        3        2      active sync   /dev/loop3

       3       7        4        -      spare   /dev/loop4

So we added the newly created disk to the array, and mdadm considers it a “spare” disk. A spare disk sits in standby mode, ready to substitute a failed disk (automatically, I think; needs investigation). A spare disk’s capacity does not count towards the total capacity of the array, since it is only on standby. Let’s make it a full member of the array:

stratos@yoda:~/sandbox$ sudo mdadm --grow /dev/md3 --raid-devices=4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.91
  Creation Time : Sat Oct  2 00:09:30 2010
     Raid Level : raid5
     Array Size : 204672 (199.91 MiB 209.58 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sun Oct  3 00:38:09 2010
          State : clean, recovering
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Reshape Status : 19% complete
  Delta Devices : 1, (3->4)

           UUID : 710c3b37:15563004:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.68

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       7        3        2      active sync   /dev/loop3
       3       7        4        3      active sync   /dev/loop4

Immediately after the new disk is added to the array, mdadm starts reshaping it. Here we see a 19% complete status and the 3->4 notification for the device count. Again, depending on your disk sizes this might take some time, probably longer than replacing a failed disk, because all the data has to be re-striped across the new disk.

The array should still be functional and up and running during the whole procedure. Once it’s done we have to allocate the extra space to the LVM2 Physical Volume and resize the lvm-group LVM2 Volume Group.

First let’s see how much space the Physical Volume has:

stratos@yoda:~/sandbox$ sudo pvdisplay /dev/md3
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               lvm-group
  PV Size               199.88 MB / not usable 3.88 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              49
  Free PE               19
  Allocated PE          30
  PV UUID               a00wd1-tPYt-zeuf-dybg-WbOX-3KDb-JYWlU6

So, although we have added a fourth disk to the RAID array, the LVM2 Physical Volume is still at 200MB.

Let’s increase it:

stratos@yoda:~/sandbox$ sudo pvresize /dev/md3
  Physical volume "/dev/md3" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

stratos@yoda:~/sandbox$ sudo pvdisplay /dev/md3
  --- Physical volume ---
  PV Name               /dev/md3
  VG Name               lvm-group
  PV Size               299.62 MB / not usable 3.62 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              74
  Free PE               44
  Allocated PE          30
  PV UUID               a00wd1-tPYt-zeuf-dybg-WbOX-3KDb-JYWlU6

stratos@yoda:~/sandbox$ sudo vgdisplay lvm-group
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               296.00 MB
  PE Size               4.00 MB
  Total PE              74
  Alloc PE / Size       30 / 120.00 MB
  Free  PE / Size       44 / 176.00 MB
  VG UUID               D6c2DE-YS5R-MM8M-rle5-J117-uF11-KORUYB

As you can see, the Physical Volume is now 300MB. A nice side effect is that the lvm-group Volume Group grew along with it: a Volume Group’s size is simply the sum of its Physical Volumes, so enlarging the PV immediately made the extra PE available. We now have a total of 74 PE to play with. If you want to increase the size of your Logical Volumes you can now easily do it (see the ‘Resize volumes’ section above).
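
For example, if you wanted myVolume to take over all of the newly freed extents in one go, something like the following should do it on a reasonably recent LVM2 (the +100%FREE argument tells lvextend to grab every free PE in the Volume Group; it is the same extend-then-resize dance as in ‘Resize volumes’). We won’t actually run it here, so the numbers in the listings below stay as they are:

sudo lvextend -l +100%FREE /dev/lvm-group/myVolume    # allocate all remaining free PE to myVolume
sudo umount myVolume
sudo e2fsck -f /dev/lvm-group/myVolume                # the filesystem must be checked before growing it
sudo resize2fs /dev/lvm-group/myVolume                # grow ext4 to fill the enlarged volume
sudo mount /dev/lvm-group/myVolume myVolume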

But do we still have our data?:

stratos@yoda:~/sandbox$ cd myVolume

stratos@yoda:~/sandbox/myVolume$ cat test.txt
Hello LVM2 logical volume

Yep, still here.

Increase the size of the RAID disks

In reality (as in my personal case) you will end up increasing the size of the RAID array because you want to substitute the original disks with larger ones. Prices keep dropping and the number of SATA ports on a motherboard is finite, so sooner or later you will need to replace the disks with larger ones.

Unfortunately, you cannot do the upgrade at your leisure. That is, you can’t just drop in a 1TB disk alongside a couple of 640GB disks and expect to get a RAID5 array larger than the original (3-1) x 640GB = 1280GB array. If you want to expand your RAID5 array, you have to upgrade all the disks. You can always upgrade just one disk, though, and use the extra space of the larger disk as a separate partition.

So, for now, we have a 4 disk array that uses 100MB from each disk. Let’s substitute each disk with a 150MB one (one at a time):

stratos@yoda:~/sandbox$ sudo mdadm --fail /dev/md3 /dev/loop1
mdadm: set /dev/loop1 faulty in /dev/md3

stratos@yoda:~/sandbox$ sudo mdadm --remove /dev/md3 /dev/loop1
mdadm: hot removed /dev/loop1

stratos@yoda:~/sandbox$ sudo losetup -d /dev/loop1

stratos@yoda:~/sandbox$ rm raid1

stratos@yoda:~/sandbox$ dd if=/dev/zero of=raid1 bs=10240 count=15360
15360+0 records in
15360+0 records out
157286400 bytes (157 MB) copied, 0.498492 s, 316 MB/s

stratos@yoda:~/sandbox$ sudo losetup /dev/loop1 raid1

stratos@yoda:~/sandbox$ sudo mdadm /dev/md3 --add /dev/loop1
mdadm: added /dev/loop1

What we did was fail, remove and substitute one disk with another, bigger one. As always, the procedure is not instantaneous, and adding the larger disk back into the array will take some time while the array rebuilds.
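
In the sandbox you can script the same cycle for the remaining three disks instead of typing it out each time. A rough sketch (mdadm --wait blocks until the running rebuild has finished, so we never pull a disk from a degraded array):

sudo mdadm --wait /dev/md3                           # let the rebuild from the loop1 swap finish first
for n in 2 3 4; do
    sudo mdadm --fail   /dev/md3 /dev/loop$n
    sudo mdadm --remove /dev/md3 /dev/loop$n
    sudo losetup -d /dev/loop$n
    rm raid$n
    dd if=/dev/zero of=raid$n bs=10240 count=15360   # create the new, larger "disk"
    sudo losetup /dev/loop$n raid$n
    sudo mdadm /dev/md3 --add /dev/loop$n
    sudo mdadm --wait /dev/md3                       # wait for the rebuild before touching the next disk
done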

When the array has finished rebuilding from the last add, lather-rinse-repeat for the remaining disks (loop2, loop3 and loop4 in our case), either by hand or with the loop above. When you are done with all the disks:

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sun Oct  3 15:16:01 2010
     Raid Level : raid5
     Array Size : 307008 (299.86 MiB 314.38 MB)
  Used Dev Size : 102336 (99.95 MiB 104.79 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sun Oct  3 15:18:28 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ec2bf06b:20f11352:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.28

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       7        3        2      active sync   /dev/loop3
       3       7        4        3      active sync   /dev/loop4

The array is still 300MB: (4 - 1) * 100MB = 300MB! What happens is that when a larger disk is introduced into an existing array, mdadm only uses as much of it as the existing members provide. So although we are adding a 150MB disk, the array uses only 100MB of it, just like the older 100MB disks.

Once we have substituted all four disks, we can resize the array and use the remaining 50MB of each disk. Let’s do it:

stratos@yoda:~/sandbox$ sudo mdadm /dev/md3 --grow --size=max

stratos@yoda:~/sandbox$ sudo mdadm --detail /dev/md3
/dev/md3:
        Version : 00.90
  Creation Time : Sun Oct  3 15:16:01 2010
     Raid Level : raid5
     Array Size : 460608 (449.89 MiB 471.66 MB)
  Used Dev Size : 153536 (149.96 MiB 157.22 MB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sun Oct  3 15:59:13 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ec2bf06b:20f11352:a1dba4fa:bb34958c (local to host yoda)
         Events : 0.137

    Number   Major   Minor   RaidDevice State
       0       7        1        0      active sync   /dev/loop1
       1       7        2        1      active sync   /dev/loop2
       2       7        3        2      active sync   /dev/loop3
       3       7        4        3      active sync   /dev/loop4

Finally done. The array is now 450MB: (4-1) * 150MB = 450MB. But what about the Volume Group we had? Let’s check it:

stratos@yoda:~/sandbox$ sudo vgdisplay lvm-group
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               296.00 MB
  PE Size               4.00 MB
  Total PE              74
  Alloc PE / Size       30 / 120.00 MB
  Free  PE / Size       44 / 176.00 MB
  VG UUID               0F0ahV-9wdw-piAN-RQGt-sA4E-ttR3-c0ClSd

stratos@yoda:~/sandbox$ sudo pvresize /dev/md3
  Physical volume "/dev/md3" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

stratos@yoda:~/sandbox$ sudo vgdisplay lvm-group
  --- Volume group ---
  VG Name               lvm-group
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  3
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               448.00 MB
  PE Size               4.00 MB
  Total PE              112
  Alloc PE / Size       30 / 120.00 MB
  Free  PE / Size       82 / 328.00 MB
  VG UUID               0F0ahV-9wdw-piAN-RQGt-sA4E-ttR3-c0ClSd

As you see we went from 74 PE to 112 PE. Now we can go back to normal LVM2 volume manipulation to take advantage of the gained space.

Recover RAID array

OK, but what do you do when disaster strikes and you need to get your RAID array back?

Let’s just say that your whole motherboard fails and you have to move your disks to a rebuilt machine. Or you re-install the OS and want to re-assemble your RAID array.

The file needed to identify and assemble the RAID array lives in /etc/mdadm/mdadm.conf and it looks like this:

stratos@yoda:~/sandbox$ cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid0 num-devices=3 UUID=1ecc61f7:ab95b8b0:b6150be0:b4ec82ed
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=0e065f5e:b0bc71b3:b6150be0:b4ec82ed

# This file was auto-generated on Wed, 22 Sep 2010 13:21:48 +0300
# by mkconf $Id$

Note the ARRAY definitions that tell mdadm what arrays you are currently using.

On a newly installed system this file will probably be missing and you will have to recreate it.

If you already have an /etc/mdadm/mdadm.conf file on your system but it is missing the ARRAY definitions, you can have mdadm generate these definitions for you:

stratos@yoda:~/sandbox$ sudo mdadm --examine --scan

ARRAY /dev/md0 level=raid0 num-devices=3 UUID=1ecc61f7:ab95b8b0:b6150be0:b4ec82ed
ARRAY /dev/md1 level=raid5 num-devices=3 UUID=0e065f5e:b0bc71b3:b6150be0:b4ec82ed

All you have to do is append these lines to the end of the /etc/mdadm/mdadm.conf file.
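
If you prefer to do it in one step (taking care not to duplicate ARRAY lines that are already there), something like this should work:

sudo sh -c 'mdadm --examine --scan >> /etc/mdadm/mdadm.conf'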

If you don’t have an mdadm.conf file at all you can quickly produce one:

stratos@yoda:~/sandbox$ echo 'DEVICE partitions' > mdadm.conf
stratos@yoda:~/sandbox$ sudo mdadm --examine --scan >> mdadm.conf
stratos@yoda:~/sandbox$ sudo cp mdadm.conf /etc/mdadm/mdadm.conf

Now, if you reboot you should have your array back. Well done!
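
If you’d rather not reboot, you can usually bring the arrays up on the spot; mdadm reads the ARRAY lines from mdadm.conf (or scans the disks for superblocks) and assembles whatever it finds:

sudo mdadm --assemble --scan    # assemble every array described in mdadm.conf / found on the disks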

RAID/LVM2 Summary

So, to sum up all the RAID and LVM2 operations that we did:

Create sandbox

cd ~
cd sandbox/
dd if=/dev/zero of=raid1 bs=10240 count=10240
cp raid1 raid2
cp raid1 raid3
sudo losetup /dev/loop1 raid1
sudo losetup /dev/loop2 raid2
sudo losetup /dev/loop3 raid3

Create RAID5 array

sudo mdadm --create --verbose /dev/md3 -l5 -n3 /dev/loop1 /dev/loop2 /dev/loop3

Create LVM2 volumes

sudo pvcreate /dev/md3
sudo vgcreate lvm-group /dev/md3
sudo lvcreate -l 20 lvm-group -n myVolume     # you can use any PE you want

Format / Mount the volume

sudo mkfs.ext4 /dev/lvm-group/myVolume
mkdir myVolume
sudo mount /dev/lvm-group/myVolume myVolume

Extend an LVM2 Logical Volume

sudo lvextend -l+10 /dev/lvm-group/myVolume   # add any PE you want
sudo umount myVolume
sudo e2fsck -f /dev/lvm-group/myVolume
sudo resize2fs /dev/lvm-group/myVolume

Remove an LVM2 Logical Volume

sudo lvremove /dev/lvm-group/myVolume2

Replace a failed RAID disk

sudo mdadm --fail /dev/md3 /dev/loop3          # This is part of our simulation
sudo mdadm --remove /dev/md3 /dev/loop3
sudo losetup -d /dev/loop3
rm raid3
dd if=/dev/zero of=raid3 bs=10240 count=10240  # This creates a new fake disk
sudo losetup /dev/loop3 raid3
sudo mdadm /dev/md3 --add /dev/loop3           # Do this under normal circumstances

Add new disks to RAID5 array

dd if=/dev/zero of=raid4 bs=10240 count=10240  # This is part of our simulation
sudo losetup /dev/loop4 raid4
sudo mdadm --grow /dev/md3 --raid-devices=4    # Do this under normal circumstances
sudo pvresize /dev/md3

Increase size of RAID disks

sudo mdadm --fail /dev/md3 /dev/loop1          # First remove old disk
sudo mdadm --remove /dev/md3 /dev/loop1
sudo losetup -d /dev/loop1                     # This is part of our simulation
rm raid1
dd if=/dev/zero of=raid1 bs=10240 count=15360  # This creates a new fake disk
sudo losetup /dev/loop1 raid1
sudo mdadm /dev/md3 --add /dev/loop1           # Do this under normal circumstances
... Repeat for all disks...                    # Replace all disks
sudo mdadm /dev/md3 --grow --size=max
sudo pvresize /dev/md3

Recover lost RAID array

sudo mdadm --examine --scan                   # append output to /etc/mdadm/mdadm.conf

And that’s how it’s done.