Advanced File Systems and Logical Volume Management

On occasion I have written about advanced file systems and some of the benefits technologies such as Btrfs and ZFS provide. One form of advanced and flexible storage technology I tend to skip over is Logical Volume Management (LVM), which is commonly used in Linux distributions. What is LVM, why do people use it and how does it work? These are questions I have received recently and I would like to tackle all of those questions here, together. 

Before we talk about the advantages of a technology like LVM, it is important to understand the limitations of standard file systems so we can appreciate what LVM improves. With traditional file systems we divide a hard disk into partitions and each partition is assigned a mount point. We may have one partition for our root file system, another for our home directory and maybe a third for the /var directory. This one-to-one arrangement of partition to file system branch makes it fairly easy to visualize how standard file systems work.

Where traditional file systems are limited is in their flexibility. Imagine we have a 100GB hard drive and we divide it into three parts, assigning 10GB to our root partition, 10GB to /var and the remaining 80GB to our /home file system. That seems fine for now, but what if we find out later that 10GB is not big enough for /var? We could shrink our /home partition (if it is not full) and expand /var, but that requires taking our machine off-line. We could buy a new hard disk, make a larger /var partition there, copy the data across and then erase the existing /var, but again that requires taking the system off-line. Either way, the operation is awkward and time consuming. Basically, the big problems with traditional file systems are that they are not fluid, resizing them is awkward and they have a strict one-to-one relationship with the underlying partitions. To get around these limitations we can use LVM.

The hardest part about learning to use LVM is the jargon involved. With LVM there are three terms which get thrown around a lot and it is important to understand them. The first term is physical volume. A physical volume is a hard drive or a partition which has been handed over to LVM. The second term is volume group. A volume group is simply a collection of physical volumes. Let's say we have three hard drives (A, B and C). If we link drives A and B together we can consider them a volume group. Another hard drive, such as C, could be made into a separate volume group consisting of a single physical volume. The third term is logical volume. A logical volume is basically a virtual partition which exists inside a volume group and holds a file system. If this is difficult to visualize I find it helps to think about cookies.

Traditional file systems are like baking cookies. We scoop out some raw dough onto a pan. Each cookie is physically separate from all other cookies. Once we put the pan in the oven the cookies harden and come out of the oven as fixed-sized individual snacks. A cookie and a traditional file system are both of a fixed size, separate from all other cookies or partitions. They cannot be merged once made and resizing them is difficult. If you make eight cookies and ten friends come to visit you cannot simply make each cookie smaller, freeing up dough for the extra two guests. Likewise, if six people arrive you cannot dynamically erase two cookies and make the remaining six cookies bigger to satisfy your guests. Now, let's re-imagine cookie baking with LVM. With LVM what we do is take all of the cookie dough and spread it onto the pan as one big block. We put the block of dough in the oven and, when it comes out, we have a solid sheet, a giant cookie that we can then carve into as many pieces of any size we wish. It doesn't matter how many people show up now, because we can dynamically carve the block of cookie so each person gets a fair share. LVM lets us group all of our storage devices (cookie dough) into one big block so that we can carve up the block into separate, dynamic file systems. 

By now you are probably enlightened (or hungry) and ready for an example. For the purposes of this tutorial I am going to say I have two hard drives (sda and sdb). I will also assume we have our distribution's LVM packages installed. First I am going to create an LVM-compatible partition on sda. This partition will be called sda1. To do this I launch cfdisk or another partition manager and create a partition which takes up the entire drive. I set the partition type to Linux LVM, which on a traditional MBR partition table is identified by the hexadecimal code 8E.
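For readers who prefer a non-interactive tool, the same partitioning step can be sketched with parted. This is only a sketch under stated assumptions: it assumes the disk is /dev/sda, that it may be wiped, and that an MBR (msdos) partition table is wanted. The run helper and the DRY_RUN variable are hypothetical names introduced here so the script prints the commands by default instead of touching the disk.

```sh
#!/bin/sh
# Sketch only: DISK, DRY_RUN and run() are illustrative assumptions.
DISK="${DISK:-/dev/sda}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only (safe default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

partition_for_lvm() {
    run parted -s "$DISK" mklabel msdos              # new, empty MBR partition table
    run parted -s "$DISK" mkpart primary 1MiB 100%   # one partition spanning the disk
    run parted -s "$DISK" set 1 lvm on               # flag partition 1 as Linux LVM
}

partition_for_lvm
```

Running the script with DRY_RUN=0 as root would perform the changes for real and destroy whatever is on the disk, so it is worth reading the printed commands first.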

Our next step is to mark our device, sda1, as being a physical volume which can be used by LVM.
pvcreate /dev/sda1
Now we have a physical volume and we want to use it to create a volume group. We can create a volume group called datapool using the following command:
vgcreate datapool /dev/sda1
Now that we have a volume group consisting of one partition, sda1, we can divide the group into separate file systems or logical volumes. Here we create a logical volume called myhome and make it 50GB in size.
lvcreate -n myhome -L 50g datapool
Now we have a virtual partition, or logical volume, called myhome. The next thing we need to do is format it with a file system. In this example we use the ext3 file system to format myhome. Remember, the logical volume myhome exists within the volume group datapool.
mkfs.ext3 /dev/datapool/myhome
Finally, we get to mount the logical volume and start making use of it. Here we create a new mount point, called Data, and attach our new logical volume to the Data directory.
mkdir Data
mount /dev/datapool/myhome Data
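The steps so far can be gathered into one short script. This is a minimal sketch rather than a finished tool: the run helper and the DRY_RUN variable are hypothetical additions which make the script print each command by default, and the device, group and volume names are the ones used in this tutorial.

```sh
#!/bin/sh
# Sketch only: DRY_RUN and run() are illustrative assumptions, not part of LVM.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only (safe default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

create_myhome() {
    run pvcreate /dev/sda1                    # mark the partition as a physical volume
    run vgcreate datapool /dev/sda1           # build the volume group
    run lvcreate -n myhome -L 50g datapool    # carve out a 50GB logical volume
    run mkfs.ext3 /dev/datapool/myhome        # put a file system on it
    run mkdir -p Data                         # create the mount point
    run mount /dev/datapool/myhome Data       # attach the logical volume
}

create_myhome
```

Setting DRY_RUN=0 and running as root would execute the commands in order. The real commands stop being safe to repeat once the volume holds data, since mkfs.ext3 formats the volume again.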
Were we to run the df command right now we should see a 50GB file system mounted under the Data directory. This is great, but earlier we talked about resizing and how dynamic LVM can be. What if we want to make the logical volume myhome larger? We can do that by extending the logical volume and then resizing its file system. Here we grow the myhome volume by 100GB.
lvextend -L +100g datapool/myhome
resize2fs /dev/datapool/myhome
We do not even need to take the file system off-line or reboot. Simply running these two commands expands the logical volume and then the file system on it, provided the volume group has at least 100GB of unallocated space. We now have a 150GB file system mounted under the Data directory.
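As a variation, lvextend can be asked to resize the file system itself through its -r (--resizefs) option, which folds the two commands into one. The sketch below assumes the 50GB volume from this tutorial and prints the command rather than running it. The run helper, DRY_RUN and the two size variables are hypothetical names used only for illustration.

```sh
#!/bin/sh
# Sketch only: DRY_RUN, run() and the size variables are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only (safe default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

ORIGINAL_GB=50    # size given to lvcreate earlier
GROWTH_GB=100     # amount we are adding now

grow_myhome() {
    # -r asks lvextend to grow the file system along with the logical volume
    run lvextend -r -L "+${GROWTH_GB}g" datapool/myhome
    echo "expected size afterwards: $((ORIGINAL_GB + GROWTH_GB))GB"
}

grow_myhome
```

After a real run, df -h Data should report a file system of roughly the expected size, a little less once file system overhead is taken into account.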

At the moment we just have one device, sda1, in our volume group. What if we run out of space and want to add a new hard drive to our storage pool? In that case the steps are similar to creating the volume group in the first place. We create a partition on our second disk, sdb, and make it of type Linux-LVM. We then mark the new device as a physical volume.
pvcreate /dev/sdb1
Next we add the new device to our volume group.
vgextend datapool /dev/sdb1
This gives us a whole new device in our volume group which we can then assign to a logical volume. We can either create a new logical volume and assign it its own mount point or we could add the new storage to our existing myhome logical volume using the lvextend command. If at any point we would like to see a list of physical volumes, volume groups or our logical volumes we can run special list commands to display the existing groups and their sizes. The commands pvs, vgs, and lvs list the existing physical volumes, volume groups and logical volumes, respectively. 
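To make that last step concrete, here is a hedged sketch which checks the free space in the group, hands all of it to myhome and then grows the file system. The -o options select report columns and +100%FREE means "all unallocated space in the volume group". As before, the run helper and DRY_RUN are hypothetical names and the script only prints the commands unless told otherwise.

```sh
#!/bin/sh
# Sketch only: DRY_RUN and run() are illustrative assumptions, not part of LVM.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only (safe default)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

absorb_free_space() {
    run pvs -o pv_name,vg_name,pv_size,pv_free    # which devices are in which group
    run vgs -o vg_name,vg_size,vg_free datapool   # how much of the group is unallocated
    run lvextend -l +100%FREE datapool/myhome     # give all free space to myhome
    run resize2fs /dev/datapool/myhome            # grow the ext3 file system to match
    run lvs -o lv_name,lv_size,devices datapool   # confirm the new size and devices
}

absorb_free_space
```

Giving every free extent to one volume is a design choice. Leaving some space unallocated in the group keeps the option of growing a different logical volume later.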

A word of warning about using LVM. It is a powerful and flexible technology which can be very useful in situations where storage requirements change. This makes LVM especially useful on servers where data can grow quickly and, sometimes, in unpredictable ways. However, there is a potential problem with using LVM: if one physical storage device fails we can lose all of the data stored in the volume group. For instance, if I have drives A, B and C in a volume group and drive C fails, I may have just lost all of my data stored in the entire volume group, because a logical volume's files may be stretched across any or all devices inside the group. For this reason it is very important to make regular backups of data stored on a volume group.
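A backup can be as simple as a dated archive of the mounted directory. The following is a minimal sketch, not a complete backup strategy: backup_dir is a hypothetical helper and the demonstration runs against throwaway temporary directories instead of a real mount point.

```sh
#!/bin/sh
# backup_dir SRC DEST_DIR: archive the contents of SRC into a dated tarball
# inside DEST_DIR and print the archive's path. (Hypothetical helper.)
backup_dir() {
    src="$1"
    dest_dir="$2"
    archive="$dest_dir/backup-$(date +%Y%m%d).tar.gz"
    tar -czf "$archive" -C "$src" . && echo "$archive"
}

# Demonstration on temporary directories. In practice SRC would be the
# mount point (for example Data) and DEST_DIR a location on another disk.
demo_src="$(mktemp -d)"
demo_dest="$(mktemp -d)"
echo "hello" > "$demo_src/file.txt"
backup_dir "$demo_src" "$demo_dest"
```

The destination should sit on a device which is not part of the same volume group, otherwise a single drive failure could take out both the data and its backup.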