Hadoop on LVM2

I had built up a cluster from the remains of past projects and other odds and ends of hardware around the labs, so inevitably the nodes were configured with mostly odd-sized disks. Most of the time everything ran smoothly; however, the amount of data stored on the cluster had been growing quite a bit. We had plenty of capacity for it, but several nodes with smaller disks would fill up while running jobs, get blacklisted, and become less useful to the cluster.

Amidst upgrading (i.e. adding additional disks), I decided to give LVM2 a shot. One large, expandable volume freed us from maintaining unique Hadoop config files for each node's disk layout. LVM2 with its defaults will get you a single volume to use, but the performance benefit of multiple disks is lost. Fortunately, LVM2 can be configured to stripe across disks (much like RAID 0) without requiring that the disks be the same size. I recommend at least trying it; it makes system management a whole lot easier.

These are my notes on how I set up the volume; feel free to adjust as you see fit.

  1. fdisk both physical drives with a partition type of 8e (Linux LVM). (A scriptable alternative is sketched after this list.)
  2. Create a new LVM volume group with the physical disk partitions.
    # vgcreate hadoopdisks /dev/sda1 /dev/sdc1
    Volume group "hadoopdisks" successfully created
  3. The disks I used were one terabyte in size, but not all of that space is actually usable. Asking lvcreate for more space than the group has turned out to be a pretty easy way to get LVM to tell you the number of extents actually available (see the note after this list for a more direct way).
    # lvcreate -L2T -i2 -nhadoop hadoopdisks /dev/sda1 /dev/sdc1
    Using default stripesize 64.00KiB
    Insufficient free extents (476932) in volume group hadoopdisks: 524288 required
    
    # lvcreate -l 476932 -i2 -nhadoop hadoopdisks /dev/sda1 /dev/sdc1
    Using default stripesize 64.00KiB
    Logical volume "hadoop" created
  4. One last change tells LVM2 it's free to allocate anywhere it can.
    # lvchange --alloc anywhere /dev/hadoopdisks/hadoop
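
I used plain fdisk for step 1, but if you want something scriptable, sfdisk can set the same 8e type in one line per disk. This is just a sketch that assumes a single partition spanning each drive, so double-check it against your layout; some LVM2 versions also expect an explicit pvcreate before vgcreate will accept the partitions.

    # echo ',,8e' | sfdisk /dev/sda
    # echo ',,8e' | sfdisk /dev/sdc
    # pvcreate /dev/sda1 /dev/sdc1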

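If you would rather not read the free extent count out of an lvcreate error, vgdisplay reports it directly, and lvs can confirm the finished volume really is striped across both disks. Column names and spacing vary a bit between LVM2 versions, so take these as a sketch:

    # vgdisplay hadoopdisks | grep Free
    # lvs --segments -o +devices hadoopdisks

The second command should list the hadoop volume with two stripes and both /dev/sda1 and /dev/sdc1 in its devices column.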