Btrfs production experiences

April 29, 2013
hpc btrfs filesystems sysadmin zfs work

My experience is based on SUSE SLES 11 SP1/SP2 with the stock SUSE kernel, so YMMV if you’re running a newer mainline kernel that doesn’t carry the same SUSE backports.

I tested on two Supermicro systems: the first with an LSI HBA and 22x 2TB enterprise SATA drives (originally purchased to run OpenSolaris/ZFS), the second with an Adaptec hardware RAID controller and 36x 2TB enterprise SATA drives. Some of the data loss and stability issues I experienced may be attributable to the drives in the first system: I later discovered that those “enterprise” drives were less RAID-friendly than the manufacturer claimed, and the manufacturer eventually replaced ALL of them with a different model.

Btrfs was a technology preview in SLES 11 SP1 and is “supported” in SP2, but only under major restrictions. A supported SP2 configuration requires that you create btrfs filesystems with YaST and live with the limited options it allows. I’m guessing that what YaST exposes is the subset of features SUSE tested thoroughly enough to be willing to support. I tried using YaST to set up btrfs on one of our systems, but found its constraints too limiting for my use case and for the organization I’d settled on in the SP1 days.

It was pretty stable in SP1 for my needs (a simple backup server that pulled daily changes from production using rsync and then took a snapshot), but the management utilities needed work. SP2 was pretty rocky for several months after its release. My memory is a little foggy, but I think I had occasional kernel panics in the btrfs code; that could have been a mix of SUSE backport issues, hardware issues, etc.
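A minimal Python sketch of that nightly pull-then-snapshot cycle follows. It is an illustration, not the actual script used on this server: the host name `prodhost` and the paths `/srv/data`, `/backup/live` and `/backup/snapshots` are hypothetical, and it assumes `/backup/live` is a btrfs subvolume on the same filesystem as the snapshot directory. It defaults to a dry run that only prints the commands.

```python
import datetime
import subprocess
from typing import List

# Hypothetical layout; substitute your own host and paths.
SOURCE = "prodhost:/srv/data/"   # production tree, pulled over ssh by rsync
LIVE = "/backup/live"            # btrfs subvolume that rsync updates in place
SNAPDIR = "/backup/snapshots"    # directory on the same btrfs fs holding dated snapshots


def backup_commands(day: datetime.date) -> List[List[str]]:
    """Return the two commands for one nightly run: sync, then snapshot."""
    return [
        # -a preserves permissions, times and symlinks; -H preserves hard links;
        # --delete removes files that have disappeared from production.
        ["rsync", "-aH", "--delete", SOURCE, LIVE + "/"],
        # Snapshot the freshly synced subvolume under a sortable, dated name.
        ["btrfs", "subvolume", "snapshot", LIVE, f"{SNAPDIR}/{day.isoformat()}"],
    ]


def run_backup(day: datetime.date, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for cmd in backup_commands(day):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so a failed rsync
            # never gets snapshotted.
            subprocess.run(cmd, check=True)
```

Calling `run_backup(datetime.date.today(), dry_run=False)` from cron would perform a real run. Note that rsync exits with code 24 when source files vanish mid-transfer, which `check=True` treats as a failure; a real script may want to tolerate that case.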

Storing bits and taking snapshots seem to work as advertised. SLES has included an fsck for quite some time (I think it was already there in SP1). It seemed to do something, but I can’t comment on the odds of it being helpful vs. harmful. It did report things it was fixing when I ran it after a system crash, and if memory serves it took a filesystem I couldn’t mount and repaired it well enough that I could mount it again. Again YMMV, since the lack of a btrfs fsck has been a bone of contention for many. I’ve also verified that online resize (both grow and shrink) appears to work well on my production systems, at least when used in conjunction with LVM. I tried the tool for converting an existing ext filesystem in place on a small test VM, but haven’t tried it on a production system.
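The key point when combining online resize with LVM is ordering: grow the logical volume before the filesystem, but shrink the filesystem before the logical volume. The Python sketch below only builds the command lists; the LV path, mount point and the 1 GiB safety margin are hypothetical choices, not a tested procedure.

```python
from typing import List


def grow_commands(lv: str, mountpoint: str, gib: int) -> List[List[str]]:
    """Grow: enlarge the logical volume first, then let btrfs fill it."""
    return [
        ["lvextend", "-L", f"+{gib}G", lv],
        ["btrfs", "filesystem", "resize", "max", mountpoint],
    ]


def shrink_commands(lv: str, mountpoint: str, gib: int,
                    margin_gib: int = 1) -> List[List[str]]:
    """Shrink: reduce btrfs first, then the logical volume underneath it."""
    return [
        # Shrink the filesystem by slightly more than the LV will lose, so the
        # filesystem never extends past the end of the smaller device.
        ["btrfs", "filesystem", "resize", f"-{gib + margin_gib}g", mountpoint],
        # lvreduce asks for confirmation before discarding the tail of the LV.
        ["lvreduce", "-L", f"-{gib}G", lv],
        # Grow back into the safety margin.
        ["btrfs", "filesystem", "resize", "max", mountpoint],
    ]
```

The margin costs nothing, since the final `resize max` reclaims it, and it guards against rounding differences between the two tools’ size calculations.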

The one key feature I was interested in and was dissatisfied with was the integrated software RAID. I tried out the RAID0/1 support on the system with the HBA, but found it seriously lacking. I’ve run a lot of systems with Linux mdraid and ZFS RAID, so I was surprised the btrfs developers didn’t provide a good way to view information about the drives in the pool, errors/status per drive, etc. The tools for working with the RAID features just weren’t well fleshed out. I’m sure that will come in time, but I quickly gave up on that feature and reverted to the classic layered solution: mdraid for redundancy, LVM for volume management, and btrfs on top.
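The layered stack is built bottom-up. The Python sketch below generates the commands in order; the RAID level (6), device names, `/dev/md0`, the `vg0`/`backup` names and the 10T LV size are all hypothetical placeholders, since the actual array layout isn’t described here.

```python
from typing import List, Sequence


def layered_stack(disks: Sequence[str], level: int = 6, md: str = "/dev/md0",
                  vg: str = "vg0", lv: str = "backup",
                  size: str = "10T") -> List[List[str]]:
    """Commands to build mdraid -> LVM -> btrfs, bottom layer first."""
    return [
        # Redundancy lives in mdraid; RAID level and devices are placeholders.
        ["mdadm", "--create", md, f"--level={level}",
         f"--raid-devices={len(disks)}", *disks],
        # LVM on top of the md device provides volume management.
        ["pvcreate", md],
        ["vgcreate", vg, md],
        # A fixed size (rather than 100%FREE) leaves slack in the VG for lvextend.
        ["lvcreate", "-L", size, "-n", lv, vg],
        # btrfs sees a single block device, so redundancy comes from mdraid
        # rather than btrfs's own multi-device RAID.
        ["mkfs.btrfs", f"/dev/{vg}/{lv}"],
    ]
```

Per-drive status and error counts then come from the mature mdraid tooling (`mdadm --detail`, `/proc/mdstat`), which is exactly what was missing from the integrated btrfs RAID.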

Another thing I missed from ZFS is disk usage info for a single snapshot (e.g., the space consumed by the deltas between neighboring snapshots). Every so often one of my users does something wonky and creates hundreds of GBs of spurious data, only realizing it and cleaning it up a few days later; by then the intervening snapshots still hold that data, and per-snapshot usage would show which snapshots are holding it. Proper btrfs send/receive to replicate snapshot data remotely would also be useful, but for the moment I have scripts that do it manually by iteratively rsyncing over the specified range of local snapshots and capturing a snapshot on the remote system after each one.
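That iterative rsync-then-snapshot replication can be sketched in Python as a command plan. This is an illustration under stated assumptions, not the scripts mentioned above: snapshot names are assumed to be ISO dates (so string order equals time order), and the host and paths passed in are hypothetical.

```python
from typing import Iterable, List


def replication_plan(snapshots: Iterable[str], start: str, end: str,
                     local_dir: str, host: str, remote_live: str,
                     remote_snapdir: str) -> List[List[str]]:
    """Replay local snapshots [start, end] onto a remote btrfs subvolume."""
    plan: List[List[str]] = []
    # Oldest first, so each rsync only transfers the delta from the previous one.
    # Assumes names like "2013-04-29" that sort chronologically as strings.
    for name in sorted(s for s in snapshots if start <= s <= end):
        # Make the remote live subvolume match this local snapshot exactly.
        plan.append(["rsync", "-aH", "--delete",
                     f"{local_dir}/{name}/", f"{host}:{remote_live}/"])
        # Freeze that state on the remote side under the same name.
        plan.append(["ssh", host, "btrfs", "subvolume", "snapshot",
                     remote_live, f"{remote_snapdir}/{name}"])
    return plan
```

Each pair of commands would be run in order with `subprocess.run(cmd, check=True)`, stopping at the first failure so the remote side never snapshots a partial sync.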

I’m hopeful btrfs will at least alert me to any data corruption, but I’m not yet confident it will help identify where the problem is, or that such an alert wouldn’t be buried amid other spurious noise in syslog. Mdraid, at least, can be configured to monitor arrays and send an email alert when there’s trouble. In that sense ZFS isn’t necessarily much better: I’ve always written my own scripts to alert me to pool problems, since there didn’t seem to be a monitoring daemon available out of the box.
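A home-grown ZFS pool check can be quite small, because `zpool status -x` prints the single line “all pools are healthy” when nothing is wrong and a full report otherwise. The Python sketch below is one way to build on that; it is not the monitoring script referred to above, and how the alert gets emailed is left to a hypothetical cron wrapper.

```python
import subprocess
from typing import Optional

# Text `zpool status -x` prints when no pool has a problem.
HEALTHY = "all pools are healthy"


def pool_problem(status_output: str) -> Optional[str]:
    """Return the report if `zpool status -x` flagged anything, else None.

    Anything other than the healthy message (including "no pools available")
    is treated as worth an alert.
    """
    text = status_output.strip()
    return None if text == HEALTHY else text


def check_pools() -> Optional[str]:
    """Run `zpool status -x` and classify its combined stdout/stderr."""
    result = subprocess.run(
        ["zpool", "status", "-x"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return pool_problem(result.stdout)
```

A cron job would call `check_pools()` and mail any non-`None` result. Keeping the parsing in `pool_problem` separate from the subprocess call makes it easy to test without a ZFS system.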

I missed the better tools and features of ZFS, so I eventually switched the system with the HBA to FreeBSD/ZFS. Now that ZFS on Linux has matured further, I’ve switched it over to Scientific Linux (ZoL is available in their add-on repo). Notably, I was able to migrate all of the ZFS data in place by doing a simple ‘zpool export’ before rebooting into the SL installer, and then a ‘zpool import’ once I’d installed the necessary packages in SL. I’m hedging my bets by leaving the server with the hardware RAID running SLES with btrfs, in the hope that I won’t be hit by software bugs on both platforms at the same time.