I recently received a question about SGI’s pandora after someone found my run-pandora.sh script in my hpc-admin-scripts repo. They were looking for a way to test a server with a fair bit of memory in a short amount of time. They’d tried Memtest86 and found it either incredibly slow when running single-threaded or too unstable when running on all cores. When they found my repo, they figured it was worth asking about pandora in the hopes it would fit their needs.
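For anyone in the same boat who doesn’t have pandora handy, a rough stand-in for the “use every core” approach is to run one userspace memory tester per NUMA node. The sketch below is just that kind of substitute, not pandora itself: it assumes memtester and numactl are installed, leaves a bit of headroom per node for the OS, and runs one bound memtester instance per node in parallel.

```bash
#!/bin/bash
# Hypothetical stand-in for a fast, parallel memory shakeout (not pandora itself):
# run one memtester instance per NUMA node, each pinned to its own node's CPUs
# and memory with numactl. Assumes memtester and numactl are installed.

LOOPS=1              # passes per instance; more passes = longer but more thorough
HEADROOM_MB=2048     # leave this much per node for the OS and the tester itself

pids=()
for node in /sys/devices/system/node/node[0-9]*; do
    n=${node##*node}
    # Per-node memory in kB, from the MemTotal line of the node's meminfo
    mem_kb=$(awk '/MemTotal/ {print $4}' "$node/meminfo")
    test_mb=$(( mem_kb / 1024 - HEADROOM_MB ))
    [ "$test_mb" -le 0 ] && continue
    echo "node $n: testing ${test_mb}MB"
    numactl --cpunodebind="$n" --membind="$n" \
        memtester "${test_mb}M" "$LOOPS" > "memtester-node${n}.log" 2>&1 &
    pids+=($!)
done

# Wait for every instance and flag failure if any of them reported errors
rc=0
for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
done
exit $rc
```

It won’t catch everything a dedicated tool or an offline Memtest86 pass will, since the kernel keeps a slice of memory for itself, but it does exercise most of the RAM on all cores at once.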
TL;DR Support for NUMA systems in torque/Moab breaks existing means of specifying shared memory jobs and limits scheduling flexibility in heterogeneous compute environments.
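For context, by “shared memory job” I mean the classic single-node, many-core request. Below is a hypothetical example of that kind of submission, the sort of thing that gets awkward once torque is built with NUMA support; the job name, resource numbers, and binary are placeholders, not anything from a real site config.

```bash
#!/bin/bash
# Hypothetical traditional shared-memory job request in torque/PBS.
# The job name, core/memory/walltime numbers, and binary are placeholders.
#PBS -N shared-mem-job
#PBS -l nodes=1:ppn=16      # 16 cores on a single host
#PBS -l mem=64gb            # memory for the whole job, shared on that host
#PBS -l walltime=04:00:00

cd "$PBS_O_WORKDIR"
export OMP_NUM_THREADS=16   # e.g. an OpenMP code using all 16 cores
./my_threaded_app
```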
My experience is based on SUSE SLES 11 SP1/SP2 with a stock kernel, so YMMV if you’re running a newer mainline kernel without all the backports.
I tested on two Supermicro systems: one with an LSI HBA and 22x 2TB enterprise SATA drives (originally purchased to run OpenSolaris/ZFS), and a second with an Adaptec hardware RAID controller and 36x 2TB enterprise SATA drives. Some of the data loss and stability issues I experienced may be attributable to the “enterprise” drives in the first system, which I later discovered were less RAID-friendly than the manufacturer claimed, eventually leading them to replace ALL of my drives with a different model.
Btrfs was a preview in SLES 11 SP1 and is “supported” in SP2, but with major restrictions if you want a supported configuration. Support in SP2 requires that you create btrfs filesystems using Yast and live with the limited options it allows. I’m guessing what you can do via Yast is the subset of features they tested thoroughly enough to be willing to support. I tried using Yast to set up btrfs on one of our systems, but found its constraints too limiting given my use case and the organization I’d settled on in the SP1 days.
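To give a sense of what I mean by “too limiting,” here’s a hedged sketch of the kind of command-line setup I’m talking about; the device names, the “scratch” label, and the subvolume names are made up for illustration, but this is the sort of control over data/metadata profiles and subvolume layout that I found myself giving up inside the Yast dialog.

```bash
# Hedged sketch of a command-line btrfs setup; /dev/sdb..sdd, the "scratch"
# label, and the subvolume names are placeholders, not my actual layout.

# Multi-device filesystem with mirrored metadata and striped data
mkfs.btrfs -L scratch -m raid1 -d raid0 /dev/sdb /dev/sdc /dev/sdd

# Make sure the kernel knows about all member devices, then mount
btrfs device scan
mount LABEL=scratch /mnt/scratch

# Carve out a couple of subvolumes
btrfs subvolume create /mnt/scratch/home
btrfs subvolume create /mnt/scratch/projects

# An individual subvolume can then be mounted on its own (e.g. from /etc/fstab)
mount -o subvol=home LABEL=scratch /home
```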
So this upcoming week is the big annual supercomputing convention, SC10, down in New Orleans. Since I’m skipping out (anxiously waiting for the arrival of Little Miss Sunshine), I’ve got time to actually try and read through the slew of new product announcements and news coverage. So today I saw this quote on twitter from hpc_guru and just had to share:
“Cost of building the next generation of supercomputers is not the problem. The cost of running the machines is what concerns engineers.”
Well this is certainly not something I expected. SGI is one of the few HPC vendors out there that I’m aware of who are still doing neat things with hardware. We’ve got some of their large SMP Itanium boxes on the floor where I work, and I think they’re pretty slick machines. Pricy, but slick. And so far their support is about the best I’ve dealt with. That’s not to say they’re perfect (try getting a CXFS guru on the phone when you need one without sitting on a major outage for several hours), but they generally seem better than most of the other HPC vendors I’ve worked with (IBM, Cray).
So this story has been all over the place (at least if you read computing/HPC news). Steve Wozniak, co-founder of Apple Computer and designer of the historically significant Apple I and Apple II, took a job as chief scientist at a startup I was already aware of, Fusion-io. They’ve been designing PCIe boards loaded with flash memory to deliver super-high-performance storage to servers and maybe high-end gamers. In theory it’s similar technology to SSDs, but they’re able to achieve significantly higher bandwidth and IOPS than SSDs that focus on stuffing flash into a conventional SATA hard drive form factor.
Took a trip for work back in December to provide cluster training for some of the engineers at Gulfstream. Seems like we also did some very minor maintenance on the cluster we administer there. Anyway, I brought the camera along and had a chance to take a few pics while wandering around the riverfront. Only got around to pulling them off my camera today, so here they are:
We’ve been here in Alabama for 2 weeks now. Things seem to be going pretty well so far. We somehow survived the move, and I don’t think we left anything too critical behind. The truck full of half our stuff (and pulling the Civic), the CRV full of cats and guitars, and all the occupants got down here safely and relatively uneventfully. The POD full of the other half of our stuff arrived this past Thursday and we’re slowly working on unloading it.
Amanda and I are buying a new house. In a land far far from CU… Huntsville, AL to be exact… or technically Madison, AL to be even more exact (suburb to the west of Huntsville). No, Derek didn’t graduate. While one could ask “why is he leaving sans-degree”, a better question might be “why is he referring to himself in the third person?” To put it simply, I just got fed up with how things have been going (or not going to be more precise) and decided it was time for a change.
Heard this great quote from the late/great Seymour Cray: “If you were plowing a field, which would you rather use: Two strong oxen or 1024 chickens?” Apparently he said this in response to a question about the growing use of large clusters of commodity PCs for supercomputing applications. Traditional wisdom would suggest it’s much better to design the heck out of two very powerful processors and ease the parallel programming burden to achieve high performance.