System Stress Tests

June 1, 2016
sysadmin hpc raid

I recently received a question about SGI’s pandora after someone found my run-pandora.sh script in my hpc-admin-scripts repo. They were looking for a way to test a server with a fair bit of memory in a short amount of time. They’d tried Memtest86 and found it either incredibly slow when run single-threaded or too unstable when run on all cores. When they found my repo, they figured it’d be worth asking about pandora in the hope that it would suit their needs. While I believe pandora is a proprietary SGI tool, I was able to point them to some more generic alternatives. Since this info may be valuable to others, I figured I’d capture it here (and add to this list if I remember any others I’ve used).

I hadn’t seen stress-ng before, but it sounds interesting based on the brief blog post.

Note: In my experience, stress tests like these will not necessarily crash a server-class machine, because redundancy features such as ECC memory and RAID can mask a failing component. Instead, I typically run tools like these while monitoring the system logs for any signs of flaky hardware, issue scrub commands for RAID arrays, and especially check for corrected memory errors reported via EDAC or machine check exceptions (MCEs).
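As one way to do that EDAC check, here is a minimal sketch, assuming a Linux host with an EDAC driver loaded, that snapshots the per-memory-controller corrected (`ce_count`) and uncorrected (`ue_count`) counters under `/sys/devices/system/edac/mc` so you can compare them before and after a stress run. The function names `read_edac_counts` and `diff_counts` are hypothetical helpers for illustration, not part of any existing tool, and this does not cover errors that are only reported through MCE logs.

```python
import glob
import os


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Assumes the standard Linux EDAC layout (mc0, mc1, ... each containing
    ce_count and ue_count). Returns an empty dict if no controllers exist.
    """
    counts = {}
    # Match only controller directories (mc0, mc1, ...), not other entries.
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        try:
            with open(os.path.join(mc_dir, "ce_count")) as f:
                ce = int(f.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as f:
                ue = int(f.read().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters are missing or unreadable
        counts[os.path.basename(mc_dir)] = (ce, ue)
    return counts


def diff_counts(before, after):
    """Return {controller: (new_ce, new_ue)} for counters that increased."""
    changed = {}
    for mc, (ce_after, ue_after) in after.items():
        ce_before, ue_before = before.get(mc, (0, 0))
        if ce_after > ce_before or ue_after > ue_before:
            changed[mc] = (ce_after - ce_before, ue_after - ue_before)
    return changed


if __name__ == "__main__":
    # Print the current counters; save this output before a stress run
    # and compare it against a second snapshot taken afterwards.
    for mc, (ce, ue) in read_edac_counts().items():
        print(f"{mc}: corrected={ce} uncorrected={ue}")
```

Any nonzero delta from `diff_counts` after a run, especially in the corrected count on a machine that otherwise seemed stable, is the kind of quiet signal that a stress test alone would not surface.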