System Stress Tests
June 1, 2016
sysadmin
hpc
raid
I recently received a question about SGI’s pandora after someone found my
run-pandora.sh script in my hpc-admin-scripts repo. They were looking
for a way to test a server with a fair bit of memory in a short amount of time.
They’d tried Memtest86 and found it to be either incredibly slow when running
single-threaded or too unstable when running on all cores. When they
found my repo, they figured it’d be worth asking about pandora in the hopes it
would be appropriate for their needs. While I believe pandora is an SGI
proprietary tool, I was able to direct them to some more generic alternatives.
Since this info may be valuable to others, I figured I’d go ahead and capture
it here (and add to this list if I remember any others I’ve used).
I hadn’t seen stress-ng before, but it sounds interesting based on the brief blog post.
Note: In my experience, stress tests like these will not necessarily crash a server-class machine, because technologies such as ECC and RAID provide enough redundancy to mask a failing component. I would typically run tools like these while monitoring system logs for any signs of flaky hardware, issue scrub commands for RAID arrays, and especially check for any corrected memory errors via EDAC or MCEs.
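As a rough illustration of the corrected-memory-error check, here is a minimal Python sketch that snapshots the EDAC counters before and after a stress run and reports any increase. It assumes a Linux system whose EDAC driver exposes per-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc`; the function names `read_edac_counts` and `new_errors` are hypothetical helpers written for this post, not part of any tool mentioned above.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each EDAC memory controller.

    Assumes the usual Linux EDAC sysfs layout (mc0, mc1, ... directories, each
    with ce_count and ue_count files). Returns {} if the directory is missing,
    e.g. when no EDAC driver is loaded.
    """
    counts = {}
    root = Path(edac_root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((mc / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # counter missing or unreadable
        counts[mc.name] = entry
    return counts


def new_errors(before, after):
    """Return only the counters that increased between two snapshots."""
    delta = {}
    for mc, now in after.items():
        prev = before.get(mc, {})
        for key, value in now.items():
            if value is None:
                continue
            increase = value - (prev.get(key) or 0)
            if increase > 0:
                delta.setdefault(mc, {})[key] = increase
    return delta


if __name__ == "__main__":
    # Print the current counters; take one snapshot before the stress run
    # and another afterwards, then compare them with new_errors().
    print(read_edac_counts())
```

Any increase in `ce` during a run means ECC corrected something, which is exactly the kind of quiet failure that would never show up as a crash. An increase in `ue` (uncorrected errors) is more serious and would normally also appear in the system logs.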