February 14, 2011

Crunch

Dr. Gillian Wilson, one of my thesis committee members, was kind enough to provide me with funding to build a workstation for my project. I needed a fast shared-memory machine with at least 32 GB of RAM in order to run Sunrise efficiently. After configuring several systems on the HP, Dell, Apple, and other vendors’ websites, I discovered that it would be considerably more cost-effective to build the machine myself. Ideally, I would have built a system with two or four of Intel’s latest Xeon processors and an Nvidia Tesla c2050 card, but I wanted to keep the price as low as possible for the level of performance I need. I eventually settled on the following combination of components:

Motherboard: Asus KGPE-D16 Dual Socket G34 AMD SR5690 SSI EEB 3.61
Processors: 2x AMD Opteron 6172 Magny-Cours 2.1 GHz (24 cores total)
Memory: 16x 4 GB DDR3 1333 unregistered DRAM (64 GB total)
Graphics: EVGA Nvidia GeForce GTX 580 (512 CUDA cores and 1.5 GB of GDDR5)
System disk: 64 GB Crucial RealSSD C300
Data disks: 2x 1.0 TB Western Digital Caviar Black WD1002FAEX
Chassis: Intel 5U Server Chassis SC5650WSNA, with 1000W PSU

Note 1: The Magny-Cours Opteron processors are slower per clock cycle than the current Intel Xeons. However, (1) the performance of Magny-Cours systems scales better with core count than that of Xeon-based machines, (2) the codes I will be using scale rather well with thread count, (3) the Opterons are considerably cheaper for a given level of performance, and (4) the Magny-Cours Opterons let me use a large quantity of inexpensive unregistered memory modules; the only Xeons that support this amount of unregistered memory are roughly twice as expensive as the Opterons. To summarize, I could have built an equivalent machine with fewer Xeon cores running at a higher clock rate, but such a system would have been considerably more expensive.
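
As a rough illustration of point (2), here is a minimal sketch of the kind of scaling check one might run on a 24-core machine: time the same embarrassingly parallel loop at several thread counts and watch how the wall time drops. The loop body is just placeholder work, not my actual qualifier code, and the thread counts are chosen to match this machine.

// scaling_check.cu -- a toy OpenMP scaling test (placeholder work, not real project code)
// Build with: nvcc -O2 -Xcompiler -fopenmp scaling_check.cu -o scaling_check
//        or:  g++ -O2 -fopenmp -x c++ scaling_check.cu -o scaling_check
#include <stdio.h>
#include <math.h>
#include <omp.h>

int main(void) {
    const long N = 200000000L;                  /* arbitrary amount of work */
    const int counts[] = {1, 2, 4, 8, 12, 24};  /* thread counts to try */

    for (int c = 0; c < 6; ++c) {
        omp_set_num_threads(counts[c]);
        double t0 = omp_get_wtime();

        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; ++i)
            sum += sin(i * 1.0e-7);             /* embarrassingly parallel work */

        double t1 = omp_get_wtime();
        printf("%2d threads: %6.2f s  (checksum %.3f)\n", counts[c], t1 - t0, sum);
    }
    return 0;
}

For compute-bound work like this the speed-up should be close to linear; memory-bound codes will fall short of that, which is exactly why the memory bandwidth of the platform matters.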

Note 2: The GTX 580 is typically faster than the Tesla c2050 for CUDA work as long as the problem fits in its smaller memory. The GTX’s primary deficiency is that it was not expressly designed for sustained HPC workloads, so it is likely to fail sooner under heavy use. On the other hand, one could buy four GTX 580 cards for less than the price of a single Tesla c2050.
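
To make the “fits in memory” caveat concrete, here is a small sketch (with a purely hypothetical problem size, not numbers from a real run) that asks the CUDA runtime how much free memory the card has before committing to a large allocation; on the GTX 580, anything approaching 1.5 GB will not fit.

// memcheck.cu -- check whether a problem will fit in GPU memory before allocating
// Build with: nvcc memcheck.cu -o memcheck
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("GPU memory: %.0f MB free of %.0f MB total\n",
           free_bytes / 1048576.0, total_bytes / 1048576.0);

    /* Example problem size (placeholder): 1.2 GB of device data. */
    size_t needed = (size_t)1.2e9;
    if (needed > free_bytes) {
        printf("A %.1f GB problem will not fit on this card.\n", needed / 1.0e9);
        return 1;
    }

    float *d_data = 0;
    err = cudaMalloc((void **)&d_data, needed);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Allocated %.1f GB on the GPU.\n", needed / 1.0e9);
    cudaFree(d_data);
    return 0;
}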

I’ve tentatively named the system “Crunch”. It’s running Ubuntu 10.04.2 LTS with the 2.6.38-rc4 Linux kernel. I’ll likely upgrade to the final 2.6.38 kernel when it’s released and then keep the kernel version unchanged until I finish my Ph.D. work. I have installed all of the development tools, compilers, and libraries that I think I’ll need, including the latest Nvidia driver and CUDA Toolkit. I have successfully compiled and run sample programs from the CUDA SDK, as well as my qualifier project code, the ray-tracing code POV-Ray 3.7, and benchmarks from the Phoronix Test Suite. I also installed Apache2, PHP5, MySQL, and various modules so that the system can double as a web server; the latest version of WordPress is running on it, and it could potentially be used for data sharing.

Here are the outputs of bandwidthTest, deviceQuery, hwinfo, lshw, lsmod, lspci, and lsusb.
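
For anyone without the SDK handy, a stripped-down version of what deviceQuery reports can be produced with a few lines of the runtime API. The sketch below prints just the handful of fields I care about; it is a rough approximation of the SDK tool, not a replacement for it.

// query.cu -- a trimmed-down, deviceQuery-style listing of the installed GPUs
// Build with: nvcc query.cu -o query
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  Global memory:      %.0f MB\n", prop.totalGlobalMem / 1048576.0);
        printf("  Core clock:         %.0f MHz\n", prop.clockRate / 1000.0);
    }
    return 0;
}

On the GTX 580 this should report compute capability 2.0, 16 multiprocessors (512 CUDA cores), and roughly 1.5 GB of global memory.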

For a listing of all software installed from the Ubuntu repositories (i.e. all pre-compiled packages), click here.

Here’s a screenshot from the machine’s first parallel code execution:

Running my qualifier project code remotely and viewing the gnome-system-monitor using ssh -X.

Some images taken during the assembly process:

The processors (top side)

Processors (underside)

16 DIMMs

GTX 580

A populated server board.

Heatsink and fan.

Almost finished.

The finished product.
