Hummingbird (Formerly Campus Rocks)

What is it?

Craigslist originally donated a number of computers to BSOE in the summer of 2009. We built a computer cluster out of them and made it available to students, staff, and faculty at UCSC. That original hardware is now retired.

The new cluster consists of one head node and 7 compute nodes, with a total of 587 cores available for running jobs. There are 4 queues:

      • all.q: nodes with AMD CPUs, for long-running jobs.
      • small.q: node(s) with AMD CPUs, for shorter-running jobs (limited to 72 hours).
      • 256-44i.q: node(s) with 256 GB RAM and 44-core Intel processors.
      • 128-24i.q: nodes with 128 GB RAM and 24-core Intel processors.

Each user may run up to 40 jobs (cores) at any one time, and may queue up as many additional jobs as needed, within reason.

How do I log in?

Hummingbird uses your CruzID Blue password for login. To change your password, visit the CruzID Manager web page. Once you have a password, you can log in to the system via SSH at hummingbird.ucsc.edu.

How do I use the cluster (submit, cancel, review jobs)?

We are using Sun Grid Engine (SGE). Its documentation is available on the cluster:

http://hummingbird.ucsc.edu/roll-documentation/sge/6.2/

The above URL is NOT accessible outside of the UCSC IP space.

qsub    Submits a job (create a shell script, then run: qsub shellscript)
qdel    Deletes a job
qlogin  Starts an interactive login session
qstat   Shows the status of jobs in the queue
qmon    Opens the graphical (GUI) interface
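
As a minimal sketch of the qsub workflow above, a job script might look like the following. The file name hello.sh, the job name, and the choice of small.q are illustrative assumptions, not site requirements. Lines starting with #$ are read by SGE as embedded qsub options and are ordinary comments to the shell, so the script can also be run by hand to test it.

```shell
#!/bin/sh
# hello.sh: hypothetical example job script; the name and contents are illustrative.
# Lines beginning with "#$" are embedded qsub options.
# Job name shown by qstat:
#$ -N hello
# Run in the directory the job was submitted from:
#$ -cwd
# Merge stderr into stdout:
#$ -j y
# Assumed queue choice; pick the queue that fits your job:
#$ -q small.q

# SGE sets JOB_ID inside a running job; fall back to "interactive" when
# the script is run by hand for testing.
job_banner() {
    echo "Job ${JOB_ID:-interactive} running on $(uname -n)"
}

job_banner
```

Submit it with "qsub hello.sh", check on it with qstat, and remove it with qdel followed by the job ID that qsub printed.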

How can I run MPI jobs on it?

Compile the code with mpicc:

/opt/openmpi/bin/mpicc

Sample C code is in /opt/mpi-tests/src

Then create a shell script like this (mpitest16):

#!/bin/csh 
unsetenv SGE_ROOT
/opt/openmpi/bin/mpirun -np 16 -machinefile $TMPDIR/machines /opt/mpi-tests/bin/mpi-ring

Finally, submit it to the Sun Grid Engine queue like this:

% qsub -pe mpi 16 mpitest16

How can I see how busy the cluster is?

qhost shows the load averages of each of the exec hosts

qstat -g c gives a per-queue summary, including the number of slots in use and available in each queue
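
As a rough sketch, the per-queue summary can be reduced to a one-line report per queue with awk. The column layout assumed below (queue name, load, used, reserved, available, total) is an assumption about the qstat -g c output format and should be checked against what the cluster actually prints. The helper name free_slots and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Summarize free slots per queue from "qstat -g c" style output on stdin.
# Assumed column layout (check it against your own output):
#   CLUSTER QUEUE  CQLOAD  USED  RES  AVAIL  TOTAL  aoACDS  cdsuE
# The first two lines (header and dashed rule) are skipped.
free_slots() {
    awk 'NR > 2 && NF >= 6 { printf "%s: %d of %d slots free\n", $1, $5, $6 }'
}

# On the cluster: qstat -g c | free_slots
# Here, made-up sample rows stand in for real output:
free_slots <<'EOF'
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
all.q                             0.50    120      0    120    240      0      0
small.q                           0.25     12      0     36     48      0      0
EOF
```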

You can also visit hummingbird.ucsc.edu in a web browser for links to usage graphs. The newest is the "PHPQstat Graph" link, which lists the queues, the cores assigned to each queue, and the cores in use and available. The "Ganglia Graphs" link shows per-node and overall cluster usage.

What is the Small Queue, what are the limitations?

The small queue is for jobs that will not run for a long time. There is a 72 hour wall-clock limit and an 800 hour CPU limit (relevant if you run multi-threaded operations). You can see the queue configuration with the command qconf -sq small.q
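
The two limits interact. Assuming the CPU limit counts time summed over all threads of a job (an assumption based on the description above; confirm it in the qconf output), a job that runs for the full 72 hours can keep only about 11 cores busy on average before reaching the 800 hour CPU limit. A small sketch of that arithmetic, using a hypothetical helper named max_avg_cores:

```shell
#!/bin/sh
# Average number of fully busy cores a job can sustain for its whole
# wall-clock time without exceeding the CPU-time limit.
# Arguments: CPU limit in hours, wall-clock limit in hours.
max_avg_cores() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", cpu / wall }'
}

max_avg_cores 800 72   # prints 11.1
```

Under that assumption, a job that keeps more cores than this busy would reach the CPU limit before the wall-clock limit.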

  • The small.q queue currently has one dedicated machine with 48 processors.
  • The all.q queue has 4 machines with a total of 240 processors.
  • The intel.q queue has one machine with a total of 44 multithreaded processors.

How do I load software on it?

We will load RPMs that are in the yum repository for the OS we are running, or you can compile code yourself in your home directory.

The best way to place software, files, or data in your home directory on the cluster is to access the storage unit directly with sftp: "sftp campusrocks-store-01.soe.ucsc.edu". This lets you move data without adding load to the cluster head node.

If you have questions about a package (known RPMs) or about file transfer methods and procedures, please put in an ITRequest ticket.

Are there backups?

We keep seven daily snapshots of your home directory. Look in /campusdata/.zfs/snapshot/ to find the snapshots. From there, you can simply copy files back to your home directory as needed. The Hummingbird storage unit is also currently backed up to an off-site external storage unit. This allows for data recovery should the main storage unit experience a catastrophic failure.
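
Copying a file back from a snapshot can be sketched as below. The helper name restore_file is hypothetical, and the directory layout inside each snapshot is an assumption; list the snapshot directory first to see the snapshot names and what they contain.

```shell
#!/bin/sh
# Copy one file out of a snapshot back into place.
# Arguments: snapshot directory, path of the file relative to it, destination.
# The layout inside each snapshot is an assumption; list the snapshot
# directory first to see what is actually there.
restore_file() {
    snap_dir=$1
    rel_path=$2
    dest=$3
    if [ ! -f "$snap_dir/$rel_path" ]; then
        echo "not found in snapshot: $rel_path" >&2
        return 1
    fi
    # -p preserves the original modification time and permissions.
    cp -p "$snap_dir/$rel_path" "$dest"
}

# Hypothetical usage on the cluster (SNAPSHOT_NAME and the paths are placeholders):
#   ls /campusdata/.zfs/snapshot/
#   restore_file /campusdata/.zfs/snapshot/SNAPSHOT_NAME "$USER/results.txt" "$HOME/results.txt"
```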

Feedback from Campus Rocks Users

Jonathan Magasin

Campusrocks has been invaluable for my bioinformatics research with marine metagenomics data. The cluster has enabled me to investigate new ways of assembling and annotating 40-50 of these large datasets, with great speed (due to fast cores, lots of memory, and parallelization) and reliable backup of scripts and results. I could not have done the same experiments in a reasonable time on my laptop, which would have been unusable for other research tasks had I tried. Finally, the cluster computing skills I have developed by working on campusrocks, my first such experience, will be essential for my bioinformatics work after graduate school. Thanks for maintaining such an important resource!

Adam Millard-Ball, Assistant Professor, Environmental Studies Department

I use Campus Rocks to run the multi-core version of Stata that is installed, and for computationally intensive work in Python (usually estimation of statistical models). Let me know if you need more details.

Tia Plautz

I use the cluster for my research in medical imaging. I run simulations in a program called Geant4, which simulates particle interactions in an imaging system. Each simulation requires modeling the behavior and interactions of approximately 200 million proton events and requires at least 90 cores, so there is no other resource on campus that allows me to do these simulations in a timely manner. Likewise, it is essential to my work that the cluster function efficiently, since time is of the essence. Some of the machines, namely 02 and 04, function at 1/3 to 1/5 the speed of some of the other machines, which is extremely frustrating.

Eric Carlson

I use this cluster for parallelized Monte Carlo simulations of high energy particle physics processes, especially related to dark matter. I also utilize the cluster for simulating and processing large sets of gamma-ray data in order to search for astrophysical signatures of dark matter. A significant expansion of these resources would be of great value to UCSC's research programs.

Duncan McColl

I am an undergraduate working with Dr. Camps in METX. My cluster utilization involved analyzing co-variation in cancer databases, namely cbioportal, to provide functional context clues for an orphan gene. Proper analysis requires using the entire genome as a query set, which can be computationally intensive. CampusRocks is a great resource, thanks for your work.

Cameron Pye

I am a graduate student in Scott Lokey's lab. We use the Campus Rocks cluster for running molecular dynamics simulations on virtual libraries consisting of thousands of members. While each individual simulation is fairly brief and computationally inexpensive, the numbers mandate parallelism. The campus rocks cluster provides a wonderful and free resource for running these simulations. We greatly value its functionality and will continue to use it in whatever capacity we can.

Jevgenij Raskatov

campusrocks has been a tremendously helpful resource for me this year (I only recently joined the UCSC faculty). I have used the system to run DFT calculations and model molecular reactivity, photophysical properties, as well as NMR chemical shifts. In future, and as my lab continues to grow, we will furthermore be conducting bioinformatics analyses (ChIP-seq and RNA-seq). One of my research interests is on the transcription factor NF kappa B. Continued access to the cluster will play a highly important role for my research.

Chad Saltikov

I use the cluster for assembling genome sequence data from bacteria we isolate in extreme environments high in arsenic. Part of the research we do in my lab involves isolating and characterizing new bacterial species that can grow on the toxic metal arsenic, which is naturally occurring and at high levels in places like Mono Lake, CA and other soda lakes in Nevada. The cluster is essential to the genome assembly process because I need a computer system with a lot of power. The programs I use work so much better on the cluster. It's been really nice to have this service available.