Sun Grid Engine (SGE) job arrays

Why job arrays?

Suppose you wish to run a large number of largely identical jobs: you may wish to run the same program many times with different arguments or parameters, or perhaps process a thousand different input files. One might write a Perl script to generate all the required qsub files and a Bash script to submit them all. However, this is not a good use of your time, and submitting that many separate jobs places a heavy load on the cluster's submit (login) node. It is much better to use an SGE array job.

What is a job array?

  • An SGE array job might be described as a job with a for-loop built in. Here is a simple example:
    
    #!/bin/bash
            
    #$ -cwd
    #$ -S /bin/bash
            
    #$ -t 1-1000
    # ...tell SGE that this is an array job, with "tasks"
    #     numbered from 1 to 1000...
            
    ./myprog <  data.$SGE_TASK_ID > results.$SGE_TASK_ID
    
    Computationally, this is equivalent to 1000 individual queue submissions in which SGE_TASK_ID takes the values 1, 2, 3, ..., 1000, and where input and output files are indexed by the ID. However:
    • only one qsub command is issued (and only one qdel command would be required to delete all jobs);
    • only one entry appears in qstat output;
    • the load on the SGE submit node (i.e., the cluster login node) is vastly less than that of submitting 1000 separate jobs!
    A slight variation is to run each task in a separate directory (folder):
    
    #!/bin/bash
            
    #$ -cwd
    #$ -S /bin/bash
            
    #$ -t 1-1000
            
    mkdir -p "myjob-$SGE_TASK_ID"
    cd "myjob-$SGE_TASK_ID" || exit 1
    ../myprog-one > one.output
    ../myprog-two < one.output > two.output
    

A more general for loop

  • It is not necessary that SGE_TASK_ID starts at 1; nor must the increment be 1. For example:
    
    #$ -t 100-995:5
    
    so that SGE_TASK_ID takes the values 100, 105, 110, 115, ..., 995. Incidentally, if the upper bound is not equal to the lower bound plus an integer multiple of the increment, for example:
    
    #$ -t 1-42:6
    
    SGE automatically lowers the upper bound to the last value actually reached (here 37), as the qstat output shows:
    
    prompt> qsub array.qsub
    Your job-array 2642.1-42:6 ("array.qsub") has been submitted
            
    prompt> qstat
    job-ID  prior    name        user    state  submit/start at      queue  slots  ja-task-ID
    -----------------------------------------------------------------------------------------
    2642    0.00000  array.qsub  simonh  qw     04/24/2009 12:29:29         1      1-37:6
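
The arithmetic behind the adjusted upper bound can be sketched in plain shell. This is only an illustration of the rule described above, not something SGE provides: the function names task_ids and effective_last are hypothetical helpers introduced here.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): list the task IDs that
# "-t FIRST-LAST:STEP" would generate. STEP defaults to 1.
task_ids() {
  local id=$1 last=$2 step=${3:-1} out=""
  while [ "$id" -le "$last" ]; do
    out="${out:+$out }$id"
    id=$(( id + step ))
  done
  echo "$out"
}

# Hypothetical helper: the upper bound that is actually reached,
# i.e. FIRST plus the largest whole number of STEPs that fits.
effective_last() {
  local first=$1 last=$2 step=${3:-1}
  echo $(( first + ((last - first) / step) * step ))
}

task_ids 1 42 6          # prints: 1 7 13 19 25 31 37
effective_last 1 42 6    # prints: 37
effective_last 100 995 5 # prints: 995
```

Running this before submitting is a cheap way to check how many tasks a given range will produce.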
    

Related environment variables

  • SGE also sets three related environment variables, SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE, as illustrated by this simple qsub script:
    
    #!/bin/bash
            
    #$ -cwd 
    #$ -S /bin/bash
            
    #$ -t 1-37:6
            
    echo "The ID increment is: $SGE_TASK_STEPSIZE"
            
    if [[ $SGE_TASK_ID == $SGE_TASK_FIRST ]]; then
      echo "first"
    elif [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
      echo "last"
    else
      echo "neither"
    fi
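
It can be handy to try such logic on a workstation before submitting. The sketch below mimics the four variables in a local loop. SGE is not involved at all, and emulate_array and position are hypothetical names introduced here. It also assumes that SGE_TASK_LAST holds the adjusted upper bound, which the qstat output suggests but which is not stated explicitly.

```shell
#!/bin/bash
# Hypothetical helper (not an SGE command): run a command once per task ID
# with SGE_TASK_ID, SGE_TASK_FIRST, SGE_TASK_LAST and SGE_TASK_STEPSIZE set.
emulate_array() {
  local first=$1 last=$2 step=$3
  shift 3
  # Assumption: SGE_TASK_LAST is the bound actually reached (e.g. 42 -> 37).
  local reached=$(( first + ((last - first) / step) * step ))
  local id=$first
  while [ "$id" -le "$reached" ]; do
    SGE_TASK_ID=$id SGE_TASK_FIRST=$first SGE_TASK_LAST=$reached \
      SGE_TASK_STEPSIZE=$step "$@"
    id=$(( id + step ))
  done
}

# The first/last/neither test from the script above, wrapped in a function.
position() {
  if [ "$SGE_TASK_ID" -eq "$SGE_TASK_FIRST" ]; then
    echo "$SGE_TASK_ID first"
  elif [ "$SGE_TASK_ID" -eq "$SGE_TASK_LAST" ]; then
    echo "$SGE_TASK_ID last"
  else
    echo "$SGE_TASK_ID neither"
  fi
}

# Same range as "#$ -t 1-37:6": prints "1 first", five "neither" lines,
# then "37 last".
emulate_array 1 37 6 position
```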
    

A list of input files

  • One can be sneaky: suppose we have a file that lists the input files, one per line, rather than input files explicitly indexed by suffix. Each task can then pick the line whose number matches its SGE_TASK_ID:
    
    #!/bin/bash
            
    #$ -cwd
    #$ -S /bin/bash
            
    #$ -t 1-42
            
    # Pick the line of the list whose number matches this task's ID.
    INFILE=$(awk "NR==$SGE_TASK_ID" my_file_list.text)
    #
    # ...or use sed:
    #       INFILE=$(sed -n "${SGE_TASK_ID}p" my_file_list.text)
    #

    ./myprog < "$INFILE"
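
A possible refinement, sketched here under assumptions, is to derive the task range from the length of the list and to fail cleanly when a task has no matching line. The helper names nth_line and count_entries and the demo file names are hypothetical. The qsub line in the final comment relies on -t also being accepted on the command line.

```shell
#!/bin/bash
# Hypothetical helper: print line N of a list file; fail if it is missing or empty.
nth_line() {
  local line
  line=$(sed -n "${1}p" "$2")
  [ -n "$line" ] || { echo "no entry $1 in $2" >&2; return 1; }
  echo "$line"
}

# Hypothetical helper: number of lines in the list, i.e. N for "-t 1-N".
count_entries() {
  wc -l < "$1" | tr -d ' '
}

# Demo with a throwaway list (the file names are made up for illustration).
list=$(mktemp)
printf '%s\n' a.dat b.dat c.dat > "$list"
count_entries "$list"    # prints: 3
nth_line 2 "$list"       # prints: b.dat
nth_line 9 "$list" || echo "task 9 has nothing to do"
rm -f "$list"

# On the cluster, the range could then be set at submission time, e.g.:
#   qsub -t 1-"$(count_entries my_file_list.text)" array.qsub
```

Setting the range this way keeps the number of tasks in step with the list, so the hard-coded "1-42" need not be edited by hand.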
    

More about SGE job arrays

More on SGE job arrays can be found at: Wiki GridEngine page.