
Current version of the intro to the DCC slides.

Introduction to SLURM

Most DCC partitions are lab-owned machines that can only be used by members of the owning group. Submitting to a group partition gives the job “high priority”.

Submit to partitions with

sbatch -p (partition name) --account=(account name)

(in a job script) or

srun -p (partition name) --account=(account name) --pty bash -i

(interactively). In general, the partition name and the account name will be the same for most lab-owned machines.
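
For example, a minimal job script for a hypothetical lab-owned partition and account both named "mylab" might look like the sketch below (the partition/account names and the hostname command are placeholders):

#!/bin/bash
#SBATCH -p mylab            # hypothetical lab-owned partition
#SBATCH --account=mylab     # account name, usually the same as the partition
hostname                    # placeholder workload

Submit it with sbatch, giving whatever file name you saved the script under.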

Common DCC Partitions

There are four different DCC partitions to which batch jobs and interactive sessions can be directed by all DCC users:

  • common for jobs that will run on the DCC core nodes (up to 64 GB RAM).
  • gpu-common for jobs that will run on DCC GPU nodes.
  • scavenger for jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption).
  • scavenger-gpu for GPU jobs that will run on lab-owned nodes in “low priority” (kill and requeue preemption).
  • Note: These partitions do not require an “--account=” flag. (i.e., they accept jobs from any account.) If a partition is not specified, the default partition is the common partition.
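
For example, a job script can be sent to the common or scavenger partitions without any --account flag (test.sh is the example script name used later in these slides):

sbatch test.sh                # defaults to the common partition
sbatch -p scavenger test.sh   # low priority, subject to preemption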

    Running an interactive job

    Reserve a compute node by typing srun --pty bash -i

    tm103@dcc-login-02 ~ $ srun --pty bash -i   
    srun: job 186535 queued and waiting for resources   
    srun: job 186535 has been allocated resources   
    tm103@dcc-core-11 ~ $
    tm103@dcc-core-11 ~ $ squeue -u tm103  
    JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 
    186535 common bash tm103 R 0:14 1 dcc-core-11
    

    I now have an interactive session in the common partition on node dcc-core-11

    SLURM commands

    sbatch – Submit a batch job

    #SBATCH – Specify job parameters in a job script

    squeue – Show lists of jobs

    scancel – Delete one or more batch jobs

    sinfo – Show info about machines

    scontrol – Show cluster configuration information
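
    A few example invocations (job ID 186535 is the one from the interactive-session example above):

    squeue -u tm103                  # list jobs belonging to user tm103
    scancel 186535                   # cancel job 186535
    sinfo -p common                  # show node states in the common partition
    scontrol show partition common   # show the common partition's configuration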

    Use sbatch (all lower case) to submit text file job scripts, e.g. test.sh

    sbatch test.sh
    

    Use #SBATCH (upper case) in your scripts for scheduler directives, e.g.

    #SBATCH --mem=1G
    #SBATCH --output=matlab.out
    

    All SLURM directives can be given on the command line instead of in the script. See slurm.schedmd.com for the full list.
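
    For example, the directives shown above could instead be given on the sbatch command line (test.sh as in the earlier example):

    sbatch --mem=1G --output=matlab.out test.sh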

    Slurm memory directives

    The default memory request (allocation) is 2 GB RAM. This is a hard limit, so always request a little more than your job needs. To request a total amount of memory for the job, use one of the following:

      • --mem=<MB> the amount of memory required per node (in MB),
      • --mem=<GB>G the amount of memory required per node (in GB), or
      • --mem-per-cpu=<MB> the amount of memory per CPU core, for multi-threaded jobs

    Note: --mem and --mem-per-cpu are mutually exclusive
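
    For example, either of the following (but not both) could appear in a job script; the sizes are illustrative:

    #SBATCH --mem=8G             # 8 GB total for the job (per node)

    #SBATCH -c 4
    #SBATCH --mem-per-cpu=2G     # 2 GB per CPU core, 8 GB total for 4 cores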

    Slurm parallel directives

    All parallel directives default to 1

    -N <number> How many nodes (machines)

    -n <number> or --ntasks=<number> How many parallel jobs (“tasks”)

    -c <number> or --cpus-per-task=<number> How many CPU cores per task (for multi-threaded jobs)

    Use -n and -N for multi-node jobs (e.g. MPI)

    Use -c (--cpus-per-task) for multi-threaded jobs
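
    For example (the node, task, and core counts are illustrative):

    # Multi-node MPI job: 2 nodes, 40 tasks in total
    #SBATCH -N 2
    #SBATCH -n 40

    # Multi-threaded job: 1 task using 8 CPU cores
    #SBATCH -c 8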

    Job script examples

    #!/bin/bash  
    #SBATCH --output=test.out
    hostname # print hostname
    

    This prints the name of the compute node in the file “test.out”

    tm103@dcc-login-02 ~/slurm $ sbatch simple.sh
    Submitted batch job 186554
    tm103@dcc-login-02 ~/slurm $ cat test.out 
    dcc-core-14
    

    Long-form commands example

    #!/bin/bash
    #SBATCH --output=slurm.out
    #SBATCH --error=slurm.err  
    #SBATCH --mem=100 # 100 MB RAM 
    #SBATCH --partition=scavenger
    hostname 1>&2  # prints hostname to the error file
    

    This job will run in low priority on a lab node in the “scavenger” partition

    Short-form commands example.

    SLURM short commands don’t use “=” signs

    #!/bin/bash
    #SBATCH -o slurm.out
    #SBATCH -e slurm.err
    #SBATCH --mem=4G # 4 GB RAM 
    #SBATCH -p scavenger
    hostname 1>&2  #prints hostname to the error file
    

    R example script

    #!/bin/bash
    #SBATCH -e slurm.err
    #SBATCH --mem=4G # 4 GB RAM 
    module load R/3.6.0
    R CMD BATCH Rcode.R
    

    This loads the environment module for R/3.6.0 and runs a single R script (“Rcode.R”)

    The #SBATCH --mem=4G directive requests 4 GB of RAM (more than the 2 GB default)

    Multi-threaded (multi-core) example

    #!/bin/bash
    #SBATCH -J test 
    #SBATCH -o test.out
    #SBATCH -c 4
    #SBATCH --mem-per-cpu=500 # (500 MB) 
    myApplication -n $SLURM_CPUS_PER_TASK
    

    The value of $SLURM_CPUS_PER_TASK is the number after -c. This example starts a single, multi-threaded job that uses 4 CPU cores and 2 GB (4x500MB) of RAM

    OpenMP multicore example

    #!/bin/bash
    #SBATCH -J openmp-test 
    #SBATCH -o slurm.out
    #SBATCH -c 4
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    myOpenMPapp # will run on 4 CPU cores
    

    This sets $OMP_NUM_THREADS to the value of $SLURM_CPUS_PER_TASK

    Slurm job arrays

    Slurm job arrays are a mechanism for submitting and managing collections of similar jobs using one job script and one application program.

  • Add --array or -a option to the job script
  • Each job task will inherit a SLURM_ARRAY_TASK_ID environment variable with a different integer value
  • Each job array can have up to 100,000 job tasks on the DCC
  • Job arrays are only supported for batch jobs
  • Job array “tasks” must be independent: slurm.schedmd.com/job_array.html
  • For example, in a job script, add the line #SBATCH --array=1-30 or, alternatively, #SBATCH -a 1-30 to submit 30 job tasks. The job array indices can also be specified on the command line, e.g.

    sbatch -a 1-30 myjob.sh
    

    The index values can be continuous, e.g.

    -a 0-31 (32 tasks, numbered from 0,1,2,…,31)
    

    or discontinuous, e.g.

    -a 3,5,7-9,12 (6 tasks, numbered 3,5,7,8,9,12)
    

    It can also be a single job task, e.g. -a 7 (a single task).

    The discontinuous notation is useful for resubmitting specific job tasks that previously failed. Each job task is assigned the environment variable $SLURM_ARRAY_TASK_ID set to its index value.
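
    For example, if tasks 3, 5, and 12 of an earlier array run had failed (the task numbers are illustrative; myjob.sh is the script from the example above):

    sbatch -a 3,5,12 myjob.sh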

    tm103@dcc-login-02  ~/misc/jobarrays $ cat array-test.sh 
    #!/bin/bash
    echo $SLURM_ARRAY_TASK_ID
    tm103@dcc-login-02  ~/misc/jobarrays $ sbatch -a 1-3 array-test.sh 
    Submitted batch job 24845830
    tm103@dcc-login-02  ~/misc/jobarrays $ ls slurm-24845830*
    slurm-24845830_1.out  slurm-24845830_2.out  slurm-24845830_3.out
    tm103@dcc-login-02  ~/misc/jobarrays $ cat slurm-24845830*
    1
    2
    3
    tm103@dcc-login-02  ~/misc/jobarrays $
    

    Python job array example

    #!/bin/bash
    #SBATCH -e slurm_%A_%a.err
    #SBATCH -o slurm_%A_%a.out 
    #SBATCH --array=1-5000 
    python myCode.py
    $ cat myCode.py
    import os
    taskID=int(os.environ['SLURM_ARRAY_TASK_ID'])
    

    Start 5000 Python jobs, each with a different “taskID”, initialized from $SLURM_ARRAY_TASK_ID

    Importing Environment Variables

    Python:

    numCPUs=int(os.environ['SLURM_CPUS_PER_TASK'])
    taskID=int(os.environ['SLURM_ARRAY_TASK_ID'])
    
    R:

    numCPUs <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")) 
    taskID <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID"))
    

    MATLAB:

    numCPUs = str2num(getenv('SLURM_CPUS_PER_TASK'))
    taskID = str2num(getenv('SLURM_ARRAY_TASK_ID'))
    
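
    In a bash job script the same variables are available directly, e.g. (a minimal sketch):

    numCPUs=$SLURM_CPUS_PER_TASK
    taskID=$SLURM_ARRAY_TASK_ID
    echo "task $taskID running on $numCPUs cores"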

    Processing separate input files

    Process an existing file list, e.g. files.txt

    #!/bin/bash
    readarray -t FILES < files.txt
    FILENAME=${FILES[(($SLURM_ARRAY_TASK_ID - 1))]}
    myapp $FILENAME
    

    Dynamically generate a file list from “ls”

    #!/bin/bash
    export FILES=($(ls -1 myfile*))
    FILENAME=${FILES[(($SLURM_ARRAY_TASK_ID - 1))]}
    myapp $FILENAME
    

    Example: Using the taskID as part of the input file name and output directory, for input files of the form input_1.txt, input_2.txt, …, input_N.txt with -a 1-N, e.g.

    #!/bin/bash
    #SBATCH -e slurm_%A_%a.err
    #SBATCH -o slurm_%A_%a.out      
    mkdir out_${SLURM_ARRAY_TASK_ID}     
    cd out_${SLURM_ARRAY_TASK_ID}
    myapp ../input_${SLURM_ARRAY_TASK_ID}.txt
    

    where output directories out_1, out_2, … are created for input files input_1.txt, input_2.txt, …

    “Unrolling” for loops example

    Original “serial” code (Python)

    fibonacci = [0,1,1,2,3,5,8,13,21]
    for i in range(len(fibonacci)):
        print(i,fibonacci[i])
    

    Job array version

    import os
    i=int(os.environ['SLURM_ARRAY_TASK_ID'])
    fibonacci = [0,1,1,2,3,5,8,13,21]
    #for i in range(len(fibonacci)):
    print(i,fibonacci[i])
    

    where the for loop is commented out and each job task performs a single “iteration”

    tm103@dcc-login-02  ~/misc/jobarrays $ cat fib-array.sh 
    #!/bin/bash
    #SBATCH -e slurm.err
    module load Python/2.7.11
    python fibonacci.py
    tm103@dcc-login-02  ~/misc/jobarrays $ sbatch -a 1-8 fib-array.sh 
    Submitted batch job 24856052
    tm103@dcc-login-02  ~/misc/jobarrays $ ls slurm-24856052_*
    slurm-24856052_1.out  slurm-24856052_3.out  slurm-24856052_5.out  slurm-24856052_7.out
    slurm-24856052_2.out  slurm-24856052_4.out  slurm-24856052_6.out  slurm-24856052_8.out
    tm103@dcc-login-02  ~/misc/jobarrays $ cat slurm-24856052*
    (1, 1)
    (2, 1)
    (3, 2)
    (4, 3)
    (5, 5)
    (6, 8)
    (7, 13)
    (8, 21)
    tm103@dcc-login-02  ~/misc/jobarrays $
    

    Running MPI jobs

    Supported MPI versions are Intel MPI and OpenMPI

    Compiling with OpenMPI

    tm103@dcc-login-02  ~ $ module load OpenMPI/4.0.5-rhel8
    OpenMPI 4.0.5-rhel8
    tm103@dcc-login-03  ~ $ mpicc -o openhello hello.c
    tm103@dcc-login-02  ~ $ ls -l openhello 
    -rwxr-xr-x. 1 tm103 scsc 9184 Sep  1 16:08 openhello
    

    OpenMPI job script

    #!/bin/bash
    #SBATCH -o openhello.out
    #SBATCH -e slurm.err
    #SBATCH -n 20
    module load OpenMPI/4.0.5-rhel8
    mpirun -n $SLURM_NTASKS openhello
    

    OpenMPI example output

    tm103@dcc-login-02 ~/misc/slurm/openmpi $ cat openhello.out  
    dcc-core-01, rank 0 out of 20 processors  
    dcc-core-01, rank 1 out of 20 processors  
    dcc-core-01, rank 2 out of 20 processors  
    dcc-core-01, rank 3 out of 20 processors  
    dcc-core-01, rank 4 out of 20 processors   
    dcc-core-03, rank 13 out of 20 processors 
    dcc-core-03, rank 14 out of 20 processors 
    dcc-core-03, rank 10 out of 20 processors 
    dcc-core-03, rank 11 out of 20 processors 
    dcc-core-03, rank 12 out of 20 processors 
    dcc-core-02, rank 8 out of 20 processors 
    dcc-core-02, rank 9 out of 20 processors 
    dcc-core-02, rank 5 out of 20 processors
    

    GPU jobs

    To run a GPU batch job, add the job script lines

    #SBATCH -p gpu-common 
    #SBATCH --gres=gpu:1 
    #SBATCH --exclusive
    
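
    Putting these together, a complete GPU batch script might look like the following sketch (nvidia-smi is just a placeholder for your GPU application):

    #!/bin/bash
    #SBATCH -o gpu-test.out
    #SBATCH -p gpu-common
    #SBATCH --gres=gpu:1
    #SBATCH --exclusive
    nvidia-smi    # placeholder: replace with your GPU application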

    To get an interactive GPU node session, type the command line srun -p gpu-common --gres=gpu:1 --pty bash -i

    tm103@dcc-login-02 ~ $ srun -p gpu-common --gres=gpu:1 --pty bash -i
    tm103@dcc-gpu-01 ~ $ /usr/local/cuda-7.5/samples/1_Utilities/deviceQuery/deviceQuery
    Detected 1 CUDA Capable device(s)

    Device 0: "Tesla K80"
        CUDA Driver Version / Runtime Version          7.5 / 7.5
        CUDA Capability Major/Minor version number:    3.7
        Total amount of global memory:                 11520 MBytes (12079136768 bytes)
        (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
    

    NEW: GPU types (coming 10/1/2022)

    Request a specific GPU type and GPU device number with

    #SBATCH --gres=gpu:(GPU type):(GPU number)
    

    where "GPU number" is a number from 1 to 4 and "GPU type" is one of the types listed below (see the example after the list).

    RTX6000

    TITANX

    GTX1080

    V100 (16 GB GPU RAM)

    V10032 (32 GB GPU RAM)

    RTXA5000

    RTX2080 (Currently all core-gpu nodes and the vast majority of scavenger-gpu nodes are these.)
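
    For example, to request two RTX2080 GPUs in the scavenger-gpu partition (the GPU type and count here are illustrative):

    #SBATCH -p scavenger-gpu
    #SBATCH --gres=gpu:RTX2080:2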

    Job dependencies

    Submit a job that waits for another job to finish.

    $ sbatch dep1.q
    Submitted batch job 666898
    

    Make a note of the assigned job ID of dep1

    $ sbatch --dependency=afterok:666898 dep2.q

    Job dep2 will not start until dep1 finishes successfully.

    Job dependencies with arrays

    Wait for specific job array elements

    sbatch --depend=after:123_4 my.job
    sbatch --depend=afterok:123_4:123_8 my.job2
    

    Wait for entire job array to complete

    sbatch --depend=afterany:123 my.job
    

    Wait for entire job array to complete successfully

    sbatch --depend=afterok:123 my.job
    

    Wait for the entire job array to complete with at least one task failing

    sbatch --depend=afternotok:123 my.job
    

    More information: hcc-docs.unl.edu/display/HCCDOC/Job+Dependencies

    Additional Resources

    http://schedmd.com