Batch Job Submission ==================== As we discussed before, on Lonestar6 there are login nodes and compute nodes. .. image:: ./images/hpc_schematic.png :target: ./images/hpc_schematic.png :alt: HPC System Architecture We cannot run the applications we need for our research on the login nodes because they require too many resources and will interrupt the work of others. Instead, we must write a short text file containing a list of the resources we need, and containing the command(s) for running the application. Then, we submit that text file to a queue to run on compute nodes. This process is called **batch job submission**. After going through this module, students should be able to: * Find and assemble input data needed for a docking calculation * Load the appropriate docking software into your PATH * Assemble a SLURM batch submission script * Submit a job, monitor the job progress, and find the job outputs Lonestar6 Queues ---------------- There are several queues available on Lonestar6. It is important to understand the queue limitations and pick a queue that is appropriate for your job. Documentation can be found `here `_. Today, we will be using the ``development`` queue which has a max runtime of 2 hours, and users can only submit one job at a time. .. image:: ./images/lonestar6_queues.png :target: ./images/lonestar6_queues.png :alt: Lonestar6 Queues Assemble Job Inputs ------------------- Our objective is to perform a molecular docking calculation. This is a computational technique used to predict the binding orientation and energetics of (typically) a small molecule ligand within a protein binding site. We will be using the software package called ``autodock_vina`` to perform this calculation. As inputs, we need a structure file containing the protein, a structure file containing the ligand, and a configuration file with parameters for the docking calculation. Finally, we also need a SLURM batch submission script to tell the compute node how to run the job. .. image:: ./images/autodock.png :target: ./images/autodock.png :alt: Autodock Output (Visualized in UCSF Chimera) First, navigate to your $WORK directory where we can organize everything we need for a new job. .. code-block:: console # Navigate to $WORK, make new directory to organize this example [ls6]$ cdw [ls6]$ mkdir docking-example && cd docking-example # Copy some prepared files [ls6]$ cp /work/03439/wallen/public/autodock_vina_example/data/protein.pdbqt ./ [ls6]$ cp /work/03439/wallen/public/autodock_vina_example/data/ligand.pdbqt ./ [ls6]$ cp /work/03439/wallen/public/autodock_vina_example/data/configuration_file.txt ./ # Create a new file for the SLURM script and add the template below [ls6]$ touch job.slurm Open ``job.slurm`` with your favorite Linux text editor and add the content below: .. code-block:: text #!/bin/bash #---------------------------------------------------- # Example SLURM job script to run applications on # TACCs Lonestar6 system. #---------------------------------------------------- #SBATCH -J # Job name #SBATCH -o # Name of stdout output file #SBATCH -e # Name of stderr error file #SBATCH -p # Queue (partition) name #SBATCH -N # Total # of nodes (must be 1 for serial) #SBATCH -n # Total # of mpi tasks (should be 1 for serial) #SBATCH -t # Run time (hh:mm:ss) #SBATCH -A # Project/Allocation name (req'd if you have more than 1) # Everything below here should be Linux commands .. note:: The template job.slurm file above is incomplete. It will not work without filling in values for each of the SLURM directives. EXERCISE ~~~~~~~~ Spend some time examining the content of each file. How do you think these inputs were prepared? What considerations go into choosing the right queue and the wall-clock time? What else is missing from above before we can run the job? Write the SLURM Script ~~~~~~~~~~~~~~~~~~~~~~ Next, we need to fill out ``job.slurm`` to request the necessary resources. If you have some prior experience with ``autodock_vina``, you may be able to reasonably predict how much time and resources we will need. When running your first jobs with your applications, it will take some trial and error, and reading online documentation, to get a feel for how many resources you should use. Open ``job.slurm`` with your preferred text editor and fill out the following information: .. code-block:: console #SBATCH -J vina_job # Job name #SBATCH -o vina_job.o%j # Name of stdout output file (%j expands to jobId) #SBATCH -e vina_job.e%j # Name of stderr error file (%j expands to jobId) #SBATCH -p development # Queue (partition) name #SBATCH -N 1 # Total # of nodes (must be 1 for serial) #SBATCH -n 1 # Total # of mpi tasks (should be 1 for serial) #SBATCH -t 00:10:00 # Run time (hh:mm:ss) #SBATCH -A OTH24028 # Project/Allocation name (req'd if you have more than 1) Now, we need to provide instructions to the compute node on how to run ``autodock_vina``. This information would come from the ``autodock_vina`` instruction manual. Continue editing ``job.slurm`` and add this to the bottom: .. code-block:: console # Everything below here should be Linux commands echo "starting at:" date module list module use /work/03439/wallen/public/modulefiles module load autodock_vina/1.2.3 module list cd data/ vina --config configuration_file.txt --out output_ligands.pdbqt echo "ending at:" date The way this job is configured, it will print a starting date and time, load the appropriate modules, run ``autodock_vina``, write output to the same directory, then print the ending date and time. Keep an eye on the current directory for output. Once you have filled in the job description, save and quit the file. Submit a Job ------------ Submit the job to the queue using the ``sbatch`` command`: .. code-block:: console [ls6]$ sbatch job.slurm To view the jobs you have currently in the queue, use the ``showq`` or ``squeue`` commands: .. code-block:: console [ls6]$ showq -u [ls6]$ showq # shows all jobs by all users [ls6]$ squeue -u $USERNAME [ls6]$ squeue # shows all jobs by all users If for any reason you need to cancel a job, use the ``scancel`` command with the 6- or 7-digit jobid: .. code-block:: console [ls6]$ scancel jobid For more example scripts, see this directory on Lonestar6: .. code-block:: console [ls6]$ ls /share/doc/slurm/ If everything went well, you should have an output file named something similar to ``vina_job.o864828`` in the same directory as the ``job.slurm`` script. And, you should have some output: .. code-block:: console [ls6]$ cat vina_job.o864828 # closely examine output [ls6]$ ls output_ligands.pdbqt output_ligands.pdbqt [ls6]$ cat output_ligands.pdbqt # closely examine output Examine all output to make sure you understand what it represents and if everything was a success. EXERCISE ~~~~~~~~ 1. Create a new directory for a new job. (Staying organized is key!) 2. Copy over inputs from second autodock vina example: .. code-block:: console [ls6]$ cp -r /work/03439/wallen/public/autodock_vina_example_2/* ./ 3. Browse the copied files to make sure you understand what is present. 4. Write a new SLURM batch submission script to dock all ligands to the new receptor. I suggest using the appropriate flag to make vina write output to a sub folder to stay organized. 5. Given what you know about the timing of one job, consider how long you will need for this job. 6. Submit the job and monitor the output to make sure you get all the results. 7. Question: Which molecule does autodock vina predict binds the strongest to the target protein? .. tip:: Refer to the `AutoDock Vina ReadTheDocs ` for tips on using the command, or run ``vina --help``. .. tip:: Consider playing around with the parameters in the configuration file to see how you can speed up the calculation. In particular the ``cpu`` Additional Resources -------------------- * `TACC Docs `_ * `Lonestar6 Docs `_ * `AutoDock-Vina `_