THIS PAGE HAS MOVED TO https://help.genouest.org !!!
- How to create an account on our infrastructure
- Lost password
- How to connect to the cluster
- Some information about the storage
- How to recover erased files
- How to use the Genostack cloud
- Some information about databanks
- Some information about available software
- How to use Conda to install new software
- How to use Virtualenv to install new python modules
- How to launch jobs on the Slurm cluster
- How to use Nvidia GPUs on the Slurm cluster
- How to use X forwarding
- How to transfer data with FTP
- How to run pre-designed workflows with Nextflow
- How to use Go-Docker
- How to use Singularity
- How to use cloud storage
- How to run a Jupyter notebook
- How to create a MySQL database
- How to contact the support
- How to cite GenOuest
Request for an account
You can send your request for an account on this page.
Your request will be examined and validated by GenOuest. Accounts are currently free for the academic community, but this will change in the coming months because operating and maintaining our infrastructure has a non-negligible cost that we cannot easily bear.
Your account gives you access to the different services offered by GenOuest: computing, Galaxy portal, collaborative tools, project management, instant communication and data sharing. For more information on collaborative tools, see the CeSGO page.
Please note that the my.genouest service can also help you retrieve your password if you forget it.
If you lose your password, just point your browser to my.genouest.org, enter your login and click on “RESET”. You will receive the instructions by mail.
Connection to the cluster
Once you have an account on the platform, you can connect via SSH to the genossh.genouest.org server.
genossh.genouest.org is a front-end of the cluster from which you can submit jobs with the Slurm job manager. You first need to connect to the front-end server via SSH from your computer.
You can connect to genossh.genouest.org from anywhere, but only with a properly configured SSH Key.
Connecting to genossh.genouest.org from a Windows computer
On Windows, PuTTY can be used to load SSH keys and connect via SSH to the cluster. Have a look at this video tutorial explaining the whole procedure (creating an SSH key and then connecting to the cluster):
Connecting to genossh.genouest.org from a Linux computer
You first need to generate an SSH key. To do so, run this command on your computer:
ssh-keygen -t rsa -b 4096
The command will ask for a password: it will protect your SSH key, and you will need it every time you use the key to connect to the cluster (depending on your configuration, a program named ssh-agent can remember this password after you enter it the first time you connect).
The ssh-keygen program will create two files in the .ssh directory of your home directory:
id_rsa is your private key: keep this file secret.
id_rsa.pub is your public key. You need to open this file and copy-paste its content to http://my.genouest.org (“Public SSH key” form on the right side, once you are logged in).
Add your key in your ssh agent:
You should then be able to connect to the cluster with this command:
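As a minimal sketch of these two steps, assuming the key was generated at the default location (~/.ssh/id_rsa): the helper names `genossh_target`, `load_key` and `connect_genossh` below are illustrative only and are not provided by GenOuest.

```shell
# Build the SSH target for a given login (first argument).
genossh_target() {
    printf '%s@genossh.genouest.org\n' "$1"
}

# Load the private key into ssh-agent (asks for the key password once).
# Assumption: the key sits at the default path used by ssh-keygen.
load_key() {
    ssh-add "${HOME}/.ssh/id_rsa"
}

# Open an SSH session on the front-end server for the given login.
connect_genossh() {
    ssh "$(genossh_target "$1")"
}
```

Calling `load_key` and then `connect_genossh your-login` amounts to typing `ssh-add ~/.ssh/id_rsa` followed by `ssh your-login@genossh.genouest.org`.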
Once logged in, you have access to four volumes, available on all computing nodes.
- your home directory (/home/genouest/<your-group>/<your-login>). We have a total of around 100TB of storage capacity shared between all the home directories, and each user has a quota of 100GB. You can check your disk usage with the command “quota -s”. A snapshot mechanism is available on this volume: if you erase a file by mistake, you can rescue it by looking into the ~/.snapshots directory.
- a project directory (/groups/<your-group>) that you share with your team. We have a total of around 200TB of storage capacity shared between all these group directories. Each project has a specific quota, and a single person in your team is responsible for granting you access to this volume. You can check your disk usage with the command “df -h /groups/<your-group>”.
- a high performance storage space (/omaha-beach/<your-login>). We have a total of around 80TB for /omaha-beach, and each user has a quota of 120GB. You can check your disk usage with the command “pan_quota /omaha-beach/”.
- another high performance storage space (/scratch/<your-login>). Each user has a quota of 250GB. You can check your disk usage with the command “du -sh /scratch/<your-login>”. The scratch storage will ultimately replace omaha-beach.
Quotas are intentionally restrictive. If you need them to be increased, please contact email@example.com.
As a general rule, you should not write to the /home or /groups directories during jobs, nor perform heavy read operations on these volumes. They are used to keep your data safe. During jobs, use the /omaha-beach directory instead. This directory is hosted on a high performance system designed for temporary data, and it supports heavy read and write operations.
Please note that none of your data is backed up. If you would like us to back up your data for specific reasons, you can contact us and we will help you find a solution.
We strongly advise you to anticipate your storage needs: if you plan to generate a large amount of data, please contact us beforehand to check that we are able to host it. It is best to plan for this when applying for grants that involve data generation and analysis.
Before generating data on the cluster, please do not forget to check the remaining available space. To do so, you may use the quota commands above, or use the df command for global disk usage:
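As a sketch of such a check, assuming the mount points listed above (/home, /groups, /omaha-beach and /scratch): the function name `report_volumes` is hypothetical, and volumes that do not exist on the current machine are simply reported as not mounted.

```shell
# Report global disk usage for each storage volume.
# With no argument, the mount points named on this page are used.
report_volumes() {
    if [ "$#" -eq 0 ]; then
        set -- /home /groups /omaha-beach /scratch
    fi
    for volume in "$@"; do
        if [ -d "$volume" ]; then
            # Human-readable size, used and available space.
            df -h "$volume"
        else
            echo "$volume: not mounted on this machine"
        fi
    done
}

report_volumes
```

The "Avail" column of the df output shows the space left on the whole volume, which is shared between all users, so it complements the per-user quota commands rather than replacing them.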
If you erase some files by mistake, you can recover them by looking in the .snapshots directory. Snapshots are taken every hour and are kept for 5 weeks.
To access the snapshot files of your account just go to the .snapshots directory.
There, you will see all the directories in which the files are stored. The directories are easily recognizable by their name: hourly, daily, weekly.
The snapshot mechanism is only available on the /home and /groups directories.
Please note that snapshots are *not* backups. They provide protection against user error, but not against mechanical failure.
Please consider an external backup solution if your data is valuable.
Pre-installed software is available in /softs/local (see software manager for a list of installed software). To use a piece of software, you have to load its environment. For example, to load Python 2.7, you can run this command (the dot at the beginning is important):
This will automatically configure the PATH, libraries, etc. in your shell environment. Any subsequent python command you launch will use this 2.7 version.
To get a list of all environments available, just list the content of /softs/local/env/env*.
Note: DO NOT USE the python/perl/… installed on the node directly; always load a specific version from /softs/local.
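To browse the available environments, a small helper along these lines may be convenient. It is only a sketch: the function name `list_envs` is hypothetical, and the default directory is the /softs/local/env path mentioned above.

```shell
# List the environment scripts found in a directory
# (default: /softs/local/env, as mentioned on this page).
list_envs() {
    env_dir="${1:-/softs/local/env}"
    for script in "$env_dir"/env*; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$script" ]; then
            basename "$script"
        fi
    done
}

list_envs
```

The output can be filtered with grep, for example `list_envs | grep python` to see only the Python environments.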
Conda is also available to install software on the cluster.
Conda allows you to install the software you need in your own storage volumes (/home, /groups or /omaha-beach). The software needs to be available as Conda packages.
By default, the channels defaults, bioconda, conda-forge and r are available on the cluster. The Bioconda channel in particular is tailored for bioinformatics tools. You may add other channels you need. Please keep in mind that private channels might present security risks (software will not be vetted). If possible, please keep to the standard channels.
To use Conda, first source it the usual way (on a node):
With Conda, you can create as many environments as you want, each one containing a list of packages you need. You need to activate an environment to have access to software installed in it. You can activate only one environment at a time.
To create a new environment containing biopython, deeptools (v2.3.4), bowtie and blast, run:
conda create -p ~/my_env biopython deeptools=2.3.4 bowtie blast
To activate it:
conda activate ~/my_env
To deactivate it:
conda deactivate
Feel free to test this new way of installing software, and to let us know whether or not you are happy with it.
Several versions of Python are available on the cluster. Each one comes with a specific set of modules preinstalled.
If you need to install a module, or to have a different module version, you can use Virtualenv.
Virtualenvs are a way to create a custom Python environment, completely isolated from the Python installation on the cluster. You can install all the Python modules you want in this isolated environment.
To use it, first create a new virtualenv like this:
. /local/env/envpython-3.6.3.sh
virtualenv ~/my_new_env
This will create the directory ~/my_new_env. This directory will contain a minimal copy of Python 3.6.3 (the one you sourced just before), without any module installed in it, and completely isolated from the global 3.6.3 Python version installed by GenOuest. If you prefer to use Python 2.7, you can source version 2.7.15 instead:
. /local/env/envpython-2.7.15.sh
virtualenv ~/my_new_env
To use this virtualenv, you need to activate it:
. ~/my_new_env/bin/activate
Once activated, your prompt will show that you activated a virtualenv:
You can then install all the Python modules you need in this virtualenv:
pip install biopython
pip install pyaml
...
Now when you run python, you will be using the virtualenv’s Python version, containing only the modules you installed in it.
Once you have finished working with the virtualenv, you can stop using it and switch back to the normal environment like this:
deactivate
You can create as many virtualenvs as you want, each one being a directory that you can safely remove when you don’t need it anymore.
Launching jobs on the cluster using Slurm
It is forbidden to execute computations directly on the front-end servers (genossh.genouest.org). You MUST first connect to a node (using srun) or submit a job to a node (using sbatch).
When you submit a job, it is dispatched on one of the computing nodes of the cluster.
Those nodes have different characteristics (CPU, RAM): they have from 144 GB up to 1024 GB of RAM, with 12 to 72 cores each. Run the following command to display the list of available nodes with their characteristics and load (memory in MB):
sinfo -N -O nodelist,partition,freemem,cpusstate,memory
The CPUS(A/I/O/T) column gives the number of cores on each node as four values: A=Allocated (i.e. already reserved), I=Idle (i.e. available cores), O=Other, T=Total (i.e. total number of cores).
The memory columns are expressed in megabytes.
You can submit a job with the “sbatch” command:
sbatch my_script.sh
You can launch a shell on a computing node (equivalent to qrsh on SGE) using:
srun --pty bash
If your job is stuck with the message “srun: job xxxx queued and waiting for resources” and nothing happens, it means that no more resources are available on the cluster. In this case, you can try to use the “tiny” partition, where you can launch very short jobs with limited resources:
srun -p tiny --pty bash
Your job will get limited resources in this partition: at most 2 GB of memory and 2 CPUs, with a time limit of 2 hours. In return, these tiny jobs have a higher priority. There is a limit of 2 simultaneous jobs per user on this partition. These limits make sure that anyone can get a slot on a node for very short tasks at any time. Please do not abuse it.
As with SGE, you can add submission options in the header of the script using #SBATCH directives:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --chdir=workingdirectory
#SBATCH --output=res.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
You can submit jobs to a specific partition (the equivalent of SGE’s queues):
sbatch -p genouest my_script.sh
By default, jobs are submitted to the main partition (“genouest”). You only need to use this option for very specific cases.
You can monitor your jobs with the “squeue” command (lists all the jobs by default, restrict to a specific user with the -u option):
squeue
squeue -u username
Unlike with SGE, each job is limited by default to 1 CPU and 6 GB of memory. If you need more resources, add the following options to the srun or sbatch command (or use the equivalent #SBATCH directives):
sbatch --cpus-per-task=8 --mem=50G job_script.sh
In this example, we request 8 CPUs and 50 GB of memory on a node to execute the bash script “job_script.sh”. Many options are available to fine-tune the number of CPUs and the amount of memory reserved for your job; have a look at the srun manual. If no node has at least 1 CPU and 6 GB of memory available, your job may have to wait before it is placed. You can use the same options with srun.
Unlike with SGE, these limits are strict: your job will not be allowed to use more than was requested. If it uses more RAM than requested, your process will be killed.
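The same resource request can be written as directives in the job script itself. The following is a minimal sketch, not a template endorsed by GenOuest: the job name, the output file name and the final echo line are placeholders for your own values and commands.

```shell
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=example_%j.out
#SBATCH --cpus-per-task=8
#SBATCH --mem=50G

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set;
# default to 1 so the script also runs outside of Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Placeholder for the real work: pass the thread count to your tool
# so that it does not use more cores than were reserved.
echo "Running on $(uname -n) with ${threads} thread(s)"
```

Reading the thread count from SLURM_CPUS_PER_TASK keeps the script consistent with the reservation: changing the directive is enough, and the tool never starts more threads than the strict limits allow.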
To kill a job, simply execute:
scancel XXX (XXX is the job identifier)
To get more information on resource usage, you can:
- display more information for running jobs using squeue:
squeue -l -o "%.18i %.9P %.100j %.8u %.2t %.10M %.6D %C %m %R"
- get information on a specific job:
scontrol show job <job_id>
- check the maximum memory used by a running job:
sstat -j <job_id>.batch --format="JobID,MaxRSS"
- check the cpu time and maximum memory used by a finished job:
sacct -j <job_id> --format="JobID,JobName,User,CPUTime,MaxRSS,ReqMem,ExitCode,State,Elapsed,Partition,NodeList,WorkDir%70"
Many other options are available for squeue, scontrol, sacct and sstat; you can consult their manuals by running them with the --help option.
The information on this page is the most up-to-date and tested way to work with Slurm. You can also check this guide for converting more exotic SGE commands to Slurm commands.
You can find a quick tutorial on Slurm on this web site.
If you need to use the DRMAA library (to launch jobs from python code for example), you’ll need to define these environment variables:
export LD_LIBRARY_PATH=/data1/slurm/drmaa/lib/:$LD_LIBRARY_PATH
export DRMAA_LIBRARY_PATH=/data1/slurm/drmaa/lib/libdrmaa.so
Some tools need to run on recent processors that support specific instruction sets such as AVX or AVX2. A few old compute nodes do not support these instructions. To make sure that your job runs on a recent node supporting them, add the --constraint option to srun or sbatch:
sbatch --constraint avx2 my_script.sh
Interactive jobs (srun) and disconnections:
If you start an interactive job (srun --pty bash) and launch a long-running command, but need to disconnect before it finishes, the job will unfortunately be killed and your command stopped.
There is a solution to avoid that: use tmux, which is a terminal multiplexer (just as screen).
First connect to genossh as usual, then start a tmux session:
Then you can connect to a node using srun, and launch the commands you like. When you need to disconnect, you need to detach from your tmux session by typing “Ctrl+B” on your keyboard, then the letter “d”. You can then safely disconnect from genossh (and the internet).
Later, when you want to reconnect to your interactive job, just connect to genossh, and attach to your tmux session you created before by running:
You will be able to continue your work just as if you had never disconnected from the cluster.
Tmux also lets you manage multiple parallel sessions like this; look at the documentation for more advanced usage.
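The start and re-attach steps described above can be sketched with a named session. The session name `genouest` and the helper names `start_session` and `resume_session` are arbitrary choices for this illustration; plain `tmux` and `tmux attach` work as well when you use a single session.

```shell
# Hypothetical session name; any name works.
SESSION_NAME="${SESSION_NAME:-genouest}"

# On genossh: start a named tmux session, then run srun inside it.
start_session() {
    tmux new-session -s "$SESSION_NAME"
}

# Later, after reconnecting to genossh: re-attach to that session.
resume_session() {
    tmux attach-session -t "$SESSION_NAME"
}
```

Naming the session makes it easy to find again with `tmux ls` when several sessions are open.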
It is possible to submit many similar jobs at once using job arrays. See the job arrays documentation for more details. Briefly, if you launch this command:
sbatch --array=1-50%5 my_script.sh
an array of 50 jobs will be created for the script “my_script.sh”, with a maximum of 5 jobs running simultaneously. In my_script.sh, you have access to the SLURM_ARRAY_TASK_ID environment variable which corresponds to the index of the task between 1 and 50.
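A common way to use this index is to pick one input per task from a list. The sketch below assumes a hypothetical file inputs.txt containing one input path per line; the helper `nth_line` and the variable names are illustrative only.

```shell
#!/bin/bash

# Print line number $2 of file $1.
nth_line() {
    sed -n "${2}p" "$1"
}

# Index of this task (1 to 50 with --array=1-50%5);
# default to 1 so the script can be tried outside of Slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical list file with one input path per line.
input_list="${INPUT_LIST:-inputs.txt}"

if [ -f "$input_list" ]; then
    input_file="$(nth_line "$input_list" "$task_id")"
    echo "Task ${task_id} processes ${input_file}"
else
    echo "Task ${task_id}: ${input_list} not found"
fi
```

Each of the 50 tasks runs the same script but reads a different line, so the list file alone decides which data every task handles.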
Using Nvidia GPUs on the cluster using Slurm
A compute node with 4 Nvidia GPUs is available on the Slurm cluster. To use it, run the sbatch or srun commands as for a normal Slurm job, but with 2 specific options:
srun --gres=gpu:1 -p gpu --pty bash
The -p option selects the partition of nodes equipped with GPUs. The --gres option determines the number of GPUs that Slurm will reserve for you. Slurm automatically populates an environment variable (CUDA_VISIBLE_DEVICES) with the id of the GPU that you can use. This environment variable will be used by CUDA applications to use the reserved GPU(s).
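The same two options can be given as directives in a batch script. This is a minimal sketch: the function name `gpu_status` is hypothetical, and the script only reports what Slurm reserved instead of running a real CUDA application.

```shell
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:1

# Describe the GPU reservation from the variable set by Slurm.
gpu_status() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "Reserved GPU id(s): ${CUDA_VISIBLE_DEVICES}"
    else
        echo "No GPU reserved for this shell"
    fi
}

gpu_status
```

Checking the variable at the start of a job is a cheap way to confirm that the reservation worked before launching a long computation.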
If you want to run a software that requires access to an X11 server, you can enable X forwarding by following these steps:
First, connect to the cluster with the -XC options (X is to enable X forwarding, C is to enable compression):
ssh -XC <your-login>@genossh.genouest.org
You first need to set up a specific SSH key (you only need to do this once, the first time you use X11 forwarding). Do it like this:
ssh-keygen -f ~/.ssh/id_slurm -t rsa -b 4096
cat ~/.ssh/id_slurm.pub >> ~/.ssh/authorized_keys
You must not protect this SSH key with a password (just press Enter when asked for one). This will create 2 files in your home directory (~/.ssh/id_slurm and ~/.ssh/id_slurm.pub) that you must not share with anyone.
You can then simply run the following commands to start using an X application:
ssh -X <your-login>@genossh.genouest.org
srun --x11 --pty bash
Pre-designed workflows with Nextflow
Nextflow is installed on the cluster. It can be used by sourcing the correct environment:
Nextflow is preconfigured to use Singularity (when workflows are designed to use containers) and to distribute jobs on the Slurm cluster.
Alternatively, we provide some other pre-designed workflows for RNAseq and small-RNAseq analysis. They can be found in the directory /local/nextflow/ with the following pattern:
For more convenience, a shell wrapper is available for each workflow. You just have to copy it into your working directory, for example:
cp /local/nextflow/rnaseq/nfcore/rnaseq_nfcore.sh ~/workdir/
You may need to customize the script to suit your needs (data files, etc.), and run it with Slurm like this:
An FTP server is available to transfer data to/from your home directory. Use the following information to connect with any FTP client (like FileZilla for example):
– host: ftps://gridftp.genouest.org
– port: 990
– login: [your-genouest-login]
– password: [your-genouest-password]
GO-Docker is a Slurm-like batch scheduling system. It provides a command-line and a web interface to submit shell scripts in Docker containers. The web interface provides an easy way to submit jobs and get an overview of their status.
A REST interface also eases its integration to other tools for automation.
Basically, you select the CPU/memory requirements and the container image needed for the computation. Some default images are provided; the genouest one is a clone of the current computation nodes, based on CentOS.
It also provides interactive sessions for tasks, in practice SSH access to the container.
Job metrics can be queried once the job is over at:
Singularity is a technology that makes it possible to use containers in a High-Performance Computing environment.
Like Docker, it allows you to launch applications inside containers, completely isolated from the rest of the system. However, unlike Docker, you don’t have access to the root account inside the container. This makes it possible to use it on a standard cluster like the GenOuest one.
Singularity is installed on the newest computing nodes of the cluster. To use it, you need to source it:
Then you can launch any singularity container, for example:
singularity run library://sylabsed/examples/lolcow
Singularity is compatible with Docker images, you can run one like this:
singularity shell docker://quay.io/biocontainers/bowtie2:<version>--py35h2d50403_1
If you want to have access to some specific directories from the cluster, you can use the -B option like this:
singularity shell -B /db:/db -B /omaha-beach:/omaha-beach docker://quay.io/biocontainers/bowtie2:<version>--py35h2d50403_1
See the official website for more information on how to use Singularity.
The complete documentation is available on these pages
Storage in the cloud is accessible at https://genostack-data.genouest.org
This storage is a cold storage facility; you may compare it to Dropbox. This means that you cannot read or write files directly: you need to pull or push your files to access them. However, you can access them remotely via your browser and share them with other users through temporary URLs. Files can also be annotated with additional metadata.
This service is hosted in our OpenStack cloud (OpenStack Swift), and data can be accessed from your cloud virtual machines, our cluster or any external location. Storage is linked to an OpenStack project. By default, you have a cloud project matching your GenOuest user identifier, and your credentials are those of your GenOuest account.
swift --os-auth-url https://genostack-api-keystone.genouest.org/v3 \
    --auth-version 3 \
    --os-project-name my_genouest_user_identifier \
    --os-project-domain-name Users \
    --os-username my_genouest_user_identifier \
    --os-user-domain-name Users \
    --os-password my_genouest_user_password \
    list
The default quota is 100 GB, but it can be extended on demand.
You can use Jupyter in multiple ways using the GenOuest resources:
- By launching a VM in the Genostack cloud
- By running it inside a Docker container with GO-Docker
- By running it on the Slurm cluster
Here’s some help to run it on our cluster (inspired by https://alexanderlabwhoi.github.io/post/2019-03-08_jpn-slurm/)
First, connect to the cluster and connect to a compute node:
ssh <login>@genossh.genouest.org
srun --pty bash
Then source the preinstalled Jupyter:
Then run a Jupyter notebook with the --no-browser option, as no web browser is installed on our cluster:
jupyter notebook --no-browser --port 8888
Then, open another console on your local machine (laptop), and create an ssh bridge like this:
ssh -A -t -t <login>@genossh.genouest.org -L 8888:localhost:8888 ssh cl1nXXX -L 8888:localhost:8888
Replace “cl1nXXX” by the name of the node where the Jupyter notebook is running.
Then you can use your favorite web browser and connect to http://localhost:8888/
You can replace port 8888 with another value if you want (it may already be in use by someone else, in which case you will get an “Address already in use” error); just take care to replace it everywhere.
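Because the port appears several times in the tunnel command, a small helper can keep the occurrences consistent. This is a sketch only: the function name `jupyter_tunnel_cmd`, the login `alice` and the node name `cl1n001` are made up for the example.

```shell
# Print the tunnel command for a given login, node and port.
# Arguments: <login> <node> [port]; the port defaults to 8888.
jupyter_tunnel_cmd() {
    login="$1"
    node="$2"
    port="${3:-8888}"
    echo "ssh -A -t -t ${login}@genossh.genouest.org -L ${port}:localhost:${port} ssh ${node} -L ${port}:localhost:${port}"
}

# Example with hypothetical values: login "alice", node "cl1n001", port 8899.
jupyter_tunnel_cmd alice cl1n001 8899
```

The printed command is meant to be run on your local machine. The Jupyter notebook on the node must be started with the same value for its --port option, and the browser URL must use the same port.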
If you want to use the brand new JupyterLab instead, do the same, but source jupyterlab instead:
You can create a personal MySQL database from your personal space on my.genouest.org, in the “Databases” block.
Once it is created, you will receive an email with the credentials to connect to the database. Note that you can only connect from the GenOuest network (genossh.genouest.org or the compute nodes for example).
To test the connection, you can run the command line MySQL client like this:
mysql -u <username> -h <host> -p <database>
Then enter the password you received in the email.
For support, questions, information, please send a mail to the GenOuest Team.
For users having a GenOuest account, we also offer an additional way to get in touch with us through instant messaging with Rocket.Chat (https://instant.cesgo.org/channel/support-genouest)
If you want to cite GenOuest in your article:
- We acknowledge the GenOuest bioinformatics core facility (https://www.genouest.org) for providing the computing infrastructure.
You can use this sentence as a template and modify it to suit your needs. The key element is to mention the GenOuest bioinformatics facility and the website URL.