GODocker

Cluster computing system with Docker

What is GO-Docker?

GO-Docker is a batch computing and cluster management tool that uses Docker as its execution and isolation system. It can be compared to Sun Grid Engine, Torque, and similar batch systems. However, the software does not itself dispatch commands to the remote nodes. For this, it integrates with container management tools (Docker Swarm, Apache Mesos, ...). It acts as an additional layer above those tools on multi-user systems where users do not have Docker privileges or knowledge.

Access to Wiki, tutorials, ... - Created by: Olivier Sallou at IRISA


2015 IEEE International Conference on Cluster Computing: DOI: 10.13140/RG.2.1.1957.6801, proceedings

Designed for customization

A plugin management system makes it possible to add new authentication or authorization mechanisms, or to create a different scheduler that orders submitted jobs based on user consumption, quotas, ...

It is also possible to develop integrations with new execution systems.
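As a rough illustration of what a usage-aware scheduler plugin might do, the sketch below orders pending tasks so that users with the lowest past consumption run first. It does not use GO-Docker's plugin interface: the `Task` class, the `usage` mapping, and the `order_by_user_usage` function are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; not GO-Docker's actual task schema.
    task_id: int
    user: str
    cpu: int


def order_by_user_usage(pending, usage):
    """Return pending tasks sorted by each user's past usage, lowest first.

    Users missing from `usage` count as 0. Ties are broken by task id so
    that earlier submissions keep their priority.
    """
    return sorted(pending, key=lambda t: (usage.get(t.user, 0.0), t.task_id))


pending = [
    Task(1, "alice", 2),
    Task(2, "bob", 1),
    Task(3, "alice", 1),
    Task(4, "carol", 4),
]
usage = {"alice": 120.0, "bob": 15.5}  # carol has no recorded usage yet
ordered = order_by_user_usage(pending, usage)
print([t.task_id for t in ordered])
```

A real plugin would read consumption and quotas from the system's own records instead of an in-memory dictionary.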

Time saver

Execute and watch your jobs via the web interface from home, work, ...

The CLI also allows remote access to the system.

Isolation

Docker provides job isolation: the main process and all of its child processes are isolated in their container. With Mesos, one can request one or more GPUs to be made available in the container.

Statistics

Live statistics on container usage and general statistics are available for users and administrators.

Dashboard with interactive charts on current or past tasks

Deploy in seconds

Bare-metal, Vagrant, Docker, Amazon cloud, ...

Prebuilt images and scripts are available for automatic installation; simply update the configuration.

Scalable

From a single server to ... well ... many servers!

The application has been tested with 40 nodes in the cloud (thanks to an Amazon EC2 grant). Web servers can be scaled to handle web/API requests, and executors can be scaled as well.

Many executors (which check tasks, kill them, ...) can be run in parallel. However, only one scheduler can be running at a time, as it needs knowledge of all pending tasks and users. All components can nevertheless run in HA mode (active/standby or all active).
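The single-scheduler, many-executor split can be pictured with a small in-process toy. This is only a sketch of the idea, not how GO-Docker is implemented: the threads, the `priority` field, and the `run_cluster` function are all assumptions made for illustration, and the real components run as separate processes.

```python
import queue
import threading


def run_cluster(tasks, executor_count=3):
    """One scheduler thread orders tasks; several executor threads consume them."""
    ready = queue.Queue()
    done = []
    done_lock = threading.Lock()

    def scheduler():
        # The single scheduler sees every pending task, so it can order them globally.
        for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
            ready.put(task)
        # One stop marker per executor so every worker terminates.
        for _ in range(executor_count):
            ready.put(None)

    def executor():
        while True:
            task = ready.get()
            if task is None:
                break
            # Stand-in for actually running the task in a container.
            with done_lock:
                done.append(task["id"])

    threads = [threading.Thread(target=scheduler)]
    threads += [threading.Thread(target=executor) for _ in range(executor_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


tasks = [{"id": i, "priority": p} for i, p in [(1, 5), (2, 1), (3, 9), (4, 3)]]
done = run_cluster(tasks)
print(sorted(done))
```

Because executors run concurrently, the completion order is not guaranteed; only the scheduling order is decided in one place.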

Technologies / Components

  • Docker / Swarm / Apache Mesos / Kubernetes (experimental)
  • Python
  • AngularJS
  • MongoDB
  • Redis
  • NodeJS
  • cAdvisor (optional)
  • InfluxDB (optional)

Ecosystem

  • Logs: Integrate your logs with Logstash, Graylog, etc.
  • External monitoring: Grafana (InfluxDB), Prometheus endpoint
  • Workflows: Fireworks, Airflow (see repository for components and examples)
  • Status: etcd, consul

Features

Get Started

CLI

CLI commands


    #godjob create -n job_name -d "example job" -i centos:latest -s script_to_execute.sh -v home --cpu 2 --ram 10
    #godjob create -n job_name -d "interactive job" -i rastasheep/ubuntu-sshd --interactive
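When jobs are submitted from a script, the same command line can be assembled programmatically. The helper below is a minimal sketch that uses only the flags shown in the two commands above; the function name `build_godjob_create` and its parameters are hypothetical, and it assumes the `godjob` executable is on the `PATH`.

```python
import shlex


def build_godjob_create(name, description, image, script=None, volume=None,
                        cpu=None, ram=None, interactive=False):
    """Build the argv list for a `godjob create` call.

    Only the flags that appear in the examples above are used.
    """
    argv = ["godjob", "create", "-n", name, "-d", description, "-i", image]
    if script is not None:
        argv += ["-s", script]
    if volume is not None:
        argv += ["-v", volume]
    if cpu is not None:
        argv += ["--cpu", str(cpu)]
    if ram is not None:
        argv += ["--ram", str(ram)]
    if interactive:
        argv.append("--interactive")
    return argv


argv = build_godjob_create("job_name", "example job", "centos:latest",
                           script="script_to_execute.sh", volume="home",
                           cpu=2, ram=10)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(argv))
# To submit for real, one could call subprocess.run(argv, check=True).
```

Passing the arguments as a list avoids shell quoting problems with descriptions that contain spaces.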
                     

Web

REST API

    
        $.getJSON('/api/1.0/task/12', function(data){
            console.log('task status:', data.status.primary);
        });
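The same request can be made outside the browser. The sketch below mirrors the call above in Python using only the standard library. The base URL, the `headers` argument, and the sample status value are assumptions for illustration; only the `/api/1.0/task/<id>` path and the `status.primary` field come from the snippet above, and a real deployment will most likely require authentication.

```python
import json
import urllib.request


def task_url(base_url, task_id):
    """Build the task endpoint URL used in the snippet above."""
    return "{}/api/1.0/task/{}".format(base_url.rstrip("/"), task_id)


def primary_status(task):
    """Extract status.primary from a decoded task document."""
    return task["status"]["primary"]


def fetch_task(base_url, task_id, headers=None):
    """GET the task document and decode it as JSON.

    `headers` is a placeholder for whatever authentication the server expects.
    """
    request = urllib.request.Request(task_url(base_url, task_id), headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline example with a made-up document; "running" is a placeholder value.
sample = {"id": 12, "status": {"primary": "running"}}
print("task status:", primary_status(sample))
print(task_url("https://godocker.example.org/", 12))
```

With a reachable server, `primary_status(fetch_task(base_url, 12))` would print the same field as the JavaScript example.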
    

Integrate with workflow tools

Fireworks


param = {
    'stdout_file': 'hello.out',
    'cpu': 2
}
task1 = GoDockerTask.from_str('echo "hello"', parameters=param)
                     

Airflow


templated_command = """
    echo "{{ god_user }}"
    """

t1 = GoDockerBashOperator(
    task_id='print_date',
    bash_command=templated_command,
    cpu=2,
    params={'my_param': 'Parameter I passed in'},
    dag=dag)
                     

Software

GO-Docker is open source and is available at:

Wiki, tutorials, ...

GO-Docker scheduler and executor

GO-Docker Web server

GO-Docker Live notifications

GO-Docker CLI

Wanna contribute?

Feel free to fork the code and send pull requests.

The plugin system allows you to integrate new scheduler algorithms or executors (like Docker Swarm). Do not hesitate to develop new ones to extend GO-Docker's integration and capabilities.