The ML workspace is an all-in-one web-based IDE specialized for machine learning and data science. It is simple to deploy and gets you started within minutes to productively build ML solutions on your own machines. The workspace comes preloaded with a variety of popular data science libraries (e.g., TensorFlow, PyTorch, Keras, Sklearn) and dev tools (e.g., Jupyter, VS Code, Tensorboard) that are configured, optimized, and integrated to work together.
The workspace is equipped with a selection of best-in-class open-source development tools to help with the machine learning workflow. Many of these tools can be started from the `Open Tool` menu from Jupyter (the main application of the workspace):
Within your workspace you have full root & sudo privileges to install any library or tool you need via terminal (e.g., `pip`, `apt-get`, `conda`, or `npm`). You can find more ways to extend the workspace within the Extensibility section.
!pip install matplotlib-venn
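Before installing a package from a notebook, it can help to check whether it is already available in the current environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of the workspace:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# The import name can differ from the pip name: the pip package
# "matplotlib-venn" is imported as "matplotlib_venn".
for name in ("json", "matplotlib_venn"):
    status = "available" if is_installed(name) else "missing, install it with pip"
    print(f"{name}: {status}")
```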
Jupyter Notebook is a web-based interactive environment for writing and running code. The main building blocks of Jupyter are the file-browser, the notebook editor, and kernels. The file-browser provides an interactive file manager for all notebooks, files, and folders in the `/workspace` directory.
A new notebook can be created by clicking on the `New` drop-down button at the top of the list and selecting the desired language kernel.
A terminal can be opened by selecting `New -> Terminal` in the file-browser.
The notebook editor enables users to author documents that include live code, markdown text, shell commands, LaTeX equations, interactive widgets, plots, and images. These notebook documents provide a complete and self-contained record of a computation that can be converted to various formats and shared with others.
The Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook and returns output. This workspace has a Python 3 and a Python 2 kernel pre-installed. Additional kernels can be installed to get access to other languages (e.g., R, Scala, Go) or additional computing resources (e.g., GPUs, CPUs, memory).
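Jupyter discovers a kernel through a small `kernel.json` file (a kernel spec) that names the launch command, a display name, and the language. As a rough sketch of what such a spec looks like, the snippet below writes one for the current Python interpreter into a temporary folder. The function name, folder name, and display name are hypothetical, and the launch command assumes that `ipykernel` is installed; registering the spec with Jupyter is a separate step that is not shown.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_kernel_spec(target_dir: Path, display_name: str) -> Path:
    """Write a minimal kernel.json for the current Python interpreter."""
    spec = {
        # Command Jupyter runs to start the kernel; {connection_file} is
        # substituted by Jupyter at launch time.
        "argv": [sys.executable, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    target_dir.mkdir(parents=True, exist_ok=True)
    spec_path = target_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=2))
    return spec_path


# Hypothetical folder and display name, for illustration only.
spec_file = write_kernel_spec(Path(tempfile.mkdtemp()) / "my-kernel", "My Python Kernel")
print(spec_file.read_text())
```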
This workspace provides HTTP-based VNC access via noVNC, so you can work within the workspace using a fully-featured desktop GUI. To access this desktop GUI, go to `Open Tool`, select `VNC`, and click the `Connect` button. If you are asked for a password, use `vncpassword`.
Once you are connected, you will see a desktop GUI that allows you to install and use full-fledged web-browsers or any other tool that is available for Ubuntu. Within the `Tools` folder on the desktop, you will find a collection of install scripts that make it straightforward to install some of the most commonly used development tools, such as Atom, PyCharm, R-Runtime, R-Studio, or Postman (just double-click on the script).
Clipboard: If you want to share the clipboard between your machine and the workspace, you can use the copy-paste functionality as described below:
Visual Studio Code (`Open Tool -> VS Code`) is an open-source, lightweight but powerful code editor with built-in support for a variety of languages and a rich ecosystem of extensions. It combines the simplicity of a source code editor with powerful developer tooling, like IntelliSense code completion and debugging. The workspace integrates VS Code as a web-based application accessible through the browser, based on the awesome code-server project. It allows you to customize every feature to your liking and install any number of third-party extensions.
The workspace also provides a VS Code integration into Jupyter allowing you to open a VS Code instance for any selected folder, as shown below:
JupyterLab (`Open Tool -> JupyterLab`) is the next-generation user interface for Project Jupyter. It offers all the familiar building blocks of the classic Jupyter Notebook (notebook, terminal, text editor, file browser, rich outputs, etc.) in a flexible and powerful user interface. This JupyterLab instance comes pre-installed with a few helpful extensions such as jupyterlab-toc, jupyterlab-git, and jupyterlab-tensorboard.
Version control is a crucial aspect of productive collaboration. To make this process as smooth as possible, we have integrated a custom-made Jupyter extension specialized for pushing single notebooks, a full-fledged web-based Git client (ungit), a tool to open and edit plain text documents (e.g., `.py`, `.md`) as notebooks (jupytext), as well as a notebook merging tool (nbdime). Additionally, JupyterLab and VS Code also provide GUI-based Git clients.
For cloning repositories via `https`, we recommend navigating to the desired root folder and clicking on the `git` button as shown below:
This might ask for some required settings and, subsequently, opens ungit, a web-based Git client with a clean and intuitive UI that makes it convenient to sync your code artifacts. Within ungit, you can clone any repository. If authentication is required, you will get asked for your credentials.
To commit and push a single notebook to a remote Git repository, we recommend using the Git plugin integrated into Jupyter, as shown below:
For more advanced Git operations, we recommend using ungit. With ungit, you can do most of the common Git actions such as push, pull, merge, branch, tag, checkout, and many more.
Jupyter notebooks are great, but they are often huge files with a very specific JSON file format. To enable seamless diffing and merging via Git, this workspace is pre-installed with nbdime. Nbdime understands the structure of notebook documents and, therefore, automatically makes intelligent decisions when diffing and merging notebooks. If you have merge conflicts, nbdime will make sure that the notebook is still readable by Jupyter, as shown below:
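To illustrate why structure-aware diffing helps, the sketch below compares only the code cell sources of two notebook documents instead of their raw JSON. It is a simplified illustration built on the standard library `difflib`, not nbdime's actual algorithm, and the helper names `code_cells` and `make_notebook` are hypothetical.

```python
import difflib
import json
from typing import List


def code_cells(notebook_json: str) -> List[str]:
    """Extract the source of every code cell from a notebook JSON string."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # "source" may be stored as one string or as a list of lines.
            sources.append(source if isinstance(source, str) else "".join(source))
    return sources


def make_notebook(*sources: str) -> str:
    """Build a tiny notebook document (nbformat 4 layout) for the demo."""
    cells = [
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": src}
        for src in sources
    ]
    return json.dumps({"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5})


old = make_notebook("x = 1", "print(x)")
new = make_notebook("x = 2", "print(x)")

# Diff cell by cell, ignoring outputs, metadata, and execution counts.
diff = list(difflib.unified_diff(code_cells(old), code_cells(new), lineterm=""))
print("\n".join(diff))
```

The diff only reports the changed cell, whereas a plain text diff of the two JSON files would also include noise from outputs and metadata.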
Furthermore, the workspace comes pre-installed with jupytext, a Jupyter plugin that reads and writes notebooks as plain text files. This allows you to open, edit, and run scripts or markdown files (e.g., `.py`, `.md`) as notebooks within Jupyter. In the following screenshot, we have opened a markdown file via Jupyter:
In combination with Git, jupytext enables a clear diff history and easy merging of version conflicts. With both of those tools, collaborating on Jupyter notebooks with Git becomes straightforward.
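The idea of storing a notebook as a plain text script can be sketched in a few lines. The snippet below renders notebook cells in a layout similar to the "percent" script format, where `# %%` marks a code cell and `# %% [markdown]` marks a markdown cell with commented lines. This is a simplified approximation for illustration only; jupytext itself also handles metadata, raw cells, and other formats, and the function name `to_percent_script` is hypothetical.

```python
def to_percent_script(notebook: dict) -> str:
    """Render notebook cells as a plain text script with '# %%' cell markers."""
    lines = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # "source" may be stored as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "markdown":
            lines.append("# %% [markdown]")
            # Markdown content is kept as comments so the file stays valid Python.
            lines.extend(f"# {line}" if line else "#" for line in text.splitlines())
        else:
            lines.append("# %%")
            lines.extend(text.splitlines())
        lines.append("")  # blank line between cells
    return "\n".join(lines)


demo_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
    ]
}
script = to_percent_script(demo_notebook)
print(script)
```

Because the result is ordinary text with one line per source line, Git diffs of such a file show exactly which cell lines changed.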
The workspace has a feature to share any file or folder with anyone via a token-protected link. To share data via a link, select any file or folder from the Jupyter directory tree and click on the share button as shown in the following screenshot:
This will generate a unique link protected via a token that gives anyone with the link access to view and download the selected data via the Filebrowser UI:
To deactivate or manage (e.g., provide edit permissions) shared links, open the Filebrowser via `Open Tool -> Filebrowser` and select `Settings -> User Management`.
It is possible to securely access any workspace internal port by selecting `Open Tool -> Access Port`. With this feature, you are able to access a REST API or web application running inside the workspace directly with your browser. The feature enables developers to build, run, test, and debug REST APIs or web applications directly from the workspace.
If you want to use an HTTP client or share access to a given port, you can select the `Get shareable link` option. This generates a token-secured link that anyone with access to the link can use to access the specified port.
/tools/PORT/
).
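As a small example of a service that could be reached through the port access feature, the following sketch starts a JSON endpoint with Python's built-in `http.server` and queries it once from the same process. The handler name, the response fields, and the use of port `0` (which lets the operating system pick a free port for the demo) are assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a small JSON document."""

    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def start_server(port: int = 0) -> HTTPServer:
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/") as response:
    reply = json.loads(response.read())
print(port, reply)

# Stop the background server and release the socket.
server.shutdown()
server.server_close()
```

For a longer-running service you would typically pass a fixed port to `start_server`, keep the process alive, and enter that port in the access port dialog.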
SSH provides a powerful set of features that enables you to be more productive with your development tasks. You can easily set up a secure and passwordless SSH connection to a workspace by selecting `Open Tool -> SSH`. This will generate a secure setup command that can be run on any Linux or Mac machine to configure a passwordless & secure SSH connection to the workspace. Alternatively, you can also download the setup script and run it (instead of using the command).
Just run the setup command or script on the machine from which you want to set up a connection to the workspace and input a name for the connection (e.g., `my-workspace`). You might also get asked for some additional input during the process, e.g., to install a remote kernel if `remote_ikernel` is installed. Once the passwordless SSH connection is successfully set up and tested, you can securely connect to the workspace by simply executing `ssh my-workspace`.
Besides the ability to execute commands on a remote machine, SSH also provides a variety of other features that can improve your development workflow as described in the following sections.
`-R` option (instead of `-L`).
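The difference between the `-L` and `-R` options can be sketched with a small helper that assembles an `ssh` port forwarding command. The function and parameter names are hypothetical, and the connection name `my-workspace` is the example name used above; the snippet only builds and prints the command and does not open a connection.

```python
import shlex
from typing import List


def tunnel_command(connection: str, listen_port: int, target_port: int,
                   reverse: bool = False) -> List[str]:
    """Build an ssh port forwarding command as an argument list.

    With -L, listen_port is opened on the local machine and forwarded to
    target_port inside the workspace. With -R, listen_port is opened inside
    the workspace and forwarded to target_port on the local machine.
    """
    direction = "-R" if reverse else "-L"
    forwarding = f"{listen_port}:localhost:{target_port}"
    # -n: no stdin, -N: no remote command, -T: no pseudo-terminal.
    return ["ssh", "-nNT", direction, forwarding, connection]


local_forward = tunnel_command("my-workspace", 5901, 5901)
remote_forward = tunnel_command("my-workspace", 8080, 8080, reverse=True)
print(shlex.join(local_forward))
print(shlex.join(remote_forward))
```

The returned list can be passed to `subprocess.run` on a machine where the SSH connection has already been configured.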
The workspace can be integrated and used as a remote runtime (also known as remote kernel/machine/interpreter) for a variety of popular development tools and IDEs, such as Jupyter, VS Code, PyCharm, Colab, or Atom Hydrogen. Thereby, you can connect your favorite development tool running on your local machine to a remote machine for code execution. This enables a local-quality development experience with remote-hosted compute resources.
These integrations usually require a passwordless SSH connection from the local machine to the workspace. To set up an SSH connection, please follow the steps explained in the SSH Access section.
Tensorboard provides a suite of visualization tools to make it easier to understand, debug, and optimize your experiment runs. It includes logging features for scalar, histogram, model structure, embeddings, and text & image visualization. The workspace comes pre-installed with the jupyter_tensorboard extension, which integrates Tensorboard into the Jupyter interface with functionalities to start, manage, and stop instances. You can open a new instance for a valid logs directory, as shown below:
If you have opened a Tensorboard instance in a valid log directory, you will see the visualizations of your logged data:
If you prefer to see Tensorboard directly within your notebook, you can make use of the following Jupyter magic:
%load_ext tensorboard.notebook
%tensorboard --logdir /workspace/path/to/logs
The workspace provides two pre-installed web-based tools to help developers during model training and other experimentation tasks to get insights into everything happening on the system and figure out performance bottlenecks.
Netdata (`Open Tool -> Netdata`) is a real-time hardware and performance monitoring dashboard that visualizes the processes and services on your Linux systems. It monitors metrics about CPU, GPU, memory, disks, networks, processes, and more.
Glances (`Open Tool -> Glances`) is a web-based hardware monitoring dashboard as well and can be used as an alternative to Netdata.
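If you only need a few numbers inside a notebook or script rather than a full dashboard, a quick snapshot can be collected with the standard library. This sketch is unrelated to how Netdata or Glances gather their metrics; the function name `system_snapshot` and the chosen fields are hypothetical.

```python
import os
import shutil


def system_snapshot(path: str = "/") -> dict:
    """Collect a few basic system metrics using only the standard library."""
    disk = shutil.disk_usage(path)
    snapshot = {
        "cpu_count": os.cpu_count(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
    }
    # os.getloadavg is only available on Unix-like systems and may fail
    # if the load average cannot be obtained.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_avg_1m"] = os.getloadavg()[0]
        except OSError:
            pass
    return snapshot


snapshot = system_snapshot()
for key, value in snapshot.items():
    print(f"{key}: {value}")
```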
The workspace image can also be used to execute arbitrary Python code without starting any of the pre-installed tools. This provides a seamless way to productize your ML projects since the code that has been developed interactively within the workspace will have the same environment and configuration when run as a job via the same workspace image.
The workspace is pre-installed with many popular interpreters, data science libraries, and Ubuntu packages:
The full list of installed tools can be found within the Dockerfile.
The workspace provides a high degree of extensibility. Within the workspace, you have full root & sudo privileges to install any library or tool you need via terminal (e.g., `pip`, `apt-get`, `conda`, or `npm`). You can open a terminal in one of the following ways:
- Jupyter: `New -> Terminal`
- Desktop VNC: `Applications -> Terminal Emulator`
- JupyterLab: `File -> New -> Terminal`
- VS Code: `Terminal -> New Terminal`
Additionally, pre-installed tools such as Jupyter, JupyterLab, and Visual Studio Code each provide their own rich ecosystem of extensions. The workspace also contains a collection of installer scripts for many commonly used development tools or libraries (e.g., `PyCharm`, `Zeppelin`, `RStudio`, `Starspace`). Those scripts can be executed either from the Desktop VNC (double-click on the script within the `Tools` folder on the Desktop) or from a terminal (execute any tool script from the `/resources/tools/` folder).
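To see which installer scripts are available without leaving a notebook, you could list the contents of the tools folder. The sketch below assumes only that the scripts are regular files inside `/resources/tools/`; the function name `list_tool_scripts` is hypothetical, and the function returns an empty list when the folder does not exist (for example, when run outside the workspace).

```python
from pathlib import Path
from typing import List


def list_tool_scripts(tools_dir: str = "/resources/tools/") -> List[str]:
    """Return the sorted names of all regular files in the tools folder."""
    directory = Path(tools_dir)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir() if entry.is_file())


for script_name in list_tool_scripts():
    print(script_name)
```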
As an alternative to extending the workspace at runtime, you can also customize the workspace Docker image to create your own flavor as explained in the FAQ section.
The ML Workspace project is maintained by Lukas Masuch and Benjamin Räthlein. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly so that more people can benefit from it.
Type | Channel |
---|---|
🚨 Bug Reports | |
🎁 Feature Requests | |
👩‍💻 Usage Questions | |
🗯 General Discussion | |