Train your model in the background and disconnect from PyCharm: remote debug/run over SSH with nohup
There are a few requests for a PyCharm feature that would let you run a Python script (say, training a DNN for hours or days) on a remote server (say, a DGX server or GPU DevBox), then close PyCharm and your laptop, go home, and come back later to look at your experiment logs or TensorBoard.
I use PyCharm's remote debugging feature through SSH/Docker with delight and would love to see such a solution added to our PyCharm arsenal. Until then, I came up with this temporary workaround. It works for me and lets me script a whole batch of experiments to run on the server overnight or over a holiday while my laptop is closed and I am logged off the remote server.
Obviously, you can simply write a batch file or run your training script with nohup from an SSH terminal. However, if you develop your code in PyCharm, this means redoing at least a few things on the command line:
- Setting environment variables such as LD_LIBRARY_PATH
- Making sure you are using the same Python interpreter as the rest of your project, which is usually a single command depending on what sort of virtual environment you are using.
- Changing the working directory to your script directory.
- Adding the content and source paths to the Python path so your modules are visible to the Python interpreter.
These steps are usually simple, but they depend on the project settings and might change from time to time. I found it would be much nicer to have a separate script that inherits all that information from the project and that I can run with a single click to start offline experiments when I am about to log off my laptop and go home.
So the simple solution is to create a short Python script in the same directory as your training script (say model.py). I call it offline_training.py and it looks like this:
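A minimal sketch of such a script, under stated assumptions rather than as a definitive listing: the experiment list, the `--size` flag, and the `res1.txt`/`res2.txt` file names are hypothetical placeholders, and `build_command` and `launch` are illustrative helper names. It assumes the script is started from a PyCharm run configuration on the remote interpreter, so `sys.executable`, `os.environ` (including `LD_LIBRARY_PATH` and `PYTHONPATH`) and the working directory already carry the project settings, and that `nohup` and `bash` are available on the server.

```python
import os
import shlex
import subprocess
import sys

TRAIN_SCRIPT = "model.py"  # the training script mentioned in the text

# Hypothetical experiments: (command-line arguments, output file).
EXPERIMENTS = [
    ("--size 192", "res1.txt"),
    ("--size 256", "res2.txt"),
]


def build_command(experiments, python=sys.executable, script=TRAIN_SCRIPT):
    """Return one shell command line that runs the experiments sequentially."""
    steps = []
    for args, out_file in experiments:
        # Redirect both stdout and stderr of each run to its own file.
        steps.append(
            f"{shlex.quote(python)} {shlex.quote(script)} {args} "
            f"> {shlex.quote(out_file)} 2>&1"
        )
    # ';' starts the next experiment even if the previous one failed.
    return "; ".join(steps)


def launch(experiments):
    """Start the experiment chain detached from the launching process."""
    return subprocess.Popen(
        ["nohup", "bash", "-c", build_command(experiments)],
        env=os.environ.copy(),   # variables inherited from the run configuration
        cwd=os.getcwd(),         # working directory inherited from the run configuration
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # own session, not tied to the launcher's terminal
    )


if __name__ == "__main__":
    if os.path.isfile(TRAIN_SCRIPT):
        launch(EXPERIMENTS)
    else:
        print(f"{TRAIN_SCRIPT} not found in {os.getcwd()}; nothing launched.")
```

Running this file from PyCharm returns almost immediately; the chained experiments keep running on the server. Using `;` between steps is a design choice so that one failed run does not cancel the rest; replace it with `&&` if later experiments should only run after earlier ones succeed.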
This simple script lets you set up experiments to run sequentially on the server while you are away from PyCharm with your SSH connection closed. It automatically inherits all of the above settings and variables from your project, uses them to run your experiments, and records the outputs in res.txt files.
When you come back, you can look at your res.txt/log files, open TensorBoard, etc., and see what happened while you were away. Of course, you can also use PyCharm's External Tools and Remote External Tools features to set up one-click solutions (such as firing up TensorBoard on your remote server and opening it in your favorite browser) that inherit all the information from your project as well. If you have a more sophisticated solution or an extension, please ping me. I would love to update this post and include your solution until PyCharm implements a better UX.
Update (2 May 2023): JetBrains recently introduced Remote Development in PyCharm Professional as a beta. It lets you leave your script running on the server while you disconnect, which seems to be a good solution to the need described above.