Jupyter notebooks are widely used by data scientists and machine learning experts
in their day-to-day work to develop interactively and iteratively. However, the
`ipynb` format is typically not used as a deployable or packageable artifact.
There are two scenarios in which notebooks are converted to deployable/packageable
artifacts:
- Model training tasks that need to be converted to batch jobs to scale up with more computational resources
- Model inference tasks that need to be converted to an API server to serve end-user requests
In this guide we showcase two tools that can help convert your notebook to a deployable/packageable raw python library.
This process can also be automated using Continuous Integration (CI) tools such as Cloud Build.
- Update the notebook to Pair Notebook with Percent Format
  Jupytext integrates with recent versions of Jupyter Notebook and JupyterLab. In addition to converting from `ipynb` to python, it can pair the two formats, so that updates made in the `ipynb` file are propagated to the python file and vice versa.
  To pair the notebook, use the pair function in the File menu. In this example we use the file gpt-j-online.ipynb.
- After pairing, we get the generated raw python file.
  NOTE: This conversion can also be performed via the `jupytext` cli with the following command:

      jupytext --set-formats ipynb,py:percent \
        --to py gpt-j-online.ipynb
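
To make the percent format more concrete, here is a minimal sketch of the kind of output the pairing produces, using only the standard library. It is not Jupytext's implementation: the function name `notebook_to_percent` is hypothetical, and the sketch skips the metadata header, cell metadata and magic-command handling that Jupytext takes care of. It only assumes the standard notebook JSON layout (a `cells` list whose entries have `cell_type` and `source`).

```python
import json


def notebook_to_percent(notebook_json: str) -> str:
    """Render notebook JSON as a simplified percent-format script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Code cells are kept as code under a "# %%" marker.
            chunks.append("# %%\n" + source)
        elif cell.get("cell_type") == "markdown":
            # Markdown cells become comments under a "# %% [markdown]" marker.
            commented = "\n".join(
                "# " + line if line else "#" for line in source.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# Tiny in-memory notebook, used only for illustration.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Load the model"]},
        {"cell_type": "code", "source": ["import os\n", "print(os.getcwd())"]},
    ]
})
print(notebook_to_percent(example))
```

Because every cell boundary is an ordinary comment, the resulting file is plain python that can be executed, linted and packaged like any other script.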
- Extract the module dependencies
  In the notebook environment, users typically install required python modules using `pip install` commands, but in the container environment, these dependencies need to be installed into the container prior to executing the python library.
  We can use the `pipreqs` tool to generate the dependencies. Add the following snippet in a new cell of your notebook and run it:

      !pip install pipreqs
      !pipreqs --scan-notebooks

  The following is an example output:
  NOTE: the `!cat requirements.txt` line is an example of the generated `requirements.txt`.
- Create the Dockerfile
  To create the docker image of your generated raw python, we need to create a `Dockerfile`. Below is an example. Replace `_THE_GENERATED_PYTHON_FILE_` with your generated python file:

      FROM nvidia/cuda:12.2.0-runtime-ubuntu22.04

      RUN apt-get update && \
          apt-get -y --no-install-recommends install python3-dev gcc python3-pip git && \
          rm -rf /var/lib/apt/lists/*

      # With more than one source, the COPY destination must be a directory ending in "/"
      COPY requirements.txt _THE_GENERATED_PYTHON_FILE_ /

      RUN pip3 install --no-cache-dir -r /requirements.txt

      ENV PYTHONUNBUFFERED=1

      CMD python3 /_THE_GENERATED_PYTHON_FILE_
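
Since the image installs only what `requirements.txt` lists, it can be useful to check before building that the generated python file does not import a module that is missing from it. The following is a rough sketch using only the standard library. `imported_modules` and `missing_requirements` are hypothetical helpers, not part of pipreqs or Docker, and the package names and version in the example are purely illustrative. The sketch assumes that the import name matches the package name, which is not always true (for example, `sklearn` is installed as `scikit-learn`), so its output is best treated as a hint.

```python
import ast
import re
import sys


def imported_modules(source: str) -> set:
    """Return the top-level module names imported by the given python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return modules


def missing_requirements(source: str, requirements: str) -> set:
    """Return imported non-stdlib modules that requirements.txt does not list."""
    listed = set()
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            # Keep only the package name, dropping version specifiers and extras.
            name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
            listed.add(name.lower().replace("-", "_"))
    # sys.stdlib_module_names exists on Python 3.10+; older versions skip this filter.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {
        module for module in imported_modules(source)
        if module not in stdlib and module.lower() not in listed
    }


# Illustrative inputs: the script imports transformers, but requirements omit it.
script = "import os\nimport torch\nfrom transformers import AutoTokenizer\n"
reqs = "torch==2.1.0\n"
print(missing_requirements(script, reqs))
```

In practice the two strings would be read from the generated python file and from `requirements.txt`, for example as a step in the CI pipeline before the image is built.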
- [Optional] Lint and remove unused code
  Using `pylint` to validate the generated code is a good practice. Pylint can detect unordered `import` statements and unused code, and provide code readability suggestions.
  To use `pylint`, create a new cell in your notebook and run the code below, replacing `_THE_GENERATED_PYTHON_FILE_` with your filename:

      !pip install pylint
      !pylint _THE_GENERATED_PYTHON_FILE_
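
Notebooks often accumulate imports from cells that were later deleted, which is one reason the lint step is worthwhile. As a rough illustration of what an unused-import check looks for, here is a small standard-library sketch. It is far simpler than pylint's own `unused-import` check: the helper `unused_imports` is hypothetical, and it ignores scoping, `__all__`, and names that are only referenced inside strings.

```python
import ast


def unused_imports(source: str) -> list:
    """Return names bound by import statements that are never referenced."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # star imports bind unknown names; skip them
                # "import a.b" binds "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
    # Every bare name that appears in the code, e.g. "os" in "os.getcwd()".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


script = "import os\nimport json\n\nprint(os.getcwd())\n"
print(unused_imports(script))  # prints ['json']
```
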
We can convert a Jupyter notebook to a python script using the nbconvert tool.
The nbconvert tool is available inside your Jupyter notebook environment in
Google Colab Enterprise. If you are in another environment and it is not
available, it can be found here