This guide describes how to set up and use the KNIME Python Integration in KNIME Analytics Platform with its two nodes: the Python Script node and the Python View node.
In the v4.5 release of KNIME Analytics Platform, we introduced the Python Script (Labs) node, which has been, since the v4.7 release, the current Python Script node described in this guide.
The KNIME Python Integration works with Python versions 3.9 to 3.11 and comes with a bundled Python environment so that you can start right away. This convenience lets you use the nodes without installing, configuring, or even knowing about environments. The bundled Python environment comes with these packages.
The section Using the Python nodes explains how to use the configuration dialogs, how to work with data coming into and going out of the nodes, how to work with batches, and how to use the Python Script node with scripts of older Python nodes. It also presents the use case of working with Jupyter notebooks and references further examples.
Before the v4.7 release, this extension was in labs and the KNIME Python Integration (legacy) was the current Python Integration. For anything related to the legacy nodes of the former KNIME Python Integration, please refer to the Python Integration guide of KNIME Analytics Platform v4.6.
The advantages of the current Python Script node and the Python View node over the legacy nodes are: significantly improved performance and data transfer between Python processes and KNIME Analytics Platform thanks to Apache Arrow; a bundled environment to start right away; a unified API via the knime.scripting.io module; conversion support to and from both Pandas DataFrames and PyArrow Tables; and support for arbitrarily large data sets by using batches.
If you need Python 2 support, you will also have to use the KNIME Python Integration (legacy).
To achieve the biggest possible performance gains, we recommend configuring your workflows to use the Columnar Backend.
Right-click a workflow in the KNIME Explorer, select Configure…, then choose the Columnar Backend option under Selected Table Backend. Additional information about table backends can be found here.
This chapter guides you through the configuration of the script dialog and the number of ports, followed by usage examples. These examples cover accessing the input data, converting tables, and using batches for data larger than RAM. It then explains how to port scripts from the legacy Python nodes to this extension. After that, the additional features of the Python View node are explained. The chapter concludes with the use case of loading and accessing Jupyter notebooks.
Script Editor
Your primary area for code development is the Script Editor. It comes with the
convenience of auto-completion to expedite your coding process. Additionally,
hovering over functions or methods reveals tooltips, providing usage guidance.
Inputs/Outputs
(Left Panel)
Displayed here are the input and output variables accessible to your node. You
can easily incorporate these into your script by dragging them from the panel
into the Script Editor.
Ask K-AI
Tap into AI for code assistance. Input a prompt in the "Ask K-AI" box, and our
AI model will suggest code relevant to your prompt. Inspect the generated code
and, if it meets your requirements, integrate it into your script.
The "Run all" button allows for the execution of your entire script in a new
Python process, which remains accessible post-execution. To run a specific
segment of your code, select the desired lines and click "Run selected lines,"
executing them in the active Python process.
Temporary Values
Post-execution, this panel lists the local variables defined in your script.
It’s not just for show; you can interact with these variables by clicking on
them, prompting their values to be printed in the console. This interactive
feature is particularly useful for quick variable inspections and debugging.
Console
The console displays the real-time standard output from your Python session,
including print statements and other script outputs. To start afresh or
declutter the console, use the trash icon button situated at the top right.
Execution Status
This section provides feedback on the script’s execution process. It indicates
the status of the last script run, allowing you to confirm that the script has
executed as intended or to identify if there are any actions needed to address
script issues.
Output Preview
The Output Preview panel is only visible in the dialog of the "Python View" node and shows the output view after script execution. This interactive preview is updated on the fly whenever the output view is updated by the interactive Python session.
The "Ask K-AI" feature within the KNIME Python Scripting Node is an advanced
AI-assisted code generation tool. When activated, you can input prompts
specifying the intended functionality of the code. The AI assistant has
contextual awareness of the KNIME Python API, the input data’s structure, and
the current script content in the editor.
Once the assistant generates the code, it is presented to you in a diff-editor
format, which highlights the differences between your current code and the new
suggestion. You then have the option to review these suggestions and choose
whether to accept them into your script or discard them, providing a high
degree of control over the changes made to your code.
Upon utilizing this service, be aware that the current code from the
editor, the input data’s schema, and the prompt are sent over the internet to
the configured KNIME Hub and OpenAI, which is a consideration for data privacy.
This transmission is necessary for the AI to tailor code suggestions accurately
to your script’s context and the data you are working with.
When you create a new instance of the Python Script nodes, the code editor already contains starter code, in which we import knime.scripting.io as knio. The content shown in the input, output, and flow variable panes can be accessed via this knime.scripting.io module.
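As a minimal sketch of how this module is typically used, a script could read the first input table, transform it, and write the result to the first output port (the column name "value" here is hypothetical):
import knime.scripting.io as knio

# Read the first input table as a Pandas DataFrame
df = knio.input_tables[0].to_pandas()

# Example transformation; the column name "value" is hypothetical
df["value_doubled"] = df["value"] * 2

# Write the result to the first output port
knio.output_tables[0] = knio.Table.from_pandas(df)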
If the package knime is installed via pip in the environment used for the Python Script node, accessing the knime.scripting.io module will fail with the error No module named 'knime.scripting'; 'knime' is not a package. In that case, run pip uninstall knime in your Python environment.
Use knio.output_images[i] to output images, which must be either a string describing an SVG image or a byte array encoding a PNG image, where i is the index of the corresponding table/object/image (0 for the first input/output port, 1 for the second input/output port, and so on).
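For instance, a sketch (assuming matplotlib is available in the selected environment) that writes a PNG image to the first output image port:
import io
import knime.scripting.io as knio
import matplotlib.pyplot as plt

# Create a simple figure
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# Encode the figure as PNG bytes and assign them to the first image output
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
knio.output_images[0] = buffer.getvalue()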
The knime.scripting.io module provides a simple way of accessing the input data as a Pandas DataFrame or a PyArrow Table. This can prove quite useful since the two data representations and their corresponding libraries provide different sets of tools that fit different use cases.
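As a minimal sketch, both directions of the conversion look like this:
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()      # access as a Pandas DataFrame
table = knio.input_tables[0].to_pyarrow()  # access as a PyArrow Table

# Convert back when writing an output table
knio.output_tables[0] = knio.Table.from_pandas(df)
# or: knio.output_tables[0] = knio.Table.from_pyarrow(table)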
First, you need to initialise an instance of a table to which the batches will be written after being processed:
processed_table = knio.BatchOutputTable.create()
Calling the batches() method on an input table returns an iterable whose items are batches of the input table, which can be accessed via a for loop:
processed_table = knio.BatchOutputTable.create()
for batch in knio.input_tables[0].batches():
Inside the for loop, the batch can be converted to a Pandas DataFrame or a PyArrow Table using the to_pandas() and to_pyarrow() methods mentioned above:
processed_table = knio.BatchOutputTable.create()
for batch in knio.input_tables[0].batches():
    input_batch = batch.to_pandas()
At the end of each iteration of the loop, the batch should be appended to the processed_table:
processed_table = knio.BatchOutputTable.create()
for batch in knio.input_tables[0].batches():
    input_batch = batch.to_pandas()
    # process the batch
    processed_table.append(input_batch)
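Putting it all together, a complete sketch (the processing step and the column name "value" are hypothetical) that processes a table batch by batch and assigns the result to the first output port could look like this:
import knime.scripting.io as knio

processed_table = knio.BatchOutputTable.create()
for batch in knio.input_tables[0].batches():
    input_batch = batch.to_pandas()
    # process the batch; the column name "value" is hypothetical
    input_batch["value"] = input_batch["value"] * 2
    processed_table.append(input_batch)

knio.output_tables[0] = processed_table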
input_table_1 = knio.input_tables[0].to_pandas()
# the script from the legacy nodes goes here
knio.output_tables[0] = knio.Table.from_pandas(output_table_1)
Note that the numbering of inputs and outputs in the Python nodes is 0-based; keep that in mind when porting your scripts from the other Python nodes, which use a 1-based numbering scheme (e.g. knio.input_tables[0] in the Python nodes corresponds to input_table_1 in the legacy Python nodes).
The Python View node can be used to create views using Python scripts. It has the same configurable input ports as the Python Script node and uses the same API to access the input data. However, the Python View node has no output ports except for one optional image output port. To create a view, the script must populate the variable knio.output_view with the return value of one of the knio.view* functions. It is possible to create views from all kinds of displayable objects via the convenience method knio.view, which tries to detect the correct format and calls the matching method from the following list of knio.view* functions (see the API for more details):
knio.view_plotly creates a view from a Plotly figure; note that to be able to synchronize the selection between the view and other KNIME views, the custom_data of the figure traces must be set to the RowID.
Example:
import plotly.express as px

df = knio.input_tables[0].to_pandas()
fig = px.scatter(df, x="my_x_col", y="my_y_col", color="my_label_col",
                 custom_data=[df.index])  # custom_data is set to the RowID
knio.output_view = knio.view_plotly(fig)
The output image port is populated automatically if the view is an SVG, PNG, or JPEG image or can be converted to one. Matplotlib and seaborn figures will be converted to a PNG or SVG image depending on the format chosen in view_matplotlib. Plotly figures can only be converted to images if the package kaleido is installed in the environment. Objects that have an IPython _repr_svg_, _repr_png_, or _repr_jpeg_ function will be converted by calling the first of these functions available. HTML documents cannot be converted to images automatically. However, it is possible to set an image representation or a function that returns an image representation when calling view_html (see the API).
Otherwise, the script must populate the variable knio.output_images[0], as in the Python Script node.
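For example, a sketch (assuming matplotlib is available in the selected environment) that creates a view from a matplotlib figure; the output image port is then populated automatically:
import knime.scripting.io as knio
import matplotlib.pyplot as plt

df = knio.input_tables[0].to_pandas()
fig, ax = plt.subplots()
ax.hist(df.iloc[:, 0])  # assumes the first column is numeric
knio.output_view = knio.view(fig)  # knio.view auto-detects the figure type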
Load Jupyter notebooks from KNIME
Existing Jupyter notebooks can be accessed within Python scripting nodes if we import knime.scripting.jupyter as knupyter.
Notebooks can be opened via the function knupyter.load_notebook, which returns a standard Python module. The load_notebook function takes the path to the folder that contains the notebook file and the filename of the notebook as arguments.
After a notebook has been loaded, you can call functions that are defined in the code cells of the notebook like any other function of a Python module.
Furthermore, you can print the textual content of each cell of a Jupyter notebook using the function knupyter.print_notebook. It takes the same arguments as the load_notebook function.
An example script for a Python Script node loading a notebook could look like this:
# Path to the folder containing the notebook, e.g. the folder 'data' contained
# in my workflow folder
notebook_directory = "knime://knime.workflow/data/"
# Filename of the notebook
notebook_name = "sum_table.ipynb"
# Load the notebook as a Python module
import knime.scripting.jupyter as knupyter
my_notebook = knupyter.load_notebook(notebook_directory, notebook_name)
# Print its textual contents
knupyter.print_notebook(notebook_directory, notebook_name)
# Call a function 'sum_each_row' defined in the notebook
# (input_table is e.g. knio.input_tables[0].to_pandas())
output_table = my_notebook.sum_each_row(input_table)
The load_notebook and print_notebook functions have two optional arguments:
notebook_version: the Jupyter notebook format major version, given as an integer. Sometimes the version cannot be read from a notebook file; in these cases, this option allows you to specify the expected version in order to avoid compatibility issues.
only_include_tag: a string naming a custom cell tag; only cells annotated with that tag are loaded (since Jupyter 5.0.0). This is useful to mark the cells that are intended to be used in a Python module; all other cells are excluded, which is helpful, e.g., to leave out cells that do visualization or contain demo code.
The KNIME Python Integration requires a configured Python environment. In this section we describe how to install the Python Integration and how to configure its Python environment. Besides the prerequisites, we explain the possibilities for two different scopes: for the whole KNIME Analytics Platform and node-specific. The latter is handy when sharing your workflow. Lastly, the configuration for the KNIME Executor (which is used in KNIME Business Hub) is explained in a configuration example.
Prerequisites
Install the Python extension. Drag and drop the extension from the KNIME Hub into the workbench to install it. Or go to File → Install KNIME Extensions in KNIME Analytics Platform and install the KNIME Python Integration in the category KNIME & Extensions.
Install Conda, a package and environment manager, for instance Miniconda, which is a minimal installation of Conda. Its initial environment, base, contains a Python installation, but we recommend creating new environments for your specific use cases.
In the KNIME Analytics Platform Preferences, configure the Path to the Conda installation directory under KNIME > Conda, as shown in the following figure.
You will need to provide the path to the folder containing your installation of Conda. For
Miniconda, the default installation path is
The KNIME Python Integration is installed with a bundled Python environment, consisting of a specific set of Python packages (i.e. Python libraries), so you can start right away: just open the Python Script node and start scripting. As not everybody needs everything, this set is quite limited, allowing for many scripting scenarios while keeping the bundled environment small. The list of included packages can be found in the contents of this metapackage and in the following list (with some additional dependencies):
The bundled environment is selected by default and can be reselected here:
If you want a Python environment with more packages than the bundled environment provides, you can create your own environment using our metapackages. Two metapackages are important: knime-python-base contains the basic packages which are always needed. knime-python-scripting contains knime-python-base and additionally installs the packages used in the Python Script node; this is the set of packages also used in the bundled environment. Find the lists here.
You can choose between different Python versions (currently 3.9 to 3.11) and select the current KNIME Analytics Platform version. See the KNIME conda channel for available versions.
Create a new environment in a terminal by adjusting and entering a command such as the one below.
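A sketch, in which the environment name, the metapackage, and the Python version are placeholders to adjust:
conda create -n my_python_env -c knime -c conda-forge knime-python-scripting python=3.11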
Further information on how to manage Conda packages can be found here.
Do not install the package knime via pip into the environment that shall be used inside KNIME, as that will conflict with the KNIME Python Scripting API and make importing knime.scripting.io fail.
Point KNIME Analytics Platform to a start script which activates the environment you want to use for Python 3. This option assumes that you have created a suitable Python environment earlier with a Python virtual environment manager of your choice. In order to use the created environment, you need to create a start script (shell script on Linux and Mac, batch file on Windows). The script has to meet the following requirements:
It has to start Python with the arguments given to the script (please make sure that spaces are properly escaped)
It has to forward the standard output and the error output of the started Python instance
It must not output anything else.
#! /bin/bash
# Start by making sure that the anaconda folder is on the PATH
# so that the source activate command works.
# This isn't necessary if you already know that
# the anaconda bin dir is on the PATH
export PATH="<PATH_WHERE_YOU_INSTALLED_ANACONDA>/bin:$PATH"
conda activate <ENVIRONMENT_NAME>
python "$@" 1>&1 2>&2
On Windows, the script looks like this:
@REM Adapt the folder in the PATH to your system
@SET PATH=<PATH_WHERE_YOU_INSTALLED_ANACONDA>\Scripts;%PATH%
@CALL activate <ENVIRONMENT_NAME> || ECHO Activating python environment failed
@python %*
Besides setting up Python for your entire KNIME workspace via the Preferences page, you can also use the Conda Environment Propagation node to configure custom Python environments and then propagate them to downstream Python nodes. This node also allows you to bundle these environments together with your workflows, making it easy for others to replicate the exact same environment that the workflow is meant to be executed in. This makes workflows containing Python nodes significantly more portable and less error-prone.
Setting up
To be able to make use of the Conda Environment Propagation node, you need to follow these steps:
On your local machine, you should have Conda set up and configured in the Preferences of the KNIME Python Integration as described in the Prerequisites section.
Open the node configuration dialog and select the Conda environment you want to propagate and the packages to include in the environment in case it will be recreated on a different machine. The packages can be selected automatically via the following buttons:
The Include only explicitly installed button selects only those packages that were explicitly installed into the environment by the user. This can help avoid conflicts when using the workflow on different operating systems, because it allows Conda to resolve the dependencies of those packages for the operating system the workflow is running on.
The Conda Environment Propagation node outputs a flow variable which contains the necessary information about the Python environment (i.e. the name of the environment and the respective installed packages and versions). The flow variable has conda.environment as the default name, but you can specify a custom name. This way you can avoid name collisions that may occur when employing multiple Conda Environment Propagation nodes in a single workflow.
One after another, open the configuration dialogs of the Python nodes in the workflow that you want to make portable.
Open the "Set Python environment" settings page via the kebab menu at the top right, and select which Conda flow variable you want to use.
Deploy the workflow by uploading it to the KNIME Server, sharing it via the KNIME Hub, or exporting it. Make sure that the Conda Environment Propagation node is reset before or during the deployment process.
On the target machine, Conda must also be set up and configured in the Preferences of the KNIME Python Integration. If the target machine runs a KNIME Server, you may need to contact your server administrator or refer to the Server Administration Guide in order to do this.
During execution (on either machine), the node will check whether a local Conda environment exists that matches its configured environment. When configuring the node, you can choose which modality will be used for the Conda environment validation on the target machine.
Check name only will only check for the existence of an environment with the same name as the original one, Check name and packages will check both the name and the requested packages, while Always overwrite existing environment will disregard the existence of an equal environment on the target machine and will recreate it.
In case you do not want to use the Conda Environment Propagation node’s functionality, you can also configure individual nodes manually to use specific Python environments. This is done via the flow variable python3_command that each Python scripting node offers under the Flow Variables tab in its configuration dialog. The variable accepts the path to a Python start script like in the Manual case described above.
# A - KNIME Conda Integration - Path to Anaconda/miniconda installation directory
/instance/org.knime.conda/condaDirectoryPath=<path to conda installation dir>
# B - KNIME Python Integration - Default options for Python Integration. By default KNIME uses the bundled environment (shipped with KNIME) if no Conda Environment Propagation node is used.
# Line below can be set to either "bundled" (default), "conda" or "manual"
/instance/org.knime.python3.scripting.nodes/pythonEnvironmentType=bundled
/instance/org.knime.python3.scripting.nodes/bundledCondaEnvPath=org_knime_pythonscripting
# Following rows are only required if "bundled" value above is replaced with "conda"
/instance/org.knime.python3.scripting.nodes/python2CondaEnvironmentDirectoryPath=<path to default conda environment dir>
/instance/org.knime.python3.scripting.nodes/python3CondaEnvironmentDirectoryPath=<path to default conda environment dir>
# Following rows are only required if "bundled" value above is replaced with "manual"
/instance/org.knime.python3.scripting.nodes/python2Path=<path to python2 env>
/instance/org.knime.python3.scripting.nodes/python3Path=<path to python3 env>
# C - KNIME Python Integration (Legacy) - Default options for Python Integration.
# Line below can be set to either "conda" or "manual"
/instance/org.knime.python2/pythonEnvironmentType=conda
/instance/org.knime.python2/defaultPythonOption=python3
/instance/org.knime.python2/serializerId=org.knime.python2.serde.arrow
# Following rows are only required if "conda" is set above
/instance/org.knime.python2/python2CondaEnvironmentDirectoryPath=<path to default conda environment dir>
/instance/org.knime.python2/python3CondaEnvironmentDirectoryPath=<path to default conda environment dir>
# Following rows are only required if "conda" value above is replaced with "manual"
/instance/org.knime.python2/python2Path=<path to python2 env>
/instance/org.knime.python2/python3Path=<path to python3 env>
# D - KNIME Deep Learning Integration
# Select either "python" or "dl" (without quotation marks) in next row. If "python" is used, the configuration of section B above is reused. If "dl" is used, a custom config for Deep Learning can be provided.
/instance/org.knime.dl.python/pythonConfigSelection=python
# Following rows only required if row above is set to "dl"
/instance/org.knime.dl.python/kerasCondaEnvironmentDirectoryPath=<path to default conda environment dir>
/instance/org.knime.dl.python/librarySelection=keras
/instance/org.knime.dl.python/manualConfig=python3
/instance/org.knime.dl.python/pythonEnvironmentType=conda
/instance/org.knime.dl.python/serializerId=org.knime.python2.serde.arrow
/instance/org.knime.dl.python/tf2CondaEnvironmentDirectoryPath=<path to default conda environment dir>
/instance/org.knime.dl.python/tf2ManualConfig=python3
In case you run into issues with KNIME’s Python integration, here are some useful tips to help you gather more information and maybe even resolve the issue yourself. If the issues persist and you ask for help, please include the gathered information.
Find debug information
Detailed information helps in understanding issues. It can be obtained in the following ways.
The knime.log contains information logged during the execution of nodes. There are two ways to obtain it:
Not all logged information is required. Please restrict the information you provide to the issue at hand. If the log file does not contain sufficient information, you can change the logging verbosity in File → Preferences → KNIME. You can even log the information to the console in KNIME Analytics Platform: File → Preferences → KNIME → KNIME GUI.
If conda is used, obtain the information about the used Python environment <python_env> via:
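For example, you can export the environment specification in a terminal (<python_env> is the name of your environment):
conda env export -n <python_env>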
If the error An error occured while installing the items appears when installing an extension with a bundled Python environment (the KNIME Python Integration itself and pure Python extensions), you can obtain the corresponding log files as follows. The error message contains a <plugin_name> like org.knime.pythonscripting.channel.v1.bin… or sdl.harvard.geospatial.channel.bin…
If you see the error No module named 'knime.scripting'; 'knime' is not a package, you probably have the package knime installed via pip in the environment used for the Python Script node. This currently does not work due to a name clash. You can remove knime from the respective Python environment by executing the command pip uninstall knime in your terminal.
The uninstall command can show multiple packages like the following; you can remove both.
If you encounter an SSL error during the execution of a Python scripting node, this might be due to the use of a self-signed certificate.
If other nodes such as the GET Request node work, but the Python Script node does not, you can configure the Python Script nodes to trust the same certificates as the KNIME Analytics Platform.
To do this, add the following line to your knime.ini file:
-Dknime.python.cacerts=AP
This will point the CA_CERTS and REQUESTS_CA_BUNDLE environment variables to a newly created CA bundle that contains the certificate authorities that the KNIME Analytics Platform trusts.
The Python Script node will then trust the same certificates as the KNIME Analytics Platform.