Python language nodes

A Python language node uses kind: python. You write a top-level run(...) function, or transform(...) for compatibility, and Rime calls it with named arguments from the YAML in: map.

- id: features
  kind: python
  source: scripts/features.py
  in:
    cohort: raw_patients
    threshold: params.threshold

scripts/features.py

def run(cohort, threshold):
    return cohort.assign(flag=cohort["score"] > float(threshold))

The slot names in YAML become Python argument names. The runtime owns input loading, output writing, and cache bookkeeping.

| YAML in: slot | Python value |
| --- | --- |
| Upstream node ref, for example cohort: raw_patients | pandas.DataFrame |
| Param ref, for example threshold: params.threshold | native scalar/list/dict |

The default entrypoint is run. transform is accepted for older scripts. To use another function, set entrypoint: on the node.
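
The lookup order described above can be sketched in a few lines. This is only an illustration, not Rime's implementation: `resolve_entrypoint` is a hypothetical helper, the `SimpleNamespace` objects stand in for loaded script modules, and preferring `run` over `transform` when both exist is an assumption the text does not confirm.

```python
from types import SimpleNamespace


def resolve_entrypoint(module, entrypoint=None):
    """Pick the callable to invoke from a loaded script module.

    Hypothetical helper: trying run before transform is an assumption.
    """
    # An explicit entrypoint: value on the node replaces the default names.
    names = [entrypoint] if entrypoint else ["run", "transform"]
    for name in names:
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise AttributeError(f"no entrypoint found; looked for: {', '.join(names)}")


# Stand-ins for loaded script modules.
modern = SimpleNamespace(run=lambda cohort: "run called")
legacy = SimpleNamespace(transform=lambda cohort: "transform called")
custom = SimpleNamespace(build=lambda cohort: "build called")

print(resolve_entrypoint(modern)(None))
print(resolve_entrypoint(legacy)(None))
print(resolve_entrypoint(custom, entrypoint="build")(None))
```

Raising when no candidate is found keeps a misspelled entrypoint: value from silently falling back to a different function.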

def run(cohort, lookup, threshold):
    # cohort and lookup are pandas DataFrames.
    # threshold came from params.threshold.
    ...
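
Because the entrypoint is an ordinary function, the slot-to-argument mapping can be exercised locally without Rime. The sketch below reuses the features example; the `inputs` dict is a hypothetical stand-in for whatever the runtime resolves from the in: map, and the sample values are made up for illustration.

```python
import pandas as pd


def run(cohort, threshold):
    # Same body as the scripts/features.py example.
    return cohort.assign(flag=cohort["score"] > float(threshold))


# Hypothetical stand-ins for the values resolved from the in: map.
inputs = {
    "cohort": pd.DataFrame({"score": [0.2, 0.9]}),  # upstream node ref
    "threshold": 0.5,  # param ref
}

# Each slot name is passed as a keyword argument of the same name.
result = run(**inputs)
print(result)
```

A slot name that does not match a parameter of the function raises a TypeError in this local call, which makes a typo in the in: map easy to catch before a full pipeline run.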

Return a pandas DataFrame, or a value pandas can turn into one:

def run(orders):
    return orders[orders["total"] > 0]

Downstream nodes reference the default output by the node's id, for example filtered when the node is declared with id: filtered.
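
As an illustration of "a value pandas can turn into one", the sketch below returns a list of records instead of a DataFrame. How Rime performs the conversion is not documented here; the last step assumes it behaves like the pandas.DataFrame constructor, and the sample data is invented.

```python
import pandas as pd


def run(orders):
    # Returns a list of dict records rather than a DataFrame.
    kept = orders[orders["total"] > 0]
    return kept.to_dict(orient="records")


orders = pd.DataFrame({"order_id": [1, 2, 3], "total": [10.0, 0.0, 4.5]})
value = run(orders)

# Assumption: coercion works like the pandas.DataFrame constructor.
table = value if isinstance(value, pd.DataFrame) else pd.DataFrame(value)
print(table)
```

Returning a DataFrame directly avoids any ambiguity about column order and dtypes, so the coercion path is best kept for small, simple results.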

Declare named outputs in YAML and return a dict with matching keys:

- id: split
  kind: python
  source: scripts/split.py
  in: { cohort: features }
  out: { train: table, test: table }

scripts/split.py

def run(cohort):
    train = cohort.sample(frac=0.8, random_state=42)
    test = cohort.drop(train.index)
    return {"train": train, "test": test}

Downstream refs are split.train and split.test.
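
A quick local check can confirm that the returned keys line up with the declared out: names before the node runs in a pipeline. This is a sketch: `check_outputs` is a hypothetical helper, not part of Rime, and the ten-row cohort is sample data.

```python
import pandas as pd


def run(cohort):
    train = cohort.sample(frac=0.8, random_state=42)
    test = cohort.drop(train.index)
    return {"train": train, "test": test}


def check_outputs(result, declared):
    """Hypothetical check: returned keys must equal the declared out: names."""
    missing = set(declared) - set(result)
    extra = set(result) - set(declared)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return result


cohort = pd.DataFrame({"score": range(10)})
outputs = check_outputs(run(cohort), declared=["train", "test"])
print(len(outputs["train"]), len(outputs["test"]))
```

The split relies on `cohort.drop(train.index)`, which removes rows by index label, so it only yields disjoint train and test sets when the cohort index is unique.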

For scalars or structured objects, declare an any output:

out: { result: any }

Use any for model summaries, config echoes, or compact JSON-like results. Tables should stay as table outputs.
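
A minimal sketch of a node that fills an any output with a compact summary. It assumes the named-output rule above also applies here, so the function returns a dict keyed by the declared name result. Converting to built-in int and float is a precaution so the value stays JSON-like; whether Rime requires that is not stated.

```python
import json

import pandas as pd


def run(cohort):
    summary = {
        "rows": int(len(cohort)),
        "mean_score": float(cohort["score"].mean()),
        "columns": list(cohort.columns),
    }
    # The key matches the declared output name in out: { result: any }.
    return {"result": summary}


cohort = pd.DataFrame({"score": [0.25, 0.75]})
out = run(cohort)
print(json.dumps(out["result"]))
```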

Matplotlib figures that are still open when your function returns are captured into the node audit.

import matplotlib.pyplot as plt

def run(cohort):
    fig, ax = plt.subplots()
    cohort.plot.scatter(x="age", y="score", ax=ax)
    ax.set_title("Age vs score")
    return cohort.describe()

You do not need to call a Rime-specific display function. The runner captures open matplotlib figures after the entrypoint returns and stores them with the node diagnostics.
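
To make the capture behavior concrete, here is one way a runner could collect open figures after the entrypoint returns. It is a sketch, not Rime's code: `capture_open_figures` is a hypothetical helper, and rendering to PNG bytes is an assumption about the storage format.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def run(cohort):
    fig, ax = plt.subplots()
    cohort.plot.scatter(x="age", y="score", ax=ax)
    ax.set_title("Age vs score")
    return cohort.describe()


def capture_open_figures():
    """Hypothetical helper: render every open figure to PNG bytes, then close it."""
    images = []
    for num in plt.get_fignums():
        buffer = io.BytesIO()
        plt.figure(num).savefig(buffer, format="png")
        images.append(buffer.getvalue())
    plt.close("all")
    return images


cohort = pd.DataFrame({"age": [30, 45, 60], "score": [0.2, 0.5, 0.9]})
summary = run(cohort)
figures = capture_open_figures()
print(len(figures), "figure(s) captured")
```

Under this model, a figure closed with `plt.close(fig)` inside the entrypoint is no longer listed by `plt.get_fignums()`, which matches the "still open" condition above.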

Python nodes run in a warm Python runner session for the selected interpreter during a CLI/editor run. The runner is isolated from the host Node process, but startup is amortized across Python nodes that share the same interpreter.

Inputs and outputs move through Rime’s Arrow/Parquet-backed artifact path. The runtime converts upstream tables into pandas DataFrames before calling your function, then persists returned tables for downstream nodes.

Required: Python 3.11+ with pyarrow and pandas.

uv venv .venv
source .venv/bin/activate
uv pip install pyarrow pandas
rime run pipeline.dag.yaml --python-bin .venv/bin/python

You can also set the interpreter in the DAG:

interpreters:
  python: .venv/bin/python
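
Before pointing Rime at an interpreter, it can be worth confirming that the interpreter meets the stated requirements. The script below is a hypothetical preflight check, not a Rime command. It uses only the standard library and reports what is missing.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 11)
REQUIRED_PACKAGES = ("pyarrow", "pandas")


def environment_problems(version=None, find_spec=importlib.util.find_spec):
    """Return human-readable problems; an empty list means the interpreter looks usable."""
    version = tuple(version or sys.version_info)[:2]
    problems = []
    if version < REQUIRED_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} is older than 3.11")
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when the package cannot be imported.
        if find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print(problem)
```

Saved under any file name, the script can be run with the interpreter under test, for example .venv/bin/python followed by the script path. No output means both checks passed.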