Python language nodes
A Python language node uses kind: python. You write a top-level run(...)
function (or transform(...), kept for compatibility with older scripts), and Rime
calls it with keyword arguments built from the node's YAML in: map.
Minimum Example
```yaml
- id: features
  kind: python
  source: scripts/features.py
  in:
    cohort: raw_patients
    threshold: params.threshold
```

```python
def run(cohort, threshold):
    return cohort.assign(flag=cohort["score"] > float(threshold))
```

The slot names in YAML become Python argument names. The runtime owns input loading, output writing, and cache bookkeeping.
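Because run is an ordinary function, you can exercise it outside Rime by building the inputs yourself. The sketch below passes a small hand-made DataFrame in place of raw_patients and a native float in place of params.threshold. The sample rows and the patient_id column are invented for illustration.

```python
import pandas as pd


def run(cohort, threshold):
    # Same body as the node above: flag rows whose score exceeds the threshold.
    return cohort.assign(flag=cohort["score"] > float(threshold))


# Hypothetical sample input standing in for the raw_patients table.
sample = pd.DataFrame({"patient_id": [1, 2, 3], "score": [0.2, 0.7, 0.9]})
result = run(sample, threshold=0.5)
print(result["flag"].tolist())  # [False, True, True]
```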
Function Signature
| YAML in: slot | Python value |
|---|---|
| Upstream node ref, for example cohort: raw_patients | pandas.DataFrame |
| Param ref, for example threshold: params.threshold | native scalar/list/dict |
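You can think of the table as a small resolution step that turns the in: map into keyword arguments. The sketch below is illustrative, not Rime's implementation. resolve_inputs is a hypothetical helper, and the tables and params dicts stand in for the runtime's artifact store. The sketch assumes that any ref starting with params. is a param ref and that everything else names an upstream output.

```python
from typing import Any


def resolve_inputs(
    in_map: dict[str, str],
    tables: dict[str, Any],
    params: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical: turn a node's in: map into keyword arguments."""
    kwargs: dict[str, Any] = {}
    for slot, ref in in_map.items():
        if ref.startswith("params."):
            # Param ref: look up the native value by the name after the prefix.
            kwargs[slot] = params[ref.removeprefix("params.")]
        else:
            # Upstream node ref: in Rime this would be a pandas DataFrame.
            kwargs[slot] = tables[ref]
    return kwargs


kwargs = resolve_inputs(
    {"cohort": "raw_patients", "threshold": "params.threshold"},
    tables={"raw_patients": "<DataFrame for raw_patients>"},
    params={"threshold": 0.5},
)
print(kwargs)  # {'cohort': '<DataFrame for raw_patients>', 'threshold': 0.5}
```

The resulting dict is what a call such as run(**kwargs) would receive. This is why slot names and argument names must match.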
The default entrypoint is run. Older scripts that define transform instead are still accepted. To
use a different function, set entrypoint: on the node to that function's name.
```python
def run(cohort, lookup, threshold):
    # cohort and lookup are pandas DataFrames.
    # threshold came from params.threshold.
    ...
```

Outputs
Single Output
Return a pandas DataFrame, or a value pandas can turn into one:

```python
def run(orders):
    return orders[orders["total"] > 0]
```

Downstream nodes reference a single default output by the node id alone. For a node with id: filtered, the ref is simply filtered.
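For example, a list of row dicts is a value pandas can turn into a DataFrame. The sketch below assumes the runtime's conversion behaves like pandas.DataFrame(value). as_table is a hypothetical helper, and the order rows are invented.

```python
import pandas as pd


def run(orders):
    # Returns plain Python rows instead of a DataFrame.
    return [
        {"order_id": row.order_id, "total": row.total}
        for row in orders.itertuples(index=False)
        if row.total > 0
    ]


def as_table(value):
    """Hypothetical coercion step: pass DataFrames through, convert anything else."""
    return value if isinstance(value, pd.DataFrame) else pd.DataFrame(value)


orders = pd.DataFrame({"order_id": [10, 11, 12], "total": [25.0, -5.0, 0.0]})
table = as_table(run(orders))
print(table)
```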
Multiple Outputs
Declare named outputs in YAML and return a dict with matching keys:

```yaml
- id: split
  kind: python
  source: scripts/split.py
  in: { cohort: features }
  out: { train: table, test: table }
```

```python
def run(cohort):
    train = cohort.sample(frac=0.8, random_state=42)
    test = cohort.drop(train.index)
    return {"train": train, "test": test}
```

Downstream refs are split.train and split.test.
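Downstream refs such as split.test only resolve if the returned keys match the out: declaration. A check along these lines catches drift early. check_outputs is a hypothetical helper, and this page does not describe the exact error Rime reports for a mismatch.

```python
from typing import Any


def check_outputs(result: Any, declared: set[str]) -> dict[str, Any]:
    """Hypothetical: ensure a multi-output node returned exactly the declared keys."""
    if not isinstance(result, dict):
        raise TypeError(f"expected a dict of outputs, got {type(result).__name__}")
    returned = set(result)
    missing = declared - returned
    extra = returned - declared
    if missing or extra:
        raise KeyError(f"missing={sorted(missing)} extra={sorted(extra)}")
    return result


check_outputs({"train": [1, 2], "test": [3]}, {"train", "test"})  # passes
```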
Non-Tabular Output
For scalars or structured objects, declare an any output:

```yaml
out: { result: any }
```

Use an any output for model summaries, config echoes, or compact JSON-like results.
Tables should stay as table outputs.
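Here is a sketch of a node with out: { result: any } that returns a compact summary. It makes two assumptions. First, the value is returned under the declared key, following the multi-output convention. Second, numpy scalars are cast to built-in int and float so the result stays JSON-friendly. The age and score columns are invented.

```python
import json

import pandas as pd


def run(cohort):
    # Assumption: the any output is returned under its declared name, "result".
    summary = {
        "rows": int(len(cohort)),
        "mean_score": float(cohort["score"].mean()),  # cast numpy float to built-in float
        "columns": list(cohort.columns),
    }
    return {"result": summary}


cohort = pd.DataFrame({"age": [40, 50], "score": [0.5, 1.0]})
out = run(cohort)
print(json.dumps(out["result"]))  # {"rows": 2, "mean_score": 0.75, "columns": ["age", "score"]}
```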
Plot Capture
Matplotlib figures that are still open when your function returns are captured into the node audit.

```python
import matplotlib.pyplot as plt


def run(cohort):
    fig, ax = plt.subplots()
    cohort.plot.scatter(x="age", y="score", ax=ax)
    ax.set_title("Age vs score")
    # Leave the figure open: the runner picks it up after run() returns.
    return cohort.describe()
```

You do not need to call a Rime-specific display function. The runner collects the open matplotlib figures after the entrypoint returns and stores them with the node diagnostics.
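You can approximate the capture step with public pyplot calls: list the open figure numbers, render each one to PNG bytes, then close it. capture_open_figures is a hypothetical helper that only illustrates the idea. This page does not specify the image format, backend, or storage that Rime's runner uses.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def capture_open_figures() -> list[bytes]:
    """Hypothetical: render every still-open figure to PNG, then close it."""
    images = []
    for num in plt.get_fignums():
        fig = plt.figure(num)  # re-activates the existing figure with this number
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        images.append(buf.getvalue())
        plt.close(fig)
    return images


def run(xs, ys):
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_title("Left open on purpose")
    return {"points": len(xs)}


run([1, 2, 3], [1, 4, 9])
pngs = capture_open_figures()
print(len(pngs))  # 1
```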
Runtime Model
Python nodes run in a warm Python runner session for the selected interpreter during a CLI or editor run. The runner lives outside the host Node process. Because the session stays warm, its startup cost is paid once and shared by every Python node that uses the same interpreter.
Inputs and outputs move through Rime’s Arrow/Parquet-backed artifact path. The runtime converts upstream tables into pandas DataFrames before calling your function, then persists returned tables for downstream nodes.
Environment
Required: Python 3.11+ with pyarrow and pandas.
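To confirm that an interpreter meets these requirements, you can run a short check with it, such as the sketch below. missing_requirements is a hypothetical helper that uses only the standard library. Run the file with the interpreter you intend to pass to Rime.

```python
import importlib.util
import sys

REQUIRED_MODULES = ("pyarrow", "pandas")


def missing_requirements() -> list[str]:
    """List unmet requirements from this page: Python 3.11+, pyarrow, pandas."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append(f"Python 3.11+ required, found {sys.version.split()[0]}")
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


problems = missing_requirements()
print("\n".join(problems) if problems else f"OK: {sys.executable}")
```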
```bash
uv venv .venv
source .venv/bin/activate
uv pip install pyarrow pandas
rime run pipeline.dag.yaml --python-bin .venv/bin/python
```

You can also set the interpreter in the DAG:

```yaml
interpreters:
  python: .venv/bin/python
```

See Also
- R language nodes - same slot protocol, R entrypoint
- JavaScript language nodes - defineNode and row arrays
- SQL language nodes - DuckDB temp tables
- Language node reference - full field list
- Polyglot runtime overview - cross-language design