
Quick start

This page runs the smallest complete Rime project: one CSV, one pipeline.dag.yaml, one command. It uses only core nodes, so no Python or R interpreter setup is required.

For a slower-paced tutorial that creates every file from scratch, see the first pipeline workshop.

git clone https://github.com/danielsjoo/rime
cd rime

The example lives at examples/single-file/:

examples/single-file/
├── data/
│   └── penguins.csv
└── pipeline.dag.yaml

pipeline.dag.yaml
specification_version: "2.1"
nodes:
  - id: penguins
    kind: source
    path: data/penguins.csv
  - id: adelie_only
    kind: filter
    inputs: [penguins]
    expr: '[species] == "Adelie"'
  - id: by_island
    kind: aggregate
    inputs: [adelie_only]
    groupBy: ["[island]"]
    metrics:
      - "[mean_bill_length] = [bill_length_mm].mean()"
      - "[mean_flipper_length] = [flipper_length_mm].mean()"
      - "[n] = [bill_length_mm].count()"

What each node does:

| Node | Kind | Result |
| --- | --- | --- |
| penguins | source | Load data/penguins.csv. |
| adelie_only | filter | Keep rows where species is Adelie. |
| by_island | aggregate | Summarize the Adelie rows by island. |
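
For readers who think in Python, the filter and aggregate steps can be approximated with the standard library. This is only a sketch of the semantics, not how Rime executes nodes. The sample rows are made up for illustration (they are not the contents of data/penguins.csv), the function names simply mirror the node ids, and count() is assumed to count the rows in each group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, NOT the real data/penguins.csv.
rows = [
    {"species": "Adelie", "island": "Dream", "bill_length_mm": 39.0, "flipper_length_mm": 180.0},
    {"species": "Adelie", "island": "Dream", "bill_length_mm": 41.0, "flipper_length_mm": 190.0},
    {"species": "Gentoo", "island": "Biscoe", "bill_length_mm": 46.0, "flipper_length_mm": 215.0},
    {"species": "Adelie", "island": "Biscoe", "bill_length_mm": 38.0, "flipper_length_mm": 185.0},
]


def adelie_only(rows):
    """Rough equivalent of the filter node: [species] == "Adelie"."""
    return [r for r in rows if r["species"] == "Adelie"]


def by_island(rows):
    """Rough equivalent of the aggregate node: group by island, then summarize."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["island"]].append(r)
    return [
        {
            "island": island,
            "mean_bill_length": mean(r["bill_length_mm"] for r in members),
            "mean_flipper_length": mean(r["flipper_length_mm"] for r in members),
            "n": len(members),  # assumption: count() counts rows in the group
        }
        for island, members in sorted(groups.items())
    ]


summary = by_island(adelie_only(rows))
for row in summary:
    print(row)
```

Each function takes the output of the previous one, which is the same dependency that inputs: expresses in the DAG.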

All relative paths resolve from the directory that contains the DAG file.

rime validate examples/single-file/pipeline.dag.yaml

Validation checks YAML syntax, node schemas, input references, graph cycles, and source paths. It does not execute any nodes.
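
Two of those checks, input references and graph cycles, are easy to picture in code. The sketch below is an illustration of the idea using Python's graphlib module, not Rime's validator. The check_graph function and the plain-dict node shape are assumptions made for this example.

```python
from graphlib import CycleError, TopologicalSorter


def check_graph(nodes):
    """Return a list of problems for nodes shaped like {'id': ..., 'inputs': [...]}."""
    ids = {n["id"] for n in nodes}
    problems = []

    # Input references: every name in inputs must be the id of some node.
    for n in nodes:
        for ref in n.get("inputs", []):
            if ref not in ids:
                problems.append(f"{n['id']}: unknown input {ref!r}")

    # Graph cycles: a topological sort cannot be prepared if a cycle exists.
    graph = {n["id"]: [r for r in n.get("inputs", []) if r in ids] for n in nodes}
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as exc:
        problems.append("cycle: " + " -> ".join(exc.args[1]))
    return problems


# The three nodes from the quick-start pipeline, reduced to id and inputs.
nodes = [
    {"id": "penguins"},
    {"id": "adelie_only", "inputs": ["penguins"]},
    {"id": "by_island", "inputs": ["adelie_only"]},
]
print(check_graph(nodes))  # []
```

Neither check needs to load any data, which is consistent with validation not executing nodes.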

rime run examples/single-file/pipeline.dag.yaml

Rime writes artifacts next to the DAG:

examples/single-file/outputs/
├── manifest.json
├── penguins/
│   └── default.parquet
├── adelie_only/
│   └── default.parquet
└── by_island/
    └── default.parquet
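
The layout above, together with the rule that relative paths resolve from the DAG's directory, can be expressed as two small path helpers. This is a sketch: resolve_from_dag and artifact_path are hypothetical names, and the "default" output name is taken from the listing rather than from any documented API.

```python
from pathlib import PurePosixPath


def resolve_from_dag(dag_file, relative_path):
    """Resolve a path written in the DAG against the DAG file's directory."""
    return PurePosixPath(dag_file).parent / relative_path


def artifact_path(dag_file, node_id, output="default"):
    """Location of a node's artifact under outputs/, following the listing above."""
    return PurePosixPath(dag_file).parent / "outputs" / node_id / f"{output}.parquet"


dag = "examples/single-file/pipeline.dag.yaml"
print(resolve_from_dag(dag, "data/penguins.csv"))
# examples/single-file/data/penguins.csv
print(artifact_path(dag, "by_island"))
# examples/single-file/outputs/by_island/default.parquet
```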

The final table has three rows:

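| island | mean_bill_length | mean_flipper_length | n |
| --- | --- | --- | --- |
| Biscoe | 40.30 | 195.00 | 1 |
| Dream | 39.15 | 180.00 | 2 |
| Torgersen | 39.30 | 183.50 | 2 |

To render an HTML report as well, run: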
rime build examples/single-file/pipeline.dag.yaml

rime build runs the DAG and writes:

examples/single-file/outputs/run_report.html

Open that file to inspect node status, cache state, output sizes, schemas, preview rows, and the final aggregate output.

Reports include every node by default. Add metadata.report: false to source or staging nodes that should stay out of the HTML report:

- id: penguins
  kind: source
  path: data/penguins.csv
  metadata:
    report: false

Use a language node when a built-in node does not fit. Language nodes use a named in: map instead of positional inputs:.

params:
  min_mass: { type: float, default: 3500 }
nodes:
  - id: heavy_penguins
    kind: python
    source: scripts/heavy.py
    in:
      penguins: penguins
      min_mass: params.min_mass

scripts/heavy.py
def run(penguins, min_mass):
    return penguins[penguins["body_mass_g"] >= float(min_mass)]
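
One way to read the in: map is as a set of keyword arguments for run(): each key names a parameter, and each value points at either an upstream node or a params entry. The sketch below illustrates that reading and is not Rime's implementation. bind_inputs is a hypothetical helper, the run() here is a stand-in that filters plain dicts instead of a DataFrame, and the idea that an overridden value may arrive as a string (which would explain the float() call in the script) is a guess.

```python
def bind_inputs(in_map, node_outputs, param_defaults, overrides=None):
    """Turn an in: map into keyword arguments for run() (hypothetical helper)."""
    params = {**param_defaults, **(overrides or {})}
    kwargs = {}
    for arg_name, ref in in_map.items():
        if ref.startswith("params."):
            # "params.min_mass" -> look up "min_mass" among the parameters
            kwargs[arg_name] = params[ref.split(".", 1)[1]]
        else:
            # anything else is treated as the id of an upstream node
            kwargs[arg_name] = node_outputs[ref]
    return kwargs


def run(penguins, min_mass):
    # Stand-in for scripts/heavy.py that works on a list of dicts.
    return [p for p in penguins if p["body_mass_g"] >= float(min_mass)]


in_map = {"penguins": "penguins", "min_mass": "params.min_mass"}
# Made-up rows for illustration only.
node_outputs = {"penguins": [{"body_mass_g": 3400}, {"body_mass_g": 3700}, {"body_mass_g": 3900}]}
defaults = {"min_mass": 3500}

with_default = run(**bind_inputs(in_map, node_outputs, defaults))
with_override = run(**bind_inputs(in_map, node_outputs, defaults, {"min_mass": "3800"}))
print(len(with_default), len(with_override))  # 2 1
```

Because binding is by name, the keys under in: have to match the parameter names of run().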

Then run with a specific interpreter or a param override:

rime run pipeline.dag.yaml --python-bin .venv/bin/python
rime run pipeline.dag.yaml --param min_mass=3800

| Command | Use it for |
| --- | --- |
| rime validate pipeline.dag.yaml | Check schema and graph integrity before running. |
| rime run pipeline.dag.yaml | Execute the DAG and persist artifacts. |
| rime build pipeline.dag.yaml | Execute the DAG and render HTML. |
| rime run pipeline.dag.yaml --no-cache-read | Force recompute while still writing fresh cache. |
| rime run pipeline.dag.yaml --lean | Recompute without reading or writing cache artifacts. |
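
The difference between the default run, --no-cache-read, and --lean comes down to two independent switches: whether cached results are read, and whether fresh results are written back. The toy model below shows that distinction with a dict standing in for the cache. It is an illustration of the descriptions in the table, not Rime's cache, and run_node is a hypothetical function.

```python
def run_node(node_id, compute, cache, read_cache=True, write_cache=True):
    """Return (result, status) for one node under the given cache policy."""
    if read_cache and node_id in cache:
        return cache[node_id], "cached"
    result = compute()
    if write_cache:
        cache[node_id] = result
    return result, "computed"


cache = {"by_island": "old result"}

# Default: reuse whatever is cached.
default = run_node("by_island", lambda: "new result", cache)

# Like --no-cache-read: recompute, then refresh the cache entry.
no_read = run_node("by_island", lambda: "new result", cache, read_cache=False)

# Like --lean: recompute and leave the cache untouched.
lean_cache = {}
lean = run_node("by_island", lambda: "new result", lean_cache,
                read_cache=False, write_cache=False)

print(default, no_read, lean, lean_cache)
```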