Rime
What is Rime?
Rime is a runtime for reproducible data work. You declare the pipeline once in pipeline.dag.yaml. Rime runs the graph, caches each node, captures logs, validates outputs, and writes artifacts.

specification_version: "2.1"
nodes:
  - id: raw_orders
    kind: sql
    source: queries/load_orders.sql

  - id: order_metrics
    kind: derive
    inputs: [raw_orders]
    as: revenue
    expr: "[unit_price] * [quantity]"

  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: order_metrics

Here, SQL imports data, the derive node computes one reviewable feature with Rime’s expression language, and Python graphs the result. Rime captures intermediate data and script side effects, then produces a report with a runtime overview.
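As a rough illustration of what "running the graph" involves, the sketch below orders the three example nodes so that each one follows its inputs, using Python's standard `graphlib` module. The `dependencies` dict is a hand-written mirror of the YAML, and it assumes that both `inputs` and `in` declare upstream nodes. This is not Rime's scheduler, only the general technique.

```python
from graphlib import TopologicalSorter

# Hand-written mirror of the example pipeline: node id -> upstream node ids.
# Assumption: `inputs` and `in` in the YAML both name upstream dependencies.
dependencies = {
    "raw_orders": set(),
    "order_metrics": {"raw_orders"},
    "sales_chart": {"order_metrics"},
}


def execution_order(deps):
    """Return node ids in an order where every node comes after its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the example is a simple chain, only one valid order exists: the SQL load, then the derive step, then the plot. A graph with a cycle would make `static_order()` raise `graphlib.CycleError`.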
Why Rime
⚡ Functions, not jobs
A node is a function over dataframes, not a task that wires I/O. You write what each step computes; the runtime owns reading, writing, serialization, and language boundaries. The dbt mental model, extended past SQL.
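To make the distinction concrete, here is a minimal sketch in plain Python, with lists of dicts standing in for dataframes. The function name `add_revenue` and the sample rows are hypothetical, and nothing here is Rime's API. The point is only that the step describes a computation and leaves reading and writing to whatever calls it.

```python
from typing import Dict, List

Row = Dict[str, float]


def add_revenue(orders: List[Row]) -> List[Row]:
    """Pure step: rows in, rows out. No file paths, no reads, no writes."""
    return [
        {**row, "revenue": row["unit_price"] * row["quantity"]}
        for row in orders
    ]


# Hypothetical sample data; a runtime would supply the real input.
sample_orders = [
    {"unit_price": 2.5, "quantity": 4},
    {"unit_price": 10.0, "quantity": 1},
]

result = add_revenue(sample_orders)
print(result)
```

The function never mutates its input, so the same rows can be passed to other steps without one step affecting another.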
🧰 One DAG, four languages
SQL for joins, Python for ML, R for stats, JavaScript for everything else. Same pipeline, named slots, typed boundaries. Dataframes cross language borders through Arrow-backed payloads instead of ad hoc CSV handoffs.
🔒 Reproducible by default
Content-addressed caching, deterministic outputs, freeze-able snapshots. Same script plus same inputs means the same artifact, every time. No “works on my machine.”
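One common way to get content-addressed caching is to derive a cache key from the script text plus a digest of each named input, so that any change to either produces a new key. The sketch below shows that idea with `hashlib`. The function `cache_key`, the slot names, and the digests are all hypothetical, and Rime's actual keying scheme may differ.

```python
import hashlib
from typing import Mapping


def cache_key(script_source: str, input_digests: Mapping[str, str]) -> str:
    """Derive a key from the script text and the digests of its named inputs."""
    hasher = hashlib.sha256()
    hasher.update(script_source.encode("utf-8"))
    # Sort by slot name so that dict ordering cannot change the key.
    for slot, digest in sorted(input_digests.items()):
        hasher.update(f"\0{slot}={digest}".encode("utf-8"))
    return hasher.hexdigest()


script = "SELECT * FROM orders"
inputs = {"orders": "abc123", "customers": "def456"}  # hypothetical digests

key_same = cache_key(script, inputs)
key_reordered = cache_key(script, dict(reversed(list(inputs.items()))))
key_edited = cache_key(script + " WHERE total > 0", inputs)

print(key_same == key_reordered)  # same script, same inputs
print(key_same == key_edited)     # script text changed
```

A runtime using this approach can skip a node whenever an artifact already exists under the computed key, and rerun it otherwise.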
📄 Publishable narratives
Render a publishable HTML report directly from your DAG. Tables, stats, stdout, figures, and node status: one render step, one document, one source of truth.
How Rime is different
- Airflow and Prefect orchestrate recurring jobs; Rime is local and single-run.
- In Airflow and Prefect, reads, writes, retries, errors, and persistence are usually coded inside tasks.
- Rime owns dataframe handoff, execution order, caching, logs, validation, and outputs.

@task
def load_orders():
    orders = read_sql("SELECT * FROM orders")
    orders.to_parquet("outputs/raw_orders.parquet")

@task
def plot_sales():
    orders = pd.read_parquet("outputs/raw_orders.parquet")
    plot(orders)

@flow
def nightly_sales():
    load_orders()
    plot_sales()

# Rime: pipeline.dag.yaml
nodes:
  - id: raw_orders
    kind: sql
    source: queries/load_orders.sql

  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: raw_orders

- Hex is a proprietary notebook-style workspace; Rime is open source and file-backed.
- Rime projects are portable pipeline.dag.yaml files.
- Rime’s Directed Acyclic Graph (DAG) is made of functions, not inline notebook modifications.

# SQL cell: raw_orders
SELECT *
FROM warehouse.orders

# Python cell: order_metrics
order_metrics = raw_orders.copy()
order_metrics["revenue"] = (
    order_metrics["unit_price"] * order_metrics["quantity"]
)

# Python cell: sales_chart
plot_sales(order_metrics)

# Rime: pipeline.dag.yaml
nodes:
  - id: raw_orders
    kind: sql
    source: queries/load_orders.sql

  - id: order_metrics
    kind: derive
    inputs: [raw_orders]
    as: revenue
    expr: "[unit_price] * [quantity]"

  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: order_metrics

- Snakemake dependencies are based on files; Rime dependencies are based on nodes.
- Reads, writes, errors, and data persistence are usually coded manually in rules or scripts.
- Snakemake freshness is based mainly on file timestamps and metadata, not runtime-managed node identity.

rule load_orders:
    output:
        "outputs/raw_orders.parquet"
    shell:
        "python scripts/load_orders.py {output}"

rule plot_sales:
    input:
        "outputs/raw_orders.parquet"
    output:
        "outputs/sales.png"
    shell:
        "python scripts/plot_sales.py {input} {output}"

# Rime: pipeline.dag.yaml
nodes:
  - id: raw_orders
    kind: python
    source: scripts/load_orders.py

  - id: sales_chart
    kind: python
    source: scripts/plot_sales.py
    in:
      orders: raw_orders

Choose your surface
The Editor and the CLI both consume the same pipeline.dag.yaml. Start visually, drop to YAML, or run the same project in CI.