Expression language

Rime’s expression language is the shared formula syntax behind core transform nodes. It is intentionally smaller than Python or SQL: enough for row filters, feature columns, grouping keys, aggregate metrics, sort keys, and expression join keys, while staying readable in YAML and inspectable in the editor.

| Node | Fields | Meaning |
| --- | --- | --- |
| filter | expr | Keep rows where the expression is truthy |
| derive | expr | Compute one new column named by as |
| aggregate | groupBy, metrics | Define grouping keys and named reductions |
| select | columns | Runtime projection expressions; schema currently restricts these to identifiers |
| sort | by[].expr | Compute sort keys |
| join | leftKey, rightKey | Bare column names, or expressions when the key is not a bare identifier |

Column names go in square brackets.

expr: "[age] >= 18"
expr: "[Cost of Goods Sold] / [revenue]"

Use brackets even when a column name looks like an identifier. That keeps expressions visually distinct from YAML field names and string literals.

Expressions support numbers, strings, booleans, and null:

expr: "[status] == 'active' and [score] >= 0.8"
expr: "[site] in ('north', 'south')"
expr: "not ([deleted] == true)"
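
To show where such an expression lives, here is a minimal sketch of a filter node. It assumes filter nodes use the same id, kind, and inputs layout as the derive example on this page; the node id active_high_score and the input name scores are hypothetical.

```yaml
# Hypothetical filter node; the id and input name are placeholders.
- id: active_high_score
  kind: filter
  inputs: [scores]
  # Keep rows whose status is the string 'active' and whose score is at least 0.8.
  expr: "[status] == 'active' and [score] >= 0.8"
```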

Supported operator groups:

| Group | Operators |
| --- | --- |
| Arithmetic | +, -, *, /, unary - |
| Comparison | ==, !=, >, >=, <, <= |
| Boolean | and, or, not |
| Membership | in (...) with a parenthesized literal list |

Parentheses work for grouping.

expr: "([crp_mean] * 2.0 + [ldl_max] * 0.05) / [n_visits]"

Function calls take several expressions as arguments and combine them within each row.

| Function | Use |
| --- | --- |
| coalesce(a, b, ...) | Fill null values from the next expression |
| concat(a, b, ...) | Concatenate expressions as strings |
| max(a, b, ...) | Horizontal maximum across expressions |
| min(a, b, ...) | Horizontal minimum across expressions |

Example:

- id: risk_index
  kind: derive
  inputs: [lab_load]
  as: risk_index
  expr: "coalesce([crp_mean], 0) * 2.0 + coalesce([ldl_max], 0) * 0.05"

Methods hang off a column or expression.

| Method | Common place | Use |
| --- | --- | --- |
| .uppercase(), .lowercase() | derive, sort, join keys | Normalize strings |
| .to_date(), .to_int(), .to_float(), .to_string() | derive | Cast values |
| .sum(), .mean(), .count(), .min(), .max() | aggregate metrics | Reduce a group |
| .n_unique(), .distinct() | aggregate metrics | Count distinct values |
| .lag(n), .lead(n) | sort/derive patterns | Shift values |
| .rolling_mean(n) | feature engineering | Rolling average |
| .first_value(), .rank() | grouped/window-like features | First value or rank |
| .sort_by(expr), .over(expr) | advanced Polars-backed expressions | Sort/window context |
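
As a rough sketch of the "common place" column, the fragment below uses a cast in a derive node and a string normalizer as a sort key. The node ids, the input name visits, and the columns visit_date and site are hypothetical. The by list is assumed to hold entries with an expr field, as the by[].expr notation in the node table suggests.

```yaml
# Hypothetical nodes; ids, inputs, and column names are placeholders.
- id: visit_day
  kind: derive
  inputs: [visits]
  as: visit_day
  # Cast the raw value to a date.
  expr: "[visit_date].to_date()"

- id: by_site
  kind: sort
  inputs: [visits]
  by:
    # Sort on the lowercased site name so case differences do not affect order.
    - expr: "[site].lowercase()"
```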

Aggregate metrics should name their output with an alias expression:

metrics:
  - "[mean_crp] = [crp].mean()"
  - "[n_visits] = [crp].count()"

Alias expressions use a bracketed output name on the left side:

"[mean_score] = [score].mean()"

Use aliases in aggregate.metrics. For derive, prefer as: instead:

- id: lab_load
  kind: derive
  as: lab_load
  expr: "[crp_mean] * [ldl_max] / 1000.0"

Null-safe arithmetic with coalesce:

expr: "coalesce([baseline_score], 0) + coalesce([followup_score], 0)"

A filter that combines a comparison with a membership test:

expr: "[age] >= 18 and [site] in ('north', 'south')"

Grouping keys and aliased metrics for an aggregate node:

groupBy:
  - "[site]"
metrics:
  - "[mean_risk] = [risk_index].mean()"
  - "[n] = [patient_id].count()"

A join on computed, case-normalized keys:

- id: joined
  kind: join
  inputs: [left_table, right_table]
  leftKey: "[site].lowercase()"
  rightKey: "[site_code].lowercase()"

For important computed keys, a separate derive node is often easier to review than hiding the key logic inside the join.
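
One way to apply that advice is sketched below: each side gets a derive node that materializes the normalized key, and the join then refers to the derived column by name. The node ids and the site_key column are hypothetical. The bare-name form of leftKey and rightKey is an assumption, based on the node table's note that join keys may be bare column names.

```yaml
# Hypothetical nodes; ids and the site_key column name are placeholders.
- id: left_keyed
  kind: derive
  inputs: [left_table]
  as: site_key
  expr: "[site].lowercase()"

- id: right_keyed
  kind: derive
  inputs: [right_table]
  as: site_key
  expr: "[site_code].lowercase()"

- id: joined
  kind: join
  inputs: [left_keyed, right_keyed]
  # Assumed bare-column form; the key logic now lives in the derive nodes above.
  leftKey: site_key
  rightKey: site_key
```

Each derived key can then be inspected on its own before the join runs, at the cost of two extra nodes.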

The expression language is not a general scripting language. Use a Python, R, JavaScript, or SQL node when you need multi-step control flow, external libraries, custom statistical routines, or transformations that are clearer as code.