Expression language

Rime’s expression language is the shared formula syntax behind core transform nodes. It is intentionally smaller than Python or SQL: enough for row filters, feature columns, grouping keys, aggregate metrics, sort keys, and expression join keys, while staying readable in YAML and inspectable in the editor.

Where Expressions Appear

Node	Fields	Meaning
`filter`	`expr`	Keep rows where the expression is truthy
`derive`	`expr`	Compute one new column named by `as`
`aggregate`	`groupBy`, `metrics`	Define grouping keys and named reductions
`select`	`columns`	Columns to project; the runtime treats entries as expressions, but the schema currently accepts only identifiers
`sort`	`by[].expr`	Compute sort keys
`join`	`leftKey`, `rightKey`	Join keys: a bare column name, or an expression when the key must be computed
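
As a rough sketch of how these fields sit inside node definitions, the fragment below feeds a `filter` node into a `sort` node. The node ids (`adults`, `by_score`) and the input name `patients` are hypothetical, the `inputs` field on these nodes is assumed from the `derive` and `join` examples on this page, and the `by` list shows only the `expr` field named in the table.

```yaml
# Illustrative only: node ids and input names are made up for this sketch.
- id: adults
  kind: filter
  inputs: [patients]
  expr: "[age] >= 18"        # keep rows where this is truthy

- id: by_score
  kind: sort
  inputs: [adults]
  by:
    - expr: "[score]"        # each entry in `by` carries its own expr
```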

Column References

Column names go in square brackets.

expr: "[age] >= 18"
expr: "[Cost of Goods Sold] / [revenue]"

Use brackets even when a column name looks like an identifier. That keeps expressions visually distinct from YAML field names and string literals.

Literals And Operators

Literals can be numbers, strings (single-quoted in the examples below), booleans, or null:

expr: "[status] == 'active' and [score] >= 0.8"
expr: "[site] in ('north', 'south')"
expr: "not ([deleted] == true)"

Supported operator groups:

Group	Operators
Arithmetic	`+`, `-`, `*`, `/`, unary `-`
Comparison	`==`, `!=`, `>`, `>=`, `<`, `<=`
Boolean	`and`, `or`, `not`
Membership	`in (...)` with a parenthesized literal list

Use parentheses to group subexpressions and control evaluation order:

expr: "([crp_mean] * 2.0 + [ldl_max] * 0.05) / [n_visits]"

Function Calls

Functions take one or more expressions as arguments and combine them within each row.

Function	Use
`coalesce(a, b, ...)`	Return the first non-null value among the arguments
`concat(a, b, ...)`	Concatenate expressions as strings
`max(a, b, ...)`	Horizontal maximum across expressions
`min(a, b, ...)`	Horizontal minimum across expressions

Example:

- id: risk_index
  kind: derive
  inputs: [lab_load]
  as: risk_index
  expr: "coalesce([crp_mean], 0) * 2.0 + coalesce([ldl_max], 0) * 0.05"
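
The other functions can be used the same way. The following sketch assumes the node shape from the example above; the node ids, the input name `visits`, and the column names are hypothetical.

```yaml
# Hypothetical nodes; column names are placeholders.
- id: patient_label
  kind: derive
  inputs: [visits]
  as: patient_label
  expr: "concat([site], '-', [patient_id])"         # join values as one string

- id: best_score
  kind: derive
  inputs: [patient_label]
  as: best_score
  expr: "max([baseline_score], [followup_score])"   # row-wise maximum of two columns
```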

Column Methods

Methods are called with dot syntax on a column or expression.

Method	Common place	Use
`.uppercase()`, `.lowercase()`	derive, sort, join keys	Normalize strings
`.to_date()`, `.to_int()`, `.to_float()`, `.to_string()`	derive	Cast values
`.sum()`, `.mean()`, `.count()`, `.min()`, `.max()`	aggregate metrics	Reduce a group
`.n_unique()`, `.distinct()`	aggregate metrics	Count distinct values
`.lag(n)`, `.lead(n)`	sort/derive patterns	Shift values
`.rolling_mean(n)`	feature engineering	Rolling average
`.first_value()`, `.rank()`	grouped/window-like features	First value or rank
`.sort_by(expr)`, `.over(expr)`	advanced Polars-backed expressions	Sort/window context
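
A sketch of row-level methods in `derive` nodes, assuming methods chain onto a bracketed column reference in the same way as in the aggregate examples on this page. The node ids, the input name `raw_visits`, and the column names are hypothetical.

```yaml
# Hypothetical nodes showing a cast and a string normalization.
- id: visit_day
  kind: derive
  inputs: [raw_visits]
  as: visit_day
  expr: "[visit_date].to_date()"   # cast a column to a date

- id: site_norm
  kind: derive
  inputs: [visit_day]
  as: site_norm
  expr: "[site].uppercase()"       # normalize case before grouping or joining
```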

Aggregate metrics should name their output with an alias expression:

metrics:
  - "[mean_crp] = [crp].mean()"
  - "[n_visits] = [crp].count()"

Alias Expressions

Alias expressions use a bracketed output name on the left side:

"[mean_score] = [score].mean()"

Use aliases in `aggregate.metrics`. For `derive`, prefer `as:` instead:

- id: lab_load
  kind: derive
  as: lab_load
  expr: "[crp_mean] * [ldl_max] / 1000.0"

Practical Patterns

Null-safe Score

expr: "coalesce([baseline_score], 0) + coalesce([followup_score], 0)"

Cohort Filter

expr: "[age] >= 18 and [site] in ('north', 'south')"

Grouped Rollup

groupBy:
  - "[site]"
metrics:
  - "[mean_risk] = [risk_index].mean()"
  - "[n] = [patient_id].count()"
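
Placed inside a full node, the rollup might look like the sketch below. The node id `site_rollup` is hypothetical, and the input `risk_index` is an assumption that reuses the derive node id from the Function Calls example; `groupBy` and `metrics` are the fields listed for `aggregate` in the table of nodes.

```yaml
# Sketch of a complete aggregate node; id and input are assumptions.
- id: site_rollup
  kind: aggregate
  inputs: [risk_index]
  groupBy:
    - "[site]"
  metrics:
    - "[mean_risk] = [risk_index].mean()"   # one output row per site
    - "[n] = [patient_id].count()"
```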

Computed Join Key

- id: joined
  kind: join
  inputs: [left_table, right_table]
  leftKey: "[site].lowercase()"
  rightKey: "[site_code].lowercase()"

For important computed keys, a separate derive node is often easier to review than hiding the key logic inside the join.
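
One way to sketch that alternative: derive the normalized key on each side, then join on the resulting column. The node ids `left_keyed` and `right_keyed` and the column name `site_key` are hypothetical, and writing the bare keys without brackets is an assumption based on the "bare column name" form described for `join`.

```yaml
# Sketch only: ids and the site_key column are made up for illustration.
- id: left_keyed
  kind: derive
  inputs: [left_table]
  as: site_key
  expr: "[site].lowercase()"

- id: right_keyed
  kind: derive
  inputs: [right_table]
  as: site_key
  expr: "[site_code].lowercase()"

- id: joined
  kind: join
  inputs: [left_keyed, right_keyed]
  leftKey: site_key      # bare column name, no expression needed
  rightKey: site_key
```

With this layout the key logic appears as named columns that can be inspected before the join runs.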

Limits

The expression language is not a general scripting language. Use a Python, R, JavaScript, or SQL node when you need multi-step control flow, external libraries, custom statistical routines, or transformations that are clearer as code.