## Baba Yaga → JavaScript Compiler (Combinator-Based) This document describes a minimal, practical compiler from Baba Yaga to JavaScript using abstraction elimination into combinators. It reuses the existing front-end (`lexer.js`, `parser.js`) and targets a small JS runtime that implements SKI-family combinators and primitives consistent with `interpreter.js` semantics. The working implementation now lives under `experimental/`. ### Goals - Compile any program accepted by the current `parser.js` and executed by `interpreter.js`. - Preserve semantics: numbers with `{ value, isFloat }`, pure and immutable data, partial application, namespacing, and “when” pattern matching. - Keep changes additive: new `experimental/compiler.js` and a small runtime; no required changes to lexer/parser/interpreter. - Output readable, runnable JS with a tiny explicit runtime. ### Non-Goals (initially) - Advanced type-directed optimizations or codegen from declared types. - Mutual recursion via pure combinator fixpoints (we’ll use two-phase `let`+assignment for simplicity). - Aggressive size-optimizing bracket abstraction (we can add B/C/η later). ## High-Level Architecture 1. Front-end: reuse `createLexer` and `createParser` to produce AST. 2. Core normalization pass (implemented in `experimental/compiler.js`): - Desugar to a small core language (λ, application, identifiers, literals, primitives, data). - Convert infix operators and keywords into primitive function calls. - Normalize multi-arg functions to nested single-arg lambdas. - Lower member access to `get key obj` form. - Lower `when` to a chain of guarded branches with thunks. 3. Abstraction elimination (bracket abstraction) or closure-based codegen (mode-dependent) in `experimental/compiler.js`: remove λs into SKI-family combinators, or emit JS closures directly when selected. 4. Emission: generate JS module text that includes a tiny runtime and the compiled program. ## Source Language Surface (from parser/interpreter) - Program-level: `TypeDeclaration`, `VariableDeclaration`, `FunctionDeclaration`, expressions. - Expressions: numbers, strings, booleans, identifiers, anonymous functions, application, unary/binary ops, lists, tables, member access, `ResultExpression` (Ok/Err), `WhenExpression` (patterns: type, result, list, table, wildcard), curried and typed functions. - Namespaces: `io`, `str`, `math`, list/table functions (append, map, reduce, etc.) via native functions. ## Core IR (post-desugar) Use a minimal lambda calculus with primitives and data literals: - Term := `Var(name)` | `Lam(param, term)` | `App(term, term)` | `Const(name)` | `Lit()`. - No infix ops; all are `Const` applied to args. - Member access lowered to `App(App(Const('get'), key), obj)`. - `when` lowered to nested `cond` forms with thunks. ### Desugaring Rules (selected) - Multi-arg function `f: (x: T, y: U) -> R -> body` → `Lam(x, Lam(y, body))` with type metadata discarded at runtime. - Untyped function `x y -> body` → `Lam(x, Lam(y, body))`. - Anonymous `(x y -> e)` → `Lam(x, Lam(y, e))`. - Unary minus: `-e` → `App(Const('neg'), e)`. - Binary `e1 op e2` → `App(App(Const(opName), e1), e2)` where `opName ∈ { add, sub, mul, div, mod, eq, neq, gt, lt, gte, lte, and, or, xor, concatDot }`. - Member `obj.prop` or `obj.0` → `App(App(Const('get'), keyLit), obj)`. - `Ok e` → `App(Const('Ok'), e)`; `Err e` → `App(Const('Err'), e)`. - Lists `[e1, ..., en]` → `Lit([...])` (JS arrays); tables `{ k: v, ... }` → `Lit({ ... })` or `Const('table')(...)` builder (see Runtime). - `when`: - `when d1 ... dn is p1 ... pn then c1; ...; _ then cK` → nested `cond` calls: - `cond (match([p1..pn], [d1..dn])) (thunk(() => c1)) (thunk(() => cond (...)))`. - Pattern matchers compile to booleans with helpers and do not bind new variables except `ResultPattern` which binds its value. ## Bracket Abstraction Eliminate lambdas into SKI-family combinators over the core term. Treat any `Var` not equal to the bound variable as free (terminal). Use the standard rules: - [x]x → I - [x]M where x ∉ FV(M) → K M - [x](M N) → S ([x]M) ([x]N) Notes: - Multi-arg λ: `[x][y]M` via nesting. - We can later add optimizations: η-reduction, B/C introduction, constant folding of K-consts, etc. ## Emission Strategy Output a single JS module string with: 1. Runtime prelude (combinators, curry helpers, primitives, matchers, namespaces). 2. Top-level declarations: - Functions: declare `let f;` for each function name first (supports mutual recursion), then `f = ;`. - Variables: `const x = ;`. 3. Exports and format: by default emit UMD for maximal portability (Node/CommonJS, Bun, browsers). Also support format selection via CLI. ### Module formats, exports, and codegen modes - Default: UMD wrapper that exposes named exports under `module.exports` (CJS), `define` (AMD when available), and global `BabaYaga` namespace in browsers. - Optional formats: `--format esm|cjs|umd` to force a specific target. ESM emits `export` statements; CJS uses `module.exports = { ... }`. - Codegen mode: `--mode ski|closure|hybrid` (default: `ski`). - `ski`: full combinatorization after normalization. - `closure`: emit JS closures with currying helpers (easiest parity path, useful for debugging and when SKI would bloat output). - `hybrid`: use closures for complex constructs (e.g., deeply nested `when`, heavy pattern matching) and SKI for simple lambdas; implementation-defined heuristics. ### Entry points and exports - Compiled module should export named bindings for all top-level declarations. - Additionally, export a single entry for executing a program: - `run(host)` preferred: evaluates the compiled `Program` under the provided `host.io` and returns the last expression value. - Fallback: `main(host)` if a user-defined `main` exists; parity harness will try `run` then `main`. - Host wiring: prefer a lightweight `run(host)` that creates the runtime namespaces based on `host`. Avoid global side-effects. ### Recursion - Self-recursion: works via two-phase assignment (`let f; f = ...`), because references resolve at call time. - Mutual recursion: also works via predeclared `let` for all functions then assignments. - Optional: pure combinator fixpoint `Z` for self-recursion; not required initially. ## Runtime Design (tiny and explicit) ### Combinators and Helpers - `I(a) = a` - `K(a)(b) = a` - `S(f)(g)(x) = f(x)(g(x))` - Optional: `B(f)(g)(x) = f(g(x))`, `C(f)(x)(y) = f(y)(x)`, `Z(f) = (g => g(g))(g => f(x => g(g)(x)))` for CBV. - `curryN(n, f)` and arity-specific `curry2`, `curry3` for primitives. - `thunk(f)` returns `f` (we pass thunks to `cond`). - `cond(p)(t)(e) = p ? t() : e()`. ### Numeric and Data Representation - Numbers are `{ value: number, isFloat: boolean }` to match interpreter. - Lists as JS arrays; tables as `{ type: 'Object', properties: Map }` built via a small `table(obj)` helper that constructs a `Map`. Access is always via `get(key)(obj)` in compiled code. We will NOT use a Proxy in compiled output; this keeps runtime simple and predictable while preserving interpreter semantics for member/index access. ### Runtime namespace and symbol hygiene - Prefix all runtime symbols with `__rt` (e.g., `__rt.I`, `__rt.add`, `__rt.get`) to avoid collisions with user-defined names. - User symbols (top-level functions/vars) are emitted in their original names and exported by the module wrapper. ### Primitives (curried) - Arithmetic: `add, sub, mul, div, mod, neg` (unwrap/rewrap numbers; div guards divide-by-zero like interpreter). - Compare: `eq, neq, gt, lt, gte, lte` (numeric unwrapping + generic string/boolean). - Logic: `and, or, xor` (booleanize inputs). - String: `concatDot` for `..` (String(lhs) + String(rhs)). - Result: `Ok, Err` → `{ type: 'Result', variant: 'Ok'|'Err', value }`. - Core ops: `get`, and helpers for lists/tables (append, prepend, concat, update, removeAt, slice, set, remove, merge, keys, values) mirroring interpreter. In compiled code, `get` returns `undefined` for missing keys/indices (including out-of-bounds array access) to match interpreter behavior. - Namespaces: `io`, `str`, `math` exposing same functions/signatures; `io` is parameterized by a passed-in host object for integration. - Matchers: `isType(expected, value)`, `matchList(pattern, value)`, `matchTable(pattern, value)`, `isOk`, `isErr`. ### Pattern Matching Lowering - Each `when` case compiles to `cond(guard)(() => consequent)(() => next)`, ensuring short-circuiting and correct evaluation order. - Compound guards (e.g., multi-element list/table checks) are lazily folded using nested `cond` so that later predicates are not evaluated if earlier ones fail. This avoids unintended member/index access and preserves semantics. - Guards are pure booleans; consequents compiled normally; for `ResultPattern Ok x`, the consequent becomes a λ bound to the inner value and is applied on the success branch. - Switch micro-optimization: when there is exactly one discriminant and all patterns are literals with no bindings or type/list/table patterns, the compiler may emit a JavaScript `switch` on a normalized key (numbers unwrap to `.value`) for readability/perf. All other cases use the `cond`/if-else chain. ## Mapping from AST → IR → JS (node-by-node) - Program: collect top-level function names; emit `let` declarations; then emit assignments/consts in order. Optionally return last expression for REPL mode. - TypeDeclaration: ignored at runtime; optionally preserved as comments. - VariableDeclaration: `const name = compile(expr)`. - FunctionDeclaration/AnonymousFunction: normalize to nested `Lam`, then bracket abstraction, then emit the combinator tree expression. - FunctionCall: normal application; arity handled at runtime (curried). - NumberLiteral: `{ value: n, isFloat }`. String/Boolean: JS literals. - UnaryExpression `-x`: `neg x`. BinaryExpression: map to primitive names. - MemberExpression: `get key obj` (curried). - ListLiteral/TableLiteral: JS arrays + `table` builder (to get `{type:'Object',properties:Map}`) where needed. - ResultExpression: `Ok v`/`Err v`. - WhenExpression: compiled to `cond` chain with matchers. ## Example (sketch) Source: ```baba add: (x: Number, y: Number) -> Number -> x + y; main: -> add 2 3; ``` IR (informal): ```text add = Lam(x, Lam(y, App(App(Const('add'), Var(x)), Var(y)))) main = Lam(_unit, App(App(Var(add), Lit({2})), Lit({3}))) ``` After abstraction elimination (SKI form, schematic): ```text add = S (K (S (K add) I)) (K (S (K ...) ...)) // produced by the standard rules ``` Emitted JS (abridged): ```js // runtime: S,K,I,curry2, add, ... let add; add = /* SKI of λx.λy. add x y */; const main = /* SKI of λ_. add 2 3 */; export { add, main }; ``` ## Evaluation Strategy - We generate eager JS; to preserve lazy branches in `when`, consequents are wrapped in thunks and forced via `cond`. - Curried primitives plus normal JS evaluation replicates call-by-value with partial application like the interpreter. ## Testing Strategy - Golden tests: compile and run existing `tests/*.test.js`-backed programs using both the interpreter and compiled output; assert observational equivalence of results/side-effects. - Parity harness: - For each sample program (or inline snippet), run: `interpret(ast)` and `execute(compiledJS)` with same `host`. - Compare returned value shapes; for numbers compare `.value` and `isFloat`; for lists/tables structural equality; for `Result` `variant` and inner value. - Capture `io.out` writes from both executions and compare sequences. - CI entry: add a job that compiles a subset of fixtures and runs them; failures show diff of values and `io.out`. - Edge cases: negative numbers vs unary minus; division by zero; list/table bounds; member access; nested `when` with short-circuiting; partial application and closures; recursion and mutual recursion; Ok/Err binding; boolean/logical ops precedence. ### Parity harness usage - A simple harness exists in `parity.js`: - Runs interpreter and compiled module, normalizes results to plain JS via a deep-plain conversion, and compares `io.out` arrays. - It currently imports the compiled module via a data URL and looks for `run(host)` or `main([host])` exports. - Default compiler mode for parity is `closure` to reduce bring-up friction; once SKI is ready, switch to `--mode ski`. Examples: ```sh node parity.js ../example.baba ``` or via package script: ```sh bun run parity ``` ## Status and Next Steps What’s working now (closure mode MVP): - Normalization covers literals, identifiers, anonymous and declared functions (curried), calls, unary minus, binary ops (+ - * / % .. and compare/logic), member access via `get`, `Ok`/`Err`, list/table literals, and a basic `when` to `Cond` lowering with thunks. - Emission produces an ESM module exporting `run(host)`, supporting recursion via let-first then assignment. - Runtime prelude implements: numbers with `{ value, isFloat }`, curry helpers, arithmetic/compare/logic, `concatDot`, `Ok/Err`, `get`, `cond`, `io` wiring, `table` builder, list HOFs (map/filter/reduce), and core `str.*` helpers. - Parity harness loads ESM via data URL and can compare interpreter vs compiled; simple programs pass. Actionable next steps (ordered): 1) Runtime parity with interpreter - Implement missing list/table primitives: `append`, `prepend`, `concat`, `update`, `removeAt`, `slice`, `set`, `remove`, `merge`, `keys`, `values` with identical semantics (indices unwrap numbers; maps remain `Map`). - Implement `math.*` namespace mirroring `interpreter.js` (abs, sign, floor, ceil, round, trunc, min, max, clamp, pow, sqrt, exp, log, sin, cos, tan, asin, acos, atan, atan2, deg, rad, random, randomInt). - Consider `io.emit`/`io.listen` to support host event wiring if compiled code uses them. 2) When/pattern matching completeness - Extend lowering to fully support: multi-discriminant patterns, `TypePattern`, `ListPattern`, `TablePattern`, and `ResultPattern` with bindings. Ensure every branch is thunked for short-circuit evaluation. Add switch micro-optimization for single discriminant with literal-only patterns (optional). 3) Emission and module formats - Add named exports for top-level declarations (in addition to `run(host)`). - Implement `wrapModule` for `umd` and `cjs` formats. Keep `esm` as the default for parity. - Add `--pretty` and `--sourcemap inline|file` support per the spec. 4) Codegen modes - Implement SKI mode with bracket abstraction (`I/K/S` at minimum) and corresponding runtime. - Implement hybrid heuristics: closures for complex `when`/deep terms, SKI for simple lambdas. 5) CLI and DX - Integrate flags into `runner.js` or provide a `bin/compile` entry. Support `--runtime external:path` to reference an external runtime. 6) Testing and CI - Run fixture programs under both interpreter and compiler; compare values and `io.out` via parity harness. - Add CI job to compile and run a representative subset; show diffs on failure. Notes/pitfalls to keep in mind - Numbers must preserve `{ value, isFloat }` across all operations and boundaries. - `get` must support arrays (numeric index) and tables (Map by key). Member access errors should match interpreter behavior. - `when` consequents must not be evaluated eagerly; use thunks and `__rt.cond`. - Keep runtime symbol hygiene: all helpers live under `__rt.*` to avoid user name collisions. ## CLI - Extend `runner.js` (or add `bin/compile`) with: - `--compile `: compile input file - `-o, --out `: output file path - `--format `: module format (default: `umd`) - `--mode `: code generation mode (default: `ski`) - `--runtime `: inline runtime by default; or reference external runtime - `--sourcemap `: source map emission (default: `none` initially) - `--pretty`: pretty-print emitted JS (default off) Example: ```sh bun run runner.js --compile src/main.baba -o dist/main.js --format umd --sourcemap file ``` ## Source maps (planned) - Nice-to-have; implement after parity. Approach: - Thread source locations from `parser.js` into the core IR nodes. - Generate mappings at IR application/literal/const boundaries rather than raw SKI nodes. - Support `--sourcemap inline|file` to control emission. - Mode-aware mapping: in `closure` mode, map directly at emitted closure expressions; in `ski`/`hybrid` modes, map at IR boundaries pre-elimination to keep maps stable. ## Open Questions / TODO - TODO: Confirm semantics for pattern matching on nested member expressions inside patterns (limited in current parser). - TODO: Implement source maps per the plan once parity is achieved. ## New files and changes - Added `experimental/compiler.js`: scaffold with `compile`, normalization and codegen stubs, emission and runtime prelude stubs, and a minimal developer CLI. - Added `experimental/parity.js`: parity harness described above. - Updated `package.json` scripts with `parity` command. - `runtime.js`: optional future extraction if we decide not to inline the prelude. ## Minimalism and Parity This approach is intentionally minimal: no changes to existing front-end/runtime semantics, small targeted runtime, and direct syntactic lowering. The initial goal is parity with `interpreter.js`; once achieved, we can iterate on size and performance optimizations.