1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
|
## Baba Yaga → JavaScript Compiler (Combinator-Based)
This document describes a minimal, practical compiler from Baba Yaga to JavaScript using abstraction elimination into combinators. It reuses the existing front-end (`lexer.js`, `parser.js`) and targets a small JS runtime that implements SKI-family combinators and primitives consistent with `interpreter.js` semantics. The working implementation now lives under `experimental/`.
### Goals
- Compile any program accepted by the current `parser.js` and executed by `interpreter.js`.
- Preserve semantics: numbers with `{ value, isFloat }`, pure and immutable data, partial application, namespacing, and “when” pattern matching.
- Keep changes additive: new `experimental/compiler.js` and a small runtime; no required changes to lexer/parser/interpreter.
- Output readable, runnable JS with a tiny explicit runtime.
### Non-Goals (initially)
- Advanced type-directed optimizations or codegen from declared types.
- Mutual recursion via pure combinator fixpoints (we’ll use two-phase `let`+assignment for simplicity).
- Aggressive size-optimizing bracket abstraction (we can add B/C/η later).
## High-Level Architecture
1. Front-end: reuse `createLexer` and `createParser` to produce AST.
2. Core normalization pass (implemented in `experimental/compiler.js`):
- Desugar to a small core language (λ, application, identifiers, literals, primitives, data).
- Convert infix operators and keywords into primitive function calls.
- Normalize multi-arg functions to nested single-arg lambdas.
- Lower member access to `get key obj` form.
- Lower `when` to a chain of guarded branches with thunks.
3. Abstraction elimination (bracket abstraction) or closure-based codegen (mode-dependent) in `experimental/compiler.js`: remove λs into SKI-family combinators, or emit JS closures directly when selected.
4. Emission: generate JS module text that includes a tiny runtime and the compiled program.
## Source Language Surface (from parser/interpreter)
- Program-level: `TypeDeclaration`, `VariableDeclaration`, `FunctionDeclaration`, expressions.
- Expressions: numbers, strings, booleans, identifiers, anonymous functions, application, unary/binary ops, lists, tables, member access, `ResultExpression` (Ok/Err), `WhenExpression` (patterns: type, result, list, table, wildcard), curried and typed functions.
- Namespaces: `io`, `str`, `math`, list/table functions (append, map, reduce, etc.) via native functions.
## Core IR (post-desugar)
Use a minimal lambda calculus with primitives and data literals:
- Term := `Var(name)` | `Lam(param, term)` | `App(term, term)` | `Const(name)` | `Lit(<number|string|boolean|list|table|result>)`.
- No infix ops; all are `Const` applied to args.
- Member access lowered to `App(App(Const('get'), key), obj)`.
- `when` lowered to nested `cond` forms with thunks.
### Desugaring Rules (selected)
- Multi-arg function `f: (x: T, y: U) -> R -> body` → `Lam(x, Lam(y, body))` with type metadata discarded at runtime.
- Untyped function `x y -> body` → `Lam(x, Lam(y, body))`.
- Anonymous `(x y -> e)` → `Lam(x, Lam(y, e))`.
- Unary minus: `-e` → `App(Const('neg'), e)`.
- Binary `e1 op e2` → `App(App(Const(opName), e1), e2)` where `opName ∈ { add, sub, mul, div, mod, eq, neq, gt, lt, gte, lte, and, or, xor, concatDot }`.
- Member `obj.prop` or `obj.0` → `App(App(Const('get'), keyLit), obj)`.
- `Ok e` → `App(Const('Ok'), e)`; `Err e` → `App(Const('Err'), e)`.
- Lists `[e1, ..., en]` → `Lit([...])` (JS arrays); tables `{ k: v, ... }` → `Lit({ ... })` or `Const('table')(...)` builder (see Runtime).
- `when`:
- `when d1 ... dn is p1 ... pn then c1; ...; _ then cK` → nested `cond` calls:
- `cond (match([p1..pn], [d1..dn])) (thunk(() => c1)) (thunk(() => cond (...)))`.
- Pattern matchers compile to booleans with helpers and do not bind new variables except `ResultPattern` which binds its value.
## Bracket Abstraction
Eliminate lambdas into SKI-family combinators over the core term. Treat any `Var` not equal to the bound variable as free (terminal). Use the standard rules:
- [x]x → I
- [x]M where x ∉ FV(M) → K M
- [x](M N) → S ([x]M) ([x]N)
Notes:
- Multi-arg λ: `[x][y]M` via nesting.
- We can later add optimizations: η-reduction, B/C introduction, constant folding of K-consts, etc.
## Emission Strategy
Output a single JS module string with:
1. Runtime prelude (combinators, curry helpers, primitives, matchers, namespaces).
2. Top-level declarations:
- Functions: declare `let f;` for each function name first (supports mutual recursion), then `f = <compiled expression>;`.
- Variables: `const x = <compiled expression>;`.
3. Exports and format: by default emit UMD for maximal portability (Node/CommonJS, Bun, browsers). Also support format selection via CLI.
### Module formats, exports, and codegen modes
- Default: UMD wrapper that exposes named exports under `module.exports` (CJS), `define` (AMD when available), and global `BabaYaga` namespace in browsers.
- Optional formats: `--format esm|cjs|umd` to force a specific target. ESM emits `export` statements; CJS uses `module.exports = { ... }`.
- Codegen mode: `--mode ski|closure|hybrid` (default: `ski`).
- `ski`: full combinatorization after normalization.
- `closure`: emit JS closures with currying helpers (easiest parity path, useful for debugging and when SKI would bloat output).
- `hybrid`: use closures for complex constructs (e.g., deeply nested `when`, heavy pattern matching) and SKI for simple lambdas; implementation-defined heuristics.
### Entry points and exports
- Compiled module should export named bindings for all top-level declarations.
- Additionally, export a single entry for executing a program:
- `run(host)` preferred: evaluates the compiled `Program` under the provided `host.io` and returns the last expression value.
- Fallback: `main(host)` if a user-defined `main` exists; parity harness will try `run` then `main`.
- Host wiring: prefer a lightweight `run(host)` that creates the runtime namespaces based on `host`. Avoid global side-effects.
### Recursion
- Self-recursion: works via two-phase assignment (`let f; f = ...`), because references resolve at call time.
- Mutual recursion: also works via predeclared `let` for all functions then assignments.
- Optional: pure combinator fixpoint `Z` for self-recursion; not required initially.
## Runtime Design (tiny and explicit)
### Combinators and Helpers
- `I(a) = a`
- `K(a)(b) = a`
- `S(f)(g)(x) = f(x)(g(x))`
- Optional: `B(f)(g)(x) = f(g(x))`, `C(f)(x)(y) = f(y)(x)`, `Z(f) = (g => g(g))(g => f(x => g(g)(x)))` for CBV.
- `curryN(n, f)` and arity-specific `curry2`, `curry3` for primitives.
- `thunk(f)` returns `f` (we pass thunks to `cond`).
- `cond(p)(t)(e) = p ? t() : e()`.
### Numeric and Data Representation
- Numbers are `{ value: number, isFloat: boolean }` to match interpreter.
- Lists as JS arrays; tables as `{ type: 'Object', properties: Map }` built via a small `table(obj)` helper that constructs a `Map`. Access is always via `get(key)(obj)` in compiled code. We will NOT use a Proxy in compiled output; this keeps runtime simple and predictable while preserving interpreter semantics for member/index access.
### Runtime namespace and symbol hygiene
- Prefix all runtime symbols with `__rt` (e.g., `__rt.I`, `__rt.add`, `__rt.get`) to avoid collisions with user-defined names.
- User symbols (top-level functions/vars) are emitted in their original names and exported by the module wrapper.
### Primitives (curried)
- Arithmetic: `add, sub, mul, div, mod, neg` (unwrap/rewrap numbers; div guards divide-by-zero like interpreter).
- Compare: `eq, neq, gt, lt, gte, lte` (numeric unwrapping + generic string/boolean).
- Logic: `and, or, xor` (booleanize inputs).
- String: `concatDot` for `..` (String(lhs) + String(rhs)).
- Result: `Ok, Err` → `{ type: 'Result', variant: 'Ok'|'Err', value }`.
- Core ops: `get`, and helpers for lists/tables (append, prepend, concat, update, removeAt, slice, set, remove, merge, keys, values) mirroring interpreter. In compiled code, `get` returns `undefined` for missing keys/indices (including out-of-bounds array access) to match interpreter behavior.
- Namespaces: `io`, `str`, `math` exposing same functions/signatures; `io` is parameterized by a passed-in host object for integration.
- Matchers: `isType(expected, value)`, `matchList(pattern, value)`, `matchTable(pattern, value)`, `isOk`, `isErr`.
### Pattern Matching Lowering
- Each `when` case compiles to `cond(guard)(() => consequent)(() => next)`, ensuring short-circuiting and correct evaluation order.
- Compound guards (e.g., multi-element list/table checks) are lazily folded using nested `cond` so that later predicates are not evaluated if earlier ones fail. This avoids unintended member/index access and preserves semantics.
- Guards are pure booleans; consequents compiled normally; for `ResultPattern Ok x`, the consequent becomes a λ bound to the inner value and is applied on the success branch.
- Switch micro-optimization: when there is exactly one discriminant and all patterns are literals with no bindings or type/list/table patterns, the compiler may emit a JavaScript `switch` on a normalized key (numbers unwrap to `.value`) for readability/perf. All other cases use the `cond`/if-else chain.
## Mapping from AST → IR → JS (node-by-node)
- Program: collect top-level function names; emit `let` declarations; then emit assignments/consts in order. Optionally return last expression for REPL mode.
- TypeDeclaration: ignored at runtime; optionally preserved as comments.
- VariableDeclaration: `const name = compile(expr)`.
- FunctionDeclaration/AnonymousFunction: normalize to nested `Lam`, then bracket abstraction, then emit the combinator tree expression.
- FunctionCall: normal application; arity handled at runtime (curried).
- NumberLiteral: `{ value: n, isFloat }`. String/Boolean: JS literals.
- UnaryExpression `-x`: `neg x`. BinaryExpression: map to primitive names.
- MemberExpression: `get key obj` (curried).
- ListLiteral/TableLiteral: JS arrays + `table` builder (to get `{type:'Object',properties:Map}`) where needed.
- ResultExpression: `Ok v`/`Err v`.
- WhenExpression: compiled to `cond` chain with matchers.
## Example (sketch)
Source:
```baba
add: (x: Number, y: Number) -> Number -> x + y;
main: -> add 2 3;
```
IR (informal):
```text
add = Lam(x, Lam(y, App(App(Const('add'), Var(x)), Var(y))))
main = Lam(_unit, App(App(Var(add), Lit({2})), Lit({3})))
```
After abstraction elimination (SKI form, schematic):
```text
add = S (K (S (K add) I)) (K (S (K ...) ...)) // produced by the standard rules
```
Emitted JS (abridged):
```js
// runtime: S,K,I,curry2, add, ...
let add;
add = /* SKI of λx.λy. add x y */;
const main = /* SKI of λ_. add 2 3 */;
export { add, main };
```
## Evaluation Strategy
- We generate eager JS; to preserve lazy branches in `when`, consequents are wrapped in thunks and forced via `cond`.
- Curried primitives plus normal JS evaluation replicates call-by-value with partial application like the interpreter.
## Testing Strategy
- Golden tests: compile and run existing `tests/*.test.js`-backed programs using both the interpreter and compiled output; assert observational equivalence of results/side-effects.
- Parity harness:
- For each sample program (or inline snippet), run: `interpret(ast)` and `execute(compiledJS)` with same `host`.
- Compare returned value shapes; for numbers compare `.value` and `isFloat`; for lists/tables structural equality; for `Result` `variant` and inner value.
- Capture `io.out` writes from both executions and compare sequences.
- CI entry: add a job that compiles a subset of fixtures and runs them; failures show diff of values and `io.out`.
- Edge cases: negative numbers vs unary minus; division by zero; list/table bounds; member access; nested `when` with short-circuiting; partial application and closures; recursion and mutual recursion; Ok/Err binding; boolean/logical ops precedence.
### Parity harness usage
- A simple harness exists in `parity.js`:
- Runs interpreter and compiled module, normalizes results to plain JS via a deep-plain conversion, and compares `io.out` arrays.
- It currently imports the compiled module via a data URL and looks for `run(host)` or `main([host])` exports.
- Default compiler mode for parity is `closure` to reduce bring-up friction; once SKI is ready, switch to `--mode ski`.
Examples:
```sh
node parity.js ../example.baba
```
or via package script:
```sh
bun run parity
```
## Status and Next Steps
What’s working now (closure mode MVP):
- Normalization covers literals, identifiers, anonymous and declared functions (curried), calls, unary minus, binary ops (+ - * / % .. and compare/logic), member access via `get`, `Ok`/`Err`, list/table literals, and a basic `when` to `Cond` lowering with thunks.
- Emission produces an ESM module exporting `run(host)`, supporting recursion via let-first then assignment.
- Runtime prelude implements: numbers with `{ value, isFloat }`, curry helpers, arithmetic/compare/logic, `concatDot`, `Ok/Err`, `get`, `cond`, `io` wiring, `table` builder, list HOFs (map/filter/reduce), and core `str.*` helpers.
- Parity harness loads ESM via data URL and can compare interpreter vs compiled; simple programs pass.
Actionable next steps (ordered):
1) Runtime parity with interpreter
- Implement missing list/table primitives: `append`, `prepend`, `concat`, `update`, `removeAt`, `slice`, `set`, `remove`, `merge`, `keys`, `values` with identical semantics (indices unwrap numbers; maps remain `Map`).
- Implement `math.*` namespace mirroring `interpreter.js` (abs, sign, floor, ceil, round, trunc, min, max, clamp, pow, sqrt, exp, log, sin, cos, tan, asin, acos, atan, atan2, deg, rad, random, randomInt).
- Consider `io.emit`/`io.listen` to support host event wiring if compiled code uses them.
2) When/pattern matching completeness
- Extend lowering to fully support: multi-discriminant patterns, `TypePattern`, `ListPattern`, `TablePattern`, and `ResultPattern` with bindings. Ensure every branch is thunked for short-circuit evaluation. Add switch micro-optimization for single discriminant with literal-only patterns (optional).
3) Emission and module formats
- Add named exports for top-level declarations (in addition to `run(host)`).
- Implement `wrapModule` for `umd` and `cjs` formats. Keep `esm` as the default for parity.
- Add `--pretty` and `--sourcemap inline|file` support per the spec.
4) Codegen modes
- Implement SKI mode with bracket abstraction (`I/K/S` at minimum) and corresponding runtime.
- Implement hybrid heuristics: closures for complex `when`/deep terms, SKI for simple lambdas.
5) CLI and DX
- Integrate flags into `runner.js` or provide a `bin/compile` entry. Support `--runtime external:path` to reference an external runtime.
6) Testing and CI
- Run fixture programs under both interpreter and compiler; compare values and `io.out` via parity harness.
- Add CI job to compile and run a representative subset; show diffs on failure.
Notes/pitfalls to keep in mind
- Numbers must preserve `{ value, isFloat }` across all operations and boundaries.
- `get` must support arrays (numeric index) and tables (Map by key). Member access errors should match interpreter behavior.
- `when` consequents must not be evaluated eagerly; use thunks and `__rt.cond`.
- Keep runtime symbol hygiene: all helpers live under `__rt.*` to avoid user name collisions.
## CLI
- Extend `runner.js` (or add `bin/compile`) with:
- `--compile <input.baba>`: compile input file
- `-o, --out <out.js>`: output file path
- `--format <esm|cjs|umd>`: module format (default: `umd`)
- `--mode <ski|closure|hybrid>`: code generation mode (default: `ski`)
- `--runtime <inline|external:path>`: inline runtime by default; or reference external runtime
- `--sourcemap <inline|file|none>`: source map emission (default: `none` initially)
- `--pretty`: pretty-print emitted JS (default off)
Example:
```sh
bun run runner.js --compile src/main.baba -o dist/main.js --format umd --sourcemap file
```
## Source maps (planned)
- Nice-to-have; implement after parity. Approach:
- Thread source locations from `parser.js` into the core IR nodes.
- Generate mappings at IR application/literal/const boundaries rather than raw SKI nodes.
- Support `--sourcemap inline|file` to control emission.
- Mode-aware mapping: in `closure` mode, map directly at emitted closure expressions; in `ski`/`hybrid` modes, map at IR boundaries pre-elimination to keep maps stable.
## Open Questions / TODO
- TODO: Confirm semantics for pattern matching on nested member expressions inside patterns (limited in current parser).
- TODO: Implement source maps per the plan once parity is achieved.
## New files and changes
- Added `experimental/compiler.js`: scaffold with `compile`, normalization and codegen stubs, emission and runtime prelude stubs, and a minimal developer CLI.
- Added `experimental/parity.js`: parity harness described above.
- Updated `package.json` scripts with `parity` command.
- `runtime.js`: optional future extraction if we decide not to inline the prelude.
## Minimalism and Parity
This approach is intentionally minimal: no changes to existing front-end/runtime semantics, small targeted runtime, and direct syntactic lowering. The initial goal is parity with `interpreter.js`; once achieved, we can iterate on size and performance optimizations.
|