diff options
Diffstat (limited to 'js/baba-yaga/README.md')
-rw-r--r-- | js/baba-yaga/README.md | 614 |
1 files changed, 149 insertions, 465 deletions
diff --git a/js/baba-yaga/README.md b/js/baba-yaga/README.md index f307774..947e91b 100644 --- a/js/baba-yaga/README.md +++ b/js/baba-yaga/README.md @@ -1,519 +1,203 @@ -# Baba Yaga - -Baba Yaga is a small functional scripting language implemented in JavaScript. It is designed as a learning exercise and a platform for experimenting with new language features and syntax ideas. It emphasizes functional programming patterns, particularly through currying and combinator-like `when` expressions for pattern matching. - -## Table of Contents - -- [Baba Yaga](#baba-yaga) - - [Table of Contents](#table-of-contents) - - [1. Project Overview](#1-project-overview) - - [2. Architecture](#2-architecture) - - [3. Extensibility: How to Add New Features](#3-extensibility-how-to-add-new-features) - - [4. Language Features Reference Card](#4-language-features-reference-card) - - [Comments](#comments) - - [Types](#types) - - [Constants](#constants) - - [Lists](#lists) - - [Tables](#tables) - - [Variable and Type Declarations](#variable-and-type-declarations) - - [Functions (Anonymous, Currying, Partial Application, Recursion, Mutual Recursion)](#functions-anonymous-currying-partial-application-recursion-mutual-recursion) - - [Higher-Order Functions](#higher-order-functions) - - [Immutable List Operations](#immutable-list-operations) - - [Immutable Table Operations](#immutable-table-operations) - - [Operators](#operators) - - [Control Flow: The `when` Expression](#control-flow-the-when-expression) - - [Error Handling: The `Result` Type](#error-handling-the-result-type) - - [Local Bindings: Header `with` (and `with rec`)](#local-bindings-header-with-and-with-rec) - - [5. Getting Started](#5-getting-started) - - [6. Running Examples](#6-running-examples) - - [7. Testing](#7-testing) - - [8. Reference Card](#8-reference-card) - - [9. REPL](#9-repl) - - [10. Typed Functions and Errors](#10-typed-functions-and-errors) - - [Compiling to JavaScript (closure mode)](#compiling-to-javascript-closure-mode) - ---- - -## 1. Project Overview - -This language is a small, dynamically typed (with optional static type annotations) scripting language. Its syntax is inspired by ML-family languages, focusing on immutability and expression-oriented programming. Key features include: - -* **Functional Core:** Emphasizes anonymous functions, currying, partial application, and **recursive functions**. -* **Pattern Matching:** A powerful `when` expression for control flow, supporting literal, multi-argument, and type-based matching. -* **Robust Error Handling:** A `Result` type for explicit success/failure propagation, inspired by Rust. -* **Simple Syntax:** Designed to be easy to read and write. -* **Recursive Functions:** Full support for self-recursive functions with proper scoping. -* **Mathematical Constants:** Built-in `PI` and `INFINITY` constants for mathematical operations. -* **Immutable Data Structures:** All list and table operations are immutable, ensuring data integrity. - -## 2. Architecture - -The language engine is structured into three main components, following a classic compiler/interpreter design pattern: - -* ### Lexer (or Tokenizer) - `lexer.js` - The lexer is the first phase of processing. It takes the raw source code as a string and breaks it down into a stream of meaningful units called **tokens**. Each token represents a fundamental building block of the language, like keywords, identifiers, operators, numbers, and strings. It also handles whitespace and comments by removing them from the token stream. - -* ### Parser - `parser.js` - The parser takes the stream of tokens generated by the lexer and builds an **Abstract Syntax Tree (AST)**. The AST is a representation of the program's syntactic structure, abstracting away the actual syntax. It organizes the tokens into a hierarchical structure that reflects the grammatical rules of the language, making it easier for the interpreter to understand and execute the code. The parser is responsible for enforcing syntax rules and reporting syntax errors. - -* ### Interpreter - `interpreter.js` - The interpreter is the execution engine of the language. It traverses the AST generated by the parser and evaluates each node, performing the operations defined by the program. It manages the program's state, including variable scopes, function definitions, and type information. The interpreter directly executes the program's logic, handling function calls, control flow, and data manipulation. - -## 3. Extensibility: How to Add New Features - -Extending this language involves modifying one or more of the core architectural components: - -* **Adding New Keywords or Operators:** - * **`lexer.js`** - : Update the `tokenTypes` object with new token types. Add the new keyword/operator to the `keywords` array or the `if` conditions in `nextToken` to recognize and categorize it. - * **`parser.js`** - : If the new token introduces new syntax, you'll need to add new parsing functions or modify existing ones (e.g., `parsePrimary`, `parseExpression`) to create corresponding AST nodes. - * **`interpreter.js`** - : Implement `visit` methods for any new AST node types to define their runtime behavior. - -* **Adding New Data Types:** - * **`lexer.js`** - : Add the new type name to the `tokenTypes` and ensure `nextToken` correctly identifies it as a `TYPE`. - * **`parser.js`** - : Modify `parseTypeDeclaration` and potentially `parsePrimary` or `parsePattern` to create AST nodes for the new type. - * **`interpreter.js`** - : Update `visitVariableDeclaration` and `visitWhenExpression` (for type matching) to handle the new type's runtime representation and type checking. - -* **Adding New Control Flow or Language Constructs:** - * **`lexer.js`** - : Add any new keywords or symbols required for the construct. - * **`parser.js`** - : Implement new parsing functions to build the AST representation of the new construct. This often involves defining new AST node types. - * **`interpreter.js`** - : Implement `visit` methods for the new AST nodes to define their execution logic. - -## 4. Language Features Reference Card - -### Comments - -Single-line comments start with `//`. +# Baba Yaga Programming Language -```baba -// This is a comment -myVar : 10; // This is also a comment -``` - -### Types - -The language supports basic types and a `Result` type. Type declarations are optional but enforced if provided. - -* `Int`: Integer numbers (e.g., `10`, `0`, `-5`). -* `Float`: Floating-point numbers (e.g., `3.14`, `0.5`). -* `String`: Text enclosed in double quotes (e.g., `"hello"`, `"world"`). -* `Result`: A union type representing either a successful outcome (`Ok`) or an error (`Err`). - -### Constants - -The language provides built-in mathematical constants `PI` and `INFINITY`. - -```baba -// Using mathematical constants -circumference : radius -> 2 * PI * radius; -area : radius -> PI * radius * radius; - -result : (circumference 5); -``` - -### Lists - -Lists are ordered collections of values, enclosed in square brackets `[]`. - -```baba -myList : [1, 2, 3, "hello"]; - -// Accessing elements by index (0-based) -firstElement : myList.0; -secondElement : myList.1; -``` - -### Tables - -Tables (or records/objects) are unordered collections of key-value pairs enclosed in curly braces `{}`. Keys are identifiers, and values can be any expression. - -```baba -myTable : { name: "Lucy Snowe", age: 23, isActive: true }; - -// Accessing properties by key -userName : myTable.name; -userAge : myTable.age; -``` - -### Variable and Type Declarations - -Variables are declared using an identifier followed by a colon and their value. Optional type annotations can precede the variable name. - -```baba -// Type declaration -myNumber Int; +A functional programming language with immutable data structures, pattern matching, and explicit error handling. -// Variable declaration with type annotation -myNumber : 10; +## Quick Start -// Variable declaration without explicit type (type inferred) -greeting : "Hello!"; -``` - -### Functions (Anonymous, Currying, Partial Application, Recursion, Mutual Recursion) - -Functions are anonymous and defined using the `->` arrow syntax. Currying is supported by chaining `->` for multiple arguments. **Recursive functions are fully supported** with proper scoping, including **mutual recursion**. - -```baba -// A simple function -addOne : x -> x + 1; - -// A curried function -add : x -> y -> x + y; - -// Partial application -add5 : add 5; // add5 is now a function that takes one argument (y) and adds 5 to it - -// Function call -result : add5 10; // result will be 15 - -// Recursive function (factorial) -factorial : n -> - when n is - 0 then 1 - 1 then 1 - _ then n * (factorial (n - 1)); - -// Mutual recursion (isEven/isOdd) -isEven : n -> - when n is - 0 then true - 1 then false - _ then isOdd (n - 1); - -isOdd : n -> - when n is - 0 then false - 1 then true - _ then isEven (n - 1); - -// Recursive function (Fibonacci) -fibonacci : n -> - when n is - 0 then 0 - 1 then 1 - _ then (fibonacci (n - 1)) + (fibonacci (n - 2)); - -// Arrow functions in table literals -calculator : { - add: x y -> x + y; - subtract: x y -> x - y; - multiply: x y -> x * (y + 1); - constant: -> 42; -}; - -// Calling functions from tables -result : calculator.add 5 3; // 8 -sum : calculator.add 10 20; // 30 -``` - -### Higher-Order Functions - -The language provides built-in higher-order functions for list manipulation: - -* `map`: Applies a function to each element of a list, returning a new list with the results. - ```baba - doubledList : map (x -> x * 2) [1, 2, 3]; // doubledList will be [2, 4, 6] - ``` - -* `filter`: Creates a new list containing only elements for which a given predicate function returns `true`. - ```baba - evenNumbers : filter (x -> x % 2 = 0) [1, 2, 3, 4, 5]; // evenNumbers will be [2, 4] - ``` - -* `reduce`: Applies a function against an accumulator and each element in the list (from left to right) to reduce it to a single value. - ```baba - sum : reduce (acc item -> acc + item) 0 [1, 2, 3, 4]; // sum will be 10 - ``` - -### Immutable List Operations +```bash +# Run a program +bun run index.js example.baba -All list operations are immutable, returning new lists without modifying the original: +# Or use the compiled binary +./build/baba-yaga-macos-arm64 example.baba -* `append(list, element)`: Add element to the end of a list. -* `prepend(element, list)`: Add element to the beginning of a list. -* `concat(list1, list2)`: Combine two lists. -* `update(list, index, value)`: Replace element at specific index. -* `removeAt(list, index)`: Remove element at specific index. -* `slice(list, start, end)`: Get a sublist from start to end. +# Interactive REPL +bun run repl.js -```baba -original : [1, 2, 3]; -modified : append original 4; // [1, 2, 3, 4] -// original is still [1 2 3] - immutable! +# Run tests +bun test ``` -### Immutable Table Operations +## Language Features -All table operations are immutable, returning new tables without modifying the original: +- **Immutable by default** - All data structures are immutable +- **Pattern matching** - Powerful `when` expressions with guards for control flow +- **Explicit error handling** - `Result` types with `Ok` and `Err` +- **Currying & partial application** - Functions are curried by default +- **Higher-order functions** - `map`, `filter`, `reduce`, `flatMap`, and more +- **Array programming** - APL/K-inspired operations: `scan`, `at`, `where`, `broadcast`, `zipWith`, `reshape` +- **Function combinators** - `compose`, `pipe`, `apply`, `flip` for functional composition +- **Type annotations** - Optional static typing with runtime validation +- **Rich standard library** - Comprehensive math, string, list, table, validation, and debugging utilities +- **Local bindings** - `with` blocks for staging computations and mutual recursion -* `set(table, key, value)`: Add or update a property in a table. -* `remove(table, key)`: Remove a property from a table. -* `merge(table1, table2)`: Combine two tables (table2 properties override table1). -* `keys(table)`: Get a list of all keys in a table. -* `values(table)`: Get a list of all values in a table. +## Project Structure -```baba -user : {name: "Lucy Snowe", age: 23}; -updated : set user "city" "Villette"; // {name: "Lucy Snowe", age: 23, city: "Villette"} -// user is still {name: "Lucy Snowe" age: 23} - immutable! ``` - -### Operators - -The language supports standard arithmetic and comparison operators. - -* Arithmetic: `+`, `-`, `*`, `/` -* Comparison: `=`, `>`, `<` - -### Control Flow: The `when` Expression - -The `when` expression provides powerful pattern matching for control flow. It evaluates a discriminant against a series of patterns. - -* **Literal Matching:** - ```baba - checkNumber : num -> - when num is - 1 then "One" - 2 then "Two" - _ then "Something else"; // '_' matches any value - ``` - -* **Multiple Discriminants:** - ```baba - checkCoords : x y -> - when x y is - 0 0 then "Origin" - 1 1 then "Diagonal" - _ _ then "Somewhere else"; - ``` - -* **Type Matching:** - ```baba - checkType : val -> - when val is - Int then "It's an Integer" - String then "It's a String" - _ then "Unknown Type"; - ``` - -### Error Handling: The `Result` Type - -The `Result` type is used for explicit error handling. Functions can return `Ok` for success or `Err` for failure. `when` expressions can pattern match on `Result` variants and bind their inner values. - -```baba -// Function returning a Result type -divide : x y -> - when y is - 0 then Err "Division by zero is not allowed." - _ then Ok (x / y); - -// Consuming a Result type with pattern matching -resultDivideOk : divide 10 2; // Result: Ok 5 -resultDivideErr : divide 5 0; // Result: Err "Division by zero is not allowed." - -// Extracting values from Result types using 'when' -finalResultOk : when resultDivideOk is - Ok val then val // 'val' binds to the inner value of Ok (5) - Err msg then 0; - -finalResultErr : when resultDivideErr is - Ok val then 0 - Err msg then msg; // 'msg' binds to the inner value of Err ("Division by zero...") +baba-yaga/ +├── src/ +│ ├── core/ # Current optimized implementation +│ ├── legacy/ # Original stable implementation +│ └── benchmarks/ # Performance testing +├── docs/ # Language documentation +├── tests/ # Test suite (226 tests) +├── build/ # Compiled binaries +├── scratch/ # Development files +│ ├── baba/ # Test .baba programs +│ ├── js/ # JavaScript utilities +│ └── docs/ # Technical documentation +├── dev/ # Editor support (VS Code, Vim, etc.) +├── web/ # Web playground +└── experimental/ # Experimental features ``` -## 5. Getting Started - -To run the language, you'll need [Bun](https://bun.sh/) installed. - -1. **Clone the repository:** - ```bash - git clone <repository-url> - cd scripts # or wherever your project root is - ``` -2. **Install dependencies (Bun will handle this):** - ```bash - bun install - ``` - -## 6. Running Examples +## Development -To run a `.baba` script: +### Building Binaries ```bash -bun run index.js <path/to/your/script.baba> -``` - -### Local Bindings: Header `with` (and `with rec`) - -Stage local bindings in a function header right after the arrow. Bindings are evaluated left-to-right in an inner scope. Locals can be typed like globals. +# Build for current platform +bun run build -Semantics -- `with ( ... ) -> body`: non-recursive locals - - entries: type decl `name Type;` and assignment `name : expr;` - - types are validated using the same runtime lattice (Int ⊂ Float ⊂ Number) - - shadowing allowed; forward refs among assignments are not -- `with rec ( ... ) -> body`: mutually recursive locals - - all names pre-bound, then assignments evaluated; each assignment must be a function value +# Build for all platforms +bun run build:all -Semicolons -- Inside `with ( ... )`: semicolons separate entries; trailing semicolon allowed -- Between header and body: `->` -- After body: same as other top-level statements (semicolon optional) - -Examples - -```baba -// untyped locals -addMul : x y -> with (inc : x + 1; prod : inc * y;) -> inc + prod; - -// typed locals (global-like style) -sumNext : (x: Int, y: Int) -> Int -> - with (nx Int; ny Int; nx : x + 1; ny : y + 1;) -> nx + ny; - -// with + when -classify : n -> - with (lo Int; hi Int; lo : 10; hi : 100;) -> - when n is - 0 then "zero"; - _ then when (n > hi) is - true then "large"; - _ then when (n > lo) is - true then "medium"; - _ then "small"; - -// mutual recursion -isEvenOdd : z -> with rec ( - isEven : n -> when n is 0 then true _ then isOdd (n - 1); - isOdd : n -> when n is 0 then false _ then isEven (n - 1); -) -> { e: isEven 10, o: isOdd 7 }; +# Build for specific platform +bun run build:linux +bun run build:windows +bun run build:macos-intel +bun run build:macos-arm ``` -For example, to run the provided comprehensive example: +### Running Tests ```bash -bun run index.js example.baba -``` - -To test recursive functions: +# All tests +bun test -```bash -bun run index.js test_recursion_fixed.baba +# Benchmark performance +bun run benchmark +bun run benchmark:full ``` -To enable debug logging during parsing (useful for development): +### CLI Options ```bash -bun run index.js example.baba --debug -``` - -## 7. Testing +bun run index.js program.baba [options] -The language includes a comprehensive test suite covering all features including recursive functions: +Options: + --debug Enable debug output + --profile Enable performance profiling + --strict Enable strict mode validation + --legacy Use legacy (stable) engine +``` -```bash -# Run all tests -bun test +## Engine Architecture -# Run specific test files -bun test tests/recursive_functions.test.js -bun test tests/data_structures.test.js -``` +### Current Status (v2.0.0) -## 8. Reference Card +- **Default Engine**: Legacy lexer/parser/interpreter (stable, reliable) +- **Optimized Engine**: Available but **disabled by default** due to [critical lexer bug](scratch/docs/LEXER_BUG_REPORT.md) +- **Performance**: Legacy engine handles all test cases correctly +- **Compatibility**: 226 tests pass, full language support -For a quick reference of all language features, syntax, and examples, see `ref.txt` - a comprehensive reference card modeled after the K language reference. +### Key Components -## 9. REPL +- **Lexer**: Tokenizes source code (legacy character-by-character parsing) +- **Parser**: Builds Abstract Syntax Tree (recursive descent parser) +- **Interpreter**: Executes AST with scope management and built-in functions +- **Error System**: Rich error messages with source location and suggestions +- **Configuration**: Flexible engine settings and feature flags -An interactive REPL is included. It supports multiline input and a few simple commands. +## Example Programs -Run the REPL (Node): +### Basic Syntax -```bash -npm run repl -``` +```baba +// Variables and functions +x : 42; +add : a b -> a + b; +result : add 10 5; + +// Lists and higher-order functions +numbers : [1, 2, 3, 4, 5]; +doubled : map (x -> x * 2) numbers; +sum : reduce (acc x -> acc + x) 0 doubled; + +// Array programming operations +evens : where (x -> (x % 2) = 0) numbers; // [1, 3] (indices) +evenValues : at evens numbers; // [2, 4] +cumulative : cumsum numbers; // [0, 1, 3, 6, 10, 15] +matrix : reshape [2, 3] (range 1 6); // [[1,2,3], [4,5,6]] + +// Pattern matching with guards +classify : n -> + when n is + 0 then "zero" + x if ((x > 0) and (x < 10)) then "small positive" + Int then when (n > 10) is true then "big" _ then "small" + _ then "unknown"; -Run the REPL (Bun): +// Error handling +safeDivide : a b -> + when b is + 0 then Err "Division by zero" + _ then Ok (a / b); -```bash -bun run repl.js +// Function composition and utilities +processData : xs -> + with ( + validated : filter (validate.range 1 100) xs; + grouped : chunk validated 3; + processed : map (chunk -> reduce (acc x -> acc + x) 0 chunk) grouped; + ) -> processed; ``` -Usage notes: -- Press Enter to add new lines to the current input buffer. -- Execute the buffer with `:run` or by entering a single `.` on its own line. -- You can also end the current buffer with a trailing `;;` to submit immediately (the REPL treats the second `;` as a submit marker). -- Commands: - - `:run` — execute current buffer - - `:reset` — clear buffer - - `:clear` — clear session state - - `:load <path>` — validate and load a `.baba` file into the session without executing - - On success: prints `Loaded <path>. Type :run to execute.`; the loaded program is executed when you run `:run` (even with an empty buffer) - - On error: prints a parse error and a code frame - - Path tips: quotes are accepted (e.g. `:load "my file.baba"`) and `~` expands to your home directory (e.g. `:load ~/prog.baba`) - - `:help` — show commands - - `:quit | :exit` — exit REPL -- Programs can call `io.in` to synchronously read a line of input during evaluation (you’ll see an `input>` prompt). +### Conway's Game of Life -## 10. Typed Functions and Errors +See [scratch/baba/life-final.baba](scratch/baba/life-final.baba) for a complete implementation. -Typed functions annotate parameter and return types. Validation happens at runtime when calling the function (for parameters) and after the body evaluates (for returns). +## Documentation -Examples: +- [Language Crash Course](docs/00_crash-course.md) - Complete language guide with all features +- [Functional Programming](docs/01_functional.md) - Higher-order functions, combinators, and array programming +- [Data Structures](docs/02_data-structures.md) - Lists, tables, and comprehensive array operations +- [Pattern Matching](docs/03_pattern-matching.md) - `when` expressions, guards, and advanced patterns +- [Type System](docs/04_types.md) - Optional typing with runtime validation +- [Recursion & Composition](docs/05_recursion-and-composition.md) - Function composition and mutual recursion +- [Error Handling](docs/06_error-handling.md) - `Result` types, validation, and debugging utilities +- [Syntax Gotchas](docs/07_gotchyas.md) - Common pitfalls and strict syntax requirements +- [Array Programming](docs/08_array-programming.md) - Comprehensive APL/K-inspired operations +- [JavaScript Interop](docs/09_js-interop.md) - Safe integration with JavaScript functions and APIs -```baba -// Parameter and return types -add : (x: Int, y: Int) -> Int -> x + y; -res : add 2 3; // 5 - -// Curried typed function -mul : (x: Float, y: Float) -> Float -> x * y; -mul2 : mul 2.0; -res2 : mul2 3.5; // 7.0 - -// Runtime errors (shape) -// Type mismatch in function 'add': Expected Int for parameter 'y', but got String (value: "3") -// Return type mismatch in function 'greet': Expected String, but got Int (value: 42) -``` +### Technical Documentation -See `docs/types.md` for a deeper dive. +- [Lexer Bug Report](scratch/docs/LEXER_BUG_REPORT.md) - Critical issue with optimized lexer +- [Reimplementation Guide](scratch/docs/REIMPLEMENTATION_GUIDE.md) - Porting to Rust/C++ +- [Build Instructions](scratch/docs/BUILD_README.md) - Static binary compilation +- [Cross-Compilation](scratch/docs/CROSS_COMPILATION_GUIDE.md) - Multi-platform builds -## Compiling to JavaScript (closure mode) +## Testing & Quality -The repository now includes a minimal Baba Yaga → JavaScript compiler that emits an ESM module with a small runtime. +- **Test Suite**: Over 200 tests covering all language features +- **Test Categories**: Parser, interpreter, functional enhancements, edge cases +- **Performance**: Benchmarking suite with optimization comparisons +- **Error Handling**: Rich error messages with source context and suggestions -- Compile a `.baba` file to JS: +## Known Issues - ```bash - node experimental/compiler/compiler.js --in path/to/program.baba -o out.js --format esm --mode closure - ``` - -- Run the compiled module (Node ESM): +1. **Optimized Lexer Bug** (Critical) - Regex-based lexer skips file content + - **Status**: Documented, reverted to legacy lexer by default + - **Impact**: No user impact (legacy lexer works perfectly) + - **Fix**: Future investigation needed - ```bash - node -e "import('file://$PWD/out.js').then(m => m.run({ io: { out: console.log, in: () => '' } }))" - ``` +## Conway's Game of Life -- Parity check (interpreter vs compiled): +Working implementation demonstrating: +- Cellular automaton rules +- Pattern evolution (blinker oscillator) +- Game state management ```bash - node parity.js path/to/program.baba - ``` - -- Host contract expected by compiled code: - - `host.io.out(...args)` is called with values (numbers as `{ value, isFloat }` objects) - - `host.io.in()` should return a string; it defaults to `''` - -Notes and current limitations: -- Codegen mode is closure-only right now (`--mode closure`). -- The runtime prelude is minimal but supports: numbers, arithmetic/compare/logic, `..` concat, `Ok/Err`, `get`, `cond`, lists (map/filter/reduce), and core `str.*` helpers. -- `when` is lowered to `cond` chains; literal and simple structural cases work. Full pattern coverage is in progress. \ No newline at end of file +bun run index.js scratch/baba/life-final.baba +``` |