# Baba Yaga Reimplementation Guide This guide outlines how to reimplement the Baba Yaga functional language in a faster, compiled language. While the current JavaScript implementation serves as an excellent prototype, a native implementation could provide significant performance improvements and better integration capabilities. ## Language Recommendation: Rust After analyzing the requirements, **Rust** emerges as the optimal choice because: - **Memory safety** without garbage collection overhead - **Native pattern matching** that directly maps to Baba Yaga's `when` expressions - **Functional programming support** for closures and higher-order functions - **Built-in `Result`** type matching Baba Yaga's error handling - **Zero-cost abstractions** for performance - **Excellent tooling** and growing ecosystem ## Project Structure ``` baba-yaga-rust/ ├── Cargo.toml ├── src/ │ ├── main.rs # CLI entry point │ ├── lib.rs # Library exports │ ├── lexer/ │ │ ├── mod.rs # Lexer module │ │ └── token.rs # Token definitions │ ├── parser/ │ │ ├── mod.rs # Parser module │ │ └── ast.rs # AST node definitions │ ├── interpreter/ │ │ ├── mod.rs # Interpreter module │ │ ├── value.rs # Runtime value types │ │ ├── scope.rs # Scope management │ │ └── builtins.rs # Built-in functions │ ├── error.rs # Error types │ └── repl.rs # REPL implementation ├── tests/ │ ├── integration/ │ └── fixtures/ └── benches/ # Performance benchmarks ``` ## Phase 1: Core Data Types and Error Handling ### 1.1 Define Core Types **File: `src/error.rs`** ```rust use std::fmt; #[derive(Debug, Clone)] pub enum BabaError { LexError(String), ParseError(String), RuntimeError(String), TypeError(String), UndefinedVariable(String), UndefinedProperty(String), DivisionByZero, IndexOutOfBounds(usize), } impl fmt::Display for BabaError { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { match self { BabaError::LexError(msg) => write!(f, "Lexer error: {}", msg), BabaError::ParseError(msg) => write!(f, "Parse error: {}", msg), BabaError::RuntimeError(msg) => write!(f, "Runtime error: {}", msg), BabaError::TypeError(msg) => write!(f, "Type error: {}", msg), BabaError::UndefinedVariable(name) => write!(f, "Undefined variable: {}", name), BabaError::UndefinedProperty(prop) => write!(f, "Undefined property: {}", prop), BabaError::DivisionByZero => write!(f, "Division by zero"), BabaError::IndexOutOfBounds(idx) => write!(f, "Index out of bounds: {}", idx), } } } impl std::error::Error for BabaError {} pub type Result = std::result::Result; ``` ### 1.2 Runtime Value System **File: `src/interpreter/value.rs`** ```rust use std::collections::HashMap; use std::rc::Rc; use im::{Vector, HashMap as ImHashMap}; // Use persistent data structures #[derive(Debug, Clone)] pub enum Value { Number { value: f64, is_float: bool }, String(String), Boolean(bool), List(Vector), Table(ImHashMap), Function(Function), NativeFunction(NativeFn), Result { variant: ResultVariant, value: Box }, Unit, } #[derive(Debug, Clone)] pub enum ResultVariant { Ok, Err, } #[derive(Debug, Clone)] pub struct Function { pub params: Vec, pub body: Rc, pub closure: Scope, pub return_type: Option, } pub type NativeFn = fn(&[Value]) -> crate::Result; ``` ## Phase 2: Lexical Analysis ### 2.1 Token Definition **File: `src/lexer/token.rs`** ```rust #[derive(Debug, Clone, PartialEq)] pub enum TokenType { // Literals Number { value: f64, is_float: bool }, String(String), Identifier(String), // Keywords When, Is, Then, With, Rec, Ok, Err, True, False, Pi, Infinity, And, Or, Xor, // Operators Plus, Minus, Star, Slash, Percent, Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual, Concat, // .. // Punctuation LeftParen, RightParen, LeftBrace, RightBrace, LeftBracket, RightBracket, Colon, Semicolon, Comma, Dot, Arrow, // Special Newline, Eof, } #[derive(Debug, Clone)] pub struct Token { pub token_type: TokenType, pub line: usize, pub column: usize, } ``` ### 2.2 Lexer Implementation **File: `src/lexer/mod.rs`** Use a character-by-character state machine approach: ```rust pub struct Lexer { input: Vec, position: usize, line: usize, column: usize, } impl Lexer { pub fn new(input: String) -> Self { Self { input: input.chars().collect(), position: 0, line: 1, column: 1, } } pub fn tokenize(&mut self) -> crate::Result> { let mut tokens = Vec::new(); while !self.is_at_end() { self.skip_whitespace(); if self.is_at_end() { break; } tokens.push(self.next_token()?); } tokens.push(Token { token_type: TokenType::Eof, line: self.line, column: self.column, }); Ok(tokens) } fn next_token(&mut self) -> crate::Result { // Implementation details... } } ``` ## Phase 3: Abstract Syntax Tree ### 3.1 AST Node Definition **File: `src/parser/ast.rs`** ```rust #[derive(Debug, Clone)] pub enum AstNode { // Literals Number { value: f64, is_float: bool }, String(String), Boolean(bool), List(Vec), Table(Vec<(String, AstNode)>), // Identifiers and access Identifier(String), MemberAccess { object: Box, property: Box }, // Functions Function { params: Vec, body: Box }, FunctionCall { callee: Box, args: Vec }, // Control flow When { discriminants: Vec, cases: Vec, }, // Declarations VariableDeclaration { name: String, value: Box }, FunctionDeclaration { name: String, params: Vec, body: Box, return_type: Option, }, // Local bindings WithHeader { entries: Vec, body: Box, recursive: bool, }, // Expressions BinaryOp { left: Box, op: BinaryOperator, right: Box }, UnaryOp { op: UnaryOperator, operand: Box }, // Result types Result { variant: ResultVariant, value: Box }, // Program structure Program(Vec), } #[derive(Debug, Clone)] pub struct WhenCase { pub patterns: Vec, pub body: Box, } #[derive(Debug, Clone)] pub enum Pattern { Literal(AstNode), Wildcard, Type(String), Result { variant: ResultVariant, binding: String }, List(Vec), Table(Vec<(String, Pattern)>), } ``` ## Phase 4: Parser Implementation ### 4.1 Recursive Descent Parser **File: `src/parser/mod.rs`** ```rust pub struct Parser { tokens: Vec, current: usize, } impl Parser { pub fn new(tokens: Vec) -> Self { Self { tokens, current: 0 } } pub fn parse(&mut self) -> crate::Result { let mut statements = Vec::new(); while !self.is_at_end() { statements.push(self.statement()?); } Ok(AstNode::Program(statements)) } fn statement(&mut self) -> crate::Result { match self.peek().token_type { TokenType::Identifier(_) => { if self.peek_ahead(1).token_type == TokenType::Colon { self.declaration() } else { self.expression() } } _ => self.expression(), } } // Implement precedence climbing for expressions fn expression(&mut self) -> crate::Result { self.expression_with_precedence(0) } fn expression_with_precedence(&mut self, min_precedence: u8) -> crate::Result { // Implementation using precedence climbing algorithm } } ``` ## Phase 5: Interpreter Core ### 5.1 Scope Management **File: `src/interpreter/scope.rs`** ```rust use std::collections::HashMap; use std::rc::Rc; use crate::interpreter::value::Value; #[derive(Debug, Clone)] pub struct Scope { bindings: HashMap, parent: Option>, } impl Scope { pub fn new() -> Self { Self { bindings: HashMap::new(), parent: None, } } pub fn with_parent(parent: Rc) -> Self { Self { bindings: HashMap::new(), parent: Some(parent), } } pub fn get(&self, name: &str) -> Option { self.bindings.get(name).cloned() .or_else(|| self.parent.as_ref().and_then(|p| p.get(name))) } pub fn set(&mut self, name: String, value: Value) { self.bindings.insert(name, value); } } ``` ### 5.2 Interpreter Implementation **File: `src/interpreter/mod.rs`** ```rust use std::rc::Rc; use crate::parser::ast::AstNode; use crate::interpreter::value::Value; use crate::interpreter::scope::Scope; pub struct Interpreter { global_scope: Rc, } impl Interpreter { pub fn new() -> Self { let mut global_scope = Scope::new(); Self::register_builtins(&mut global_scope); Self { global_scope: Rc::new(global_scope), } } pub fn eval(&self, ast: &AstNode) -> crate::Result { self.eval_with_scope(ast, self.global_scope.clone()) } fn eval_with_scope(&self, ast: &AstNode, scope: Rc) -> crate::Result { match ast { AstNode::Number { value, is_float } => { Ok(Value::Number { value: *value, is_float: *is_float }) } AstNode::String(s) => Ok(Value::String(s.clone())), AstNode::Boolean(b) => Ok(Value::Boolean(*b)), AstNode::Identifier(name) => { scope.get(name) .ok_or_else(|| BabaError::UndefinedVariable(name.clone())) } AstNode::When { discriminants, cases } => { self.eval_when(discriminants, cases, scope) } AstNode::FunctionCall { callee, args } => { self.eval_function_call(callee, args, scope) } // ... other cases _ => todo!("Implement remaining AST node evaluation"), } } } ``` ## Phase 6: Built-in Functions ### 6.1 Built-in Registry **File: `src/interpreter/builtins.rs`** ```rust use crate::interpreter::value::{Value, NativeFn}; use crate::interpreter::scope::Scope; use im::Vector; impl Interpreter { fn register_builtins(scope: &mut Scope) { // Math functions scope.set("math".to_string(), create_math_namespace()); // String functions scope.set("str".to_string(), create_str_namespace()); // List functions scope.set("map".to_string(), Value::NativeFunction(builtin_map)); scope.set("filter".to_string(), Value::NativeFunction(builtin_filter)); scope.set("reduce".to_string(), Value::NativeFunction(builtin_reduce)); scope.set("append".to_string(), Value::NativeFunction(builtin_append)); // IO functions scope.set("io".to_string(), create_io_namespace()); } } fn builtin_map(args: &[Value]) -> crate::Result { if args.len() != 2 { return Err(BabaError::RuntimeError("map expects 2 arguments".to_string())); } let func = &args[0]; let list = &args[1]; match (func, list) { (Value::Function(f), Value::List(items)) => { let mut result = Vector::new(); for item in items { // Apply function to each item let mapped = apply_function(f, &[item.clone()])?; result.push_back(mapped); } Ok(Value::List(result)) } _ => Err(BabaError::TypeError("Invalid arguments to map".to_string())), } } ``` ## Phase 7: Performance Optimizations ### 7.1 Benchmark Setup **File: `benches/interpreter.rs`** ```rust use criterion::{black_box, criterion_group, criterion_main, Criterion}; use baba_yaga_rust::*; fn benchmark_fibonacci(c: &mut Criterion) { let code = r#" fibonacci : n -> when n is 0 then 0 1 then 1 _ then (fibonacci (n - 1)) + (fibonacci (n - 2)); result : fibonacci 10; "#; c.bench_function("fibonacci", |b| { b.iter(|| { let mut lexer = Lexer::new(black_box(code.to_string())); let tokens = lexer.tokenize().unwrap(); let mut parser = Parser::new(tokens); let ast = parser.parse().unwrap(); let interpreter = Interpreter::new(); interpreter.eval(&ast).unwrap() }) }); } criterion_group!(benches, benchmark_fibonacci); criterion_main!(benches); ``` ### 7.2 Optimization Strategies 1. **AST Interning**: Use `Rc` to avoid cloning large AST subtrees 2. **Value Interning**: Intern common strings and small numbers 3. **Scope Optimization**: Use arena allocation for scopes 4. **Tail Call Detection**: Identify tail-recursive patterns for optimization 5. **Constant Folding**: Evaluate constant expressions at parse time ## Phase 8: Integration and CLI ### 8.1 Command Line Interface **File: `src/main.rs`** ```rust use clap::{App, Arg}; use std::fs; use baba_yaga_rust::*; fn main() -> Result<(), Box> { let matches = App::new("Baba Yaga") .version("2.0.0") .about("A functional scripting language") .arg(Arg::with_name("file") .help("The input file to execute") .required(false) .index(1)) .arg(Arg::with_name("debug") .short("d") .long("debug") .help("Enable debug output")) .get_matches(); if let Some(filename) = matches.value_of("file") { let code = fs::read_to_string(filename)?; execute_code(&code)?; } else { start_repl()?; } Ok(()) } fn execute_code(code: &str) -> crate::Result<()> { let mut lexer = Lexer::new(code.to_string()); let tokens = lexer.tokenize()?; let mut parser = Parser::new(tokens); let ast = parser.parse()?; let interpreter = Interpreter::new(); let result = interpreter.eval(&ast)?; if !matches!(result, Value::Unit) { println!("{:?}", result); } Ok(()) } ``` ### 8.2 REPL Implementation **File: `src/repl.rs`** ```rust use rustyline::{Editor, Result as RLResult}; use crate::*; pub fn start_repl() -> crate::Result<()> { let mut rl = Editor::<()>::new(); let interpreter = Interpreter::new(); println!("Baba Yaga REPL v2.0.0"); println!("Type :help for commands, :quit to exit"); loop { match rl.readline("baba> ") { Ok(line) => { rl.add_history_entry(line.as_str()); if line.starts_with(':') { handle_repl_command(&line)?; } else { match execute_line(&interpreter, &line) { Ok(value) => { if !matches!(value, Value::Unit) { println!("{:?}", value); } } Err(e) => eprintln!("Error: {}", e), } } } Err(_) => break, } } Ok(()) } ``` ## Phase 9: Testing Strategy ### 9.1 Unit Tests - Test each component in isolation - Property-based testing for parser/lexer - Comprehensive built-in function tests ### 9.2 Integration Tests - Port existing JavaScript test cases - Performance regression tests - Memory usage tests ### 9.3 Compatibility Tests - Ensure identical behavior to JavaScript version - Cross-platform compatibility - Host integration tests ## Phase 10: Deployment and Distribution ### 10.1 Build Configuration **File: `Cargo.toml`** ```toml [package] name = "baba-yaga-rust" version = "2.0.0" edition = "2021" [dependencies] im = "15.1" # Persistent data structures clap = "3.0" # CLI parsing rustyline = "9.0" # REPL readline criterion = "0.4" # Benchmarking [profile.release] opt-level = 3 lto = true codegen-units = 1 panic = "abort" [[bin]] name = "baba" path = "src/main.rs" [[bench]] name = "interpreter" harness = false ``` ### 10.2 Cross-Compilation Targets - Linux x86_64 - macOS (Intel + Apple Silicon) - Windows x86_64 - WebAssembly (for browser embedding) ## Expected Performance Improvements Based on typical JavaScript to Rust ports: - **Startup time**: 10-50x faster (no JIT warmup) - **Execution speed**: 2-10x faster for compute-heavy workloads - **Memory usage**: 2-5x less memory consumption - **Binary size**: Much smaller self-contained executable - **Predictable performance**: No garbage collection pauses ## Migration Path 1. **Phase 1-3**: Core infrastructure (2-3 weeks) 2. **Phase 4-5**: Parser and basic interpreter (2-3 weeks) 3. **Phase 6**: Built-in functions (1-2 weeks) 4. **Phase 7-8**: Optimization and CLI (1-2 weeks) 5. **Phase 9-10**: Testing and deployment (1-2 weeks) **Total estimated time**: 7-12 weeks for a complete reimplementation This approach provides a systematic path to a high-performance native Baba Yaga implementation while maintaining full compatibility with the existing JavaScript version.