diff options
Diffstat (limited to 'js/baba-yaga/scratch/docs/REIMPLEMENTATION_GUIDE.md')
-rw-r--r-- | js/baba-yaga/scratch/docs/REIMPLEMENTATION_GUIDE.md | 693 |
1 files changed, 693 insertions, 0 deletions
diff --git a/js/baba-yaga/scratch/docs/REIMPLEMENTATION_GUIDE.md b/js/baba-yaga/scratch/docs/REIMPLEMENTATION_GUIDE.md new file mode 100644 index 0000000..3e6f2e0 --- /dev/null +++ b/js/baba-yaga/scratch/docs/REIMPLEMENTATION_GUIDE.md @@ -0,0 +1,693 @@ +# Baba Yaga Reimplementation Guide + +This guide outlines how to reimplement the Baba Yaga functional language in a faster, compiled language. While the current JavaScript implementation serves as an excellent prototype, a native implementation could provide significant performance improvements and better integration capabilities. + +## Language Recommendation: Rust + +After analyzing the requirements, **Rust** emerges as the optimal choice because: + +- **Memory safety** without garbage collection overhead +- **Native pattern matching** that directly maps to Baba Yaga's `when` expressions +- **Functional programming support** for closures and higher-order functions +- **Built-in `Result<T, E>`** type matching Baba Yaga's error handling +- **Zero-cost abstractions** for performance +- **Excellent tooling** and growing ecosystem + +## Project Structure + +``` +baba-yaga-rust/ +├── Cargo.toml +├── src/ +│ ├── main.rs # CLI entry point +│ ├── lib.rs # Library exports +│ ├── lexer/ +│ │ ├── mod.rs # Lexer module +│ │ └── token.rs # Token definitions +│ ├── parser/ +│ │ ├── mod.rs # Parser module +│ │ └── ast.rs # AST node definitions +│ ├── interpreter/ +│ │ ├── mod.rs # Interpreter module +│ │ ├── value.rs # Runtime value types +│ │ ├── scope.rs # Scope management +│ │ └── builtins.rs # Built-in functions +│ ├── error.rs # Error types +│ └── repl.rs # REPL implementation +├── tests/ +│ ├── integration/ +│ └── fixtures/ +└── benches/ # Performance benchmarks +``` + +## Phase 1: Core Data Types and Error Handling + +### 1.1 Define Core Types + +**File: `src/error.rs`** +```rust +use std::fmt; + +#[derive(Debug, Clone)] +pub enum BabaError { + LexError(String), + ParseError(String), + RuntimeError(String), + TypeError(String), + UndefinedVariable(String), + UndefinedProperty(String), + DivisionByZero, + IndexOutOfBounds(usize), +} + +impl fmt::Display for BabaError { + fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { + match self { + BabaError::LexError(msg) => write!(f, "Lexer error: {}", msg), + BabaError::ParseError(msg) => write!(f, "Parse error: {}", msg), + BabaError::RuntimeError(msg) => write!(f, "Runtime error: {}", msg), + BabaError::TypeError(msg) => write!(f, "Type error: {}", msg), + BabaError::UndefinedVariable(name) => write!(f, "Undefined variable: {}", name), + BabaError::UndefinedProperty(prop) => write!(f, "Undefined property: {}", prop), + BabaError::DivisionByZero => write!(f, "Division by zero"), + BabaError::IndexOutOfBounds(idx) => write!(f, "Index out of bounds: {}", idx), + } + } +} + +impl std::error::Error for BabaError {} + +pub type Result<T> = std::result::Result<T, BabaError>; +``` + +### 1.2 Runtime Value System + +**File: `src/interpreter/value.rs`** +```rust +use std::collections::HashMap; +use std::rc::Rc; +use im::{Vector, HashMap as ImHashMap}; // Use persistent data structures + +#[derive(Debug, Clone)] +pub enum Value { + Number { value: f64, is_float: bool }, + String(String), + Boolean(bool), + List(Vector<Value>), + Table(ImHashMap<String, Value>), + Function(Function), + NativeFunction(NativeFn), + Result { variant: ResultVariant, value: Box<Value> }, + Unit, +} + +#[derive(Debug, Clone)] +pub enum ResultVariant { + Ok, + Err, +} + +#[derive(Debug, Clone)] +pub struct Function { + pub params: Vec<String>, + pub body: Rc<AstNode>, + pub closure: Scope, + pub return_type: Option<Type>, +} + +pub type NativeFn = fn(&[Value]) -> crate::Result<Value>; +``` + +## Phase 2: Lexical Analysis + +### 2.1 Token Definition + +**File: `src/lexer/token.rs`** +```rust +#[derive(Debug, Clone, PartialEq)] +pub enum TokenType { + // Literals + Number { value: f64, is_float: bool }, + String(String), + Identifier(String), + + // Keywords + When, Is, Then, With, Rec, Ok, Err, + True, False, Pi, Infinity, + And, Or, Xor, + + // Operators + Plus, Minus, Star, Slash, Percent, + Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual, + Concat, // .. + + // Punctuation + LeftParen, RightParen, + LeftBrace, RightBrace, + LeftBracket, RightBracket, + Colon, Semicolon, Comma, Dot, Arrow, + + // Special + Newline, + Eof, +} + +#[derive(Debug, Clone)] +pub struct Token { + pub token_type: TokenType, + pub line: usize, + pub column: usize, +} +``` + +### 2.2 Lexer Implementation + +**File: `src/lexer/mod.rs`** +Use a character-by-character state machine approach: + +```rust +pub struct Lexer { + input: Vec<char>, + position: usize, + line: usize, + column: usize, +} + +impl Lexer { + pub fn new(input: String) -> Self { + Self { + input: input.chars().collect(), + position: 0, + line: 1, + column: 1, + } + } + + pub fn tokenize(&mut self) -> crate::Result<Vec<Token>> { + let mut tokens = Vec::new(); + + while !self.is_at_end() { + self.skip_whitespace(); + if self.is_at_end() { break; } + + tokens.push(self.next_token()?); + } + + tokens.push(Token { + token_type: TokenType::Eof, + line: self.line, + column: self.column, + }); + + Ok(tokens) + } + + fn next_token(&mut self) -> crate::Result<Token> { + // Implementation details... + } +} +``` + +## Phase 3: Abstract Syntax Tree + +### 3.1 AST Node Definition + +**File: `src/parser/ast.rs`** +```rust +#[derive(Debug, Clone)] +pub enum AstNode { + // Literals + Number { value: f64, is_float: bool }, + String(String), + Boolean(bool), + List(Vec<AstNode>), + Table(Vec<(String, AstNode)>), + + // Identifiers and access + Identifier(String), + MemberAccess { object: Box<AstNode>, property: Box<AstNode> }, + + // Functions + Function { params: Vec<String>, body: Box<AstNode> }, + FunctionCall { callee: Box<AstNode>, args: Vec<AstNode> }, + + // Control flow + When { + discriminants: Vec<AstNode>, + cases: Vec<WhenCase>, + }, + + // Declarations + VariableDeclaration { name: String, value: Box<AstNode> }, + FunctionDeclaration { + name: String, + params: Vec<String>, + body: Box<AstNode>, + return_type: Option<Type>, + }, + + // Local bindings + WithHeader { + entries: Vec<WithEntry>, + body: Box<AstNode>, + recursive: bool, + }, + + // Expressions + BinaryOp { left: Box<AstNode>, op: BinaryOperator, right: Box<AstNode> }, + UnaryOp { op: UnaryOperator, operand: Box<AstNode> }, + + // Result types + Result { variant: ResultVariant, value: Box<AstNode> }, + + // Program structure + Program(Vec<AstNode>), +} + +#[derive(Debug, Clone)] +pub struct WhenCase { + pub patterns: Vec<Pattern>, + pub body: Box<AstNode>, +} + +#[derive(Debug, Clone)] +pub enum Pattern { + Literal(AstNode), + Wildcard, + Type(String), + Result { variant: ResultVariant, binding: String }, + List(Vec<Pattern>), + Table(Vec<(String, Pattern)>), +} +``` + +## Phase 4: Parser Implementation + +### 4.1 Recursive Descent Parser + +**File: `src/parser/mod.rs`** +```rust +pub struct Parser { + tokens: Vec<Token>, + current: usize, +} + +impl Parser { + pub fn new(tokens: Vec<Token>) -> Self { + Self { tokens, current: 0 } + } + + pub fn parse(&mut self) -> crate::Result<AstNode> { + let mut statements = Vec::new(); + + while !self.is_at_end() { + statements.push(self.statement()?); + } + + Ok(AstNode::Program(statements)) + } + + fn statement(&mut self) -> crate::Result<AstNode> { + match self.peek().token_type { + TokenType::Identifier(_) => { + if self.peek_ahead(1).token_type == TokenType::Colon { + self.declaration() + } else { + self.expression() + } + } + _ => self.expression(), + } + } + + // Implement precedence climbing for expressions + fn expression(&mut self) -> crate::Result<AstNode> { + self.expression_with_precedence(0) + } + + fn expression_with_precedence(&mut self, min_precedence: u8) -> crate::Result<AstNode> { + // Implementation using precedence climbing algorithm + } +} +``` + +## Phase 5: Interpreter Core + +### 5.1 Scope Management + +**File: `src/interpreter/scope.rs`** +```rust +use std::collections::HashMap; +use std::rc::Rc; +use crate::interpreter::value::Value; + +#[derive(Debug, Clone)] +pub struct Scope { + bindings: HashMap<String, Value>, + parent: Option<Rc<Scope>>, +} + +impl Scope { + pub fn new() -> Self { + Self { + bindings: HashMap::new(), + parent: None, + } + } + + pub fn with_parent(parent: Rc<Scope>) -> Self { + Self { + bindings: HashMap::new(), + parent: Some(parent), + } + } + + pub fn get(&self, name: &str) -> Option<Value> { + self.bindings.get(name).cloned() + .or_else(|| self.parent.as_ref().and_then(|p| p.get(name))) + } + + pub fn set(&mut self, name: String, value: Value) { + self.bindings.insert(name, value); + } +} +``` + +### 5.2 Interpreter Implementation + +**File: `src/interpreter/mod.rs`** +```rust +use std::rc::Rc; +use crate::parser::ast::AstNode; +use crate::interpreter::value::Value; +use crate::interpreter::scope::Scope; + +pub struct Interpreter { + global_scope: Rc<Scope>, +} + +impl Interpreter { + pub fn new() -> Self { + let mut global_scope = Scope::new(); + Self::register_builtins(&mut global_scope); + + Self { + global_scope: Rc::new(global_scope), + } + } + + pub fn eval(&self, ast: &AstNode) -> crate::Result<Value> { + self.eval_with_scope(ast, self.global_scope.clone()) + } + + fn eval_with_scope(&self, ast: &AstNode, scope: Rc<Scope>) -> crate::Result<Value> { + match ast { + AstNode::Number { value, is_float } => { + Ok(Value::Number { value: *value, is_float: *is_float }) + } + + AstNode::String(s) => Ok(Value::String(s.clone())), + + AstNode::Boolean(b) => Ok(Value::Boolean(*b)), + + AstNode::Identifier(name) => { + scope.get(name) + .ok_or_else(|| BabaError::UndefinedVariable(name.clone())) + } + + AstNode::When { discriminants, cases } => { + self.eval_when(discriminants, cases, scope) + } + + AstNode::FunctionCall { callee, args } => { + self.eval_function_call(callee, args, scope) + } + + // ... other cases + _ => todo!("Implement remaining AST node evaluation"), + } + } +} +``` + +## Phase 6: Built-in Functions + +### 6.1 Built-in Registry + +**File: `src/interpreter/builtins.rs`** +```rust +use crate::interpreter::value::{Value, NativeFn}; +use crate::interpreter::scope::Scope; +use im::Vector; + +impl Interpreter { + fn register_builtins(scope: &mut Scope) { + // Math functions + scope.set("math".to_string(), create_math_namespace()); + + // String functions + scope.set("str".to_string(), create_str_namespace()); + + // List functions + scope.set("map".to_string(), Value::NativeFunction(builtin_map)); + scope.set("filter".to_string(), Value::NativeFunction(builtin_filter)); + scope.set("reduce".to_string(), Value::NativeFunction(builtin_reduce)); + scope.set("append".to_string(), Value::NativeFunction(builtin_append)); + + // IO functions + scope.set("io".to_string(), create_io_namespace()); + } +} + +fn builtin_map(args: &[Value]) -> crate::Result<Value> { + if args.len() != 2 { + return Err(BabaError::RuntimeError("map expects 2 arguments".to_string())); + } + + let func = &args[0]; + let list = &args[1]; + + match (func, list) { + (Value::Function(f), Value::List(items)) => { + let mut result = Vector::new(); + for item in items { + // Apply function to each item + let mapped = apply_function(f, &[item.clone()])?; + result.push_back(mapped); + } + Ok(Value::List(result)) + } + _ => Err(BabaError::TypeError("Invalid arguments to map".to_string())), + } +} +``` + +## Phase 7: Performance Optimizations + +### 7.1 Benchmark Setup + +**File: `benches/interpreter.rs`** +```rust +use criterion::{black_box, criterion_group, criterion_main, Criterion}; +use baba_yaga_rust::*; + +fn benchmark_fibonacci(c: &mut Criterion) { + let code = r#" + fibonacci : n -> + when n is + 0 then 0 + 1 then 1 + _ then (fibonacci (n - 1)) + (fibonacci (n - 2)); + result : fibonacci 10; + "#; + + c.bench_function("fibonacci", |b| { + b.iter(|| { + let mut lexer = Lexer::new(black_box(code.to_string())); + let tokens = lexer.tokenize().unwrap(); + let mut parser = Parser::new(tokens); + let ast = parser.parse().unwrap(); + let interpreter = Interpreter::new(); + interpreter.eval(&ast).unwrap() + }) + }); +} + +criterion_group!(benches, benchmark_fibonacci); +criterion_main!(benches); +``` + +### 7.2 Optimization Strategies + +1. **AST Interning**: Use `Rc<AstNode>` to avoid cloning large AST subtrees +2. **Value Interning**: Intern common strings and small numbers +3. **Scope Optimization**: Use arena allocation for scopes +4. **Tail Call Detection**: Identify tail-recursive patterns for optimization +5. **Constant Folding**: Evaluate constant expressions at parse time + +## Phase 8: Integration and CLI + +### 8.1 Command Line Interface + +**File: `src/main.rs`** +```rust +use clap::{App, Arg}; +use std::fs; +use baba_yaga_rust::*; + +fn main() -> Result<(), Box<dyn std::error::Error>> { + let matches = App::new("Baba Yaga") + .version("2.0.0") + .about("A functional scripting language") + .arg(Arg::with_name("file") + .help("The input file to execute") + .required(false) + .index(1)) + .arg(Arg::with_name("debug") + .short("d") + .long("debug") + .help("Enable debug output")) + .get_matches(); + + if let Some(filename) = matches.value_of("file") { + let code = fs::read_to_string(filename)?; + execute_code(&code)?; + } else { + start_repl()?; + } + + Ok(()) +} + +fn execute_code(code: &str) -> crate::Result<()> { + let mut lexer = Lexer::new(code.to_string()); + let tokens = lexer.tokenize()?; + let mut parser = Parser::new(tokens); + let ast = parser.parse()?; + let interpreter = Interpreter::new(); + let result = interpreter.eval(&ast)?; + + if !matches!(result, Value::Unit) { + println!("{:?}", result); + } + + Ok(()) +} +``` + +### 8.2 REPL Implementation + +**File: `src/repl.rs`** +```rust +use rustyline::{Editor, Result as RLResult}; +use crate::*; + +pub fn start_repl() -> crate::Result<()> { + let mut rl = Editor::<()>::new(); + let interpreter = Interpreter::new(); + + println!("Baba Yaga REPL v2.0.0"); + println!("Type :help for commands, :quit to exit"); + + loop { + match rl.readline("baba> ") { + Ok(line) => { + rl.add_history_entry(line.as_str()); + + if line.starts_with(':') { + handle_repl_command(&line)?; + } else { + match execute_line(&interpreter, &line) { + Ok(value) => { + if !matches!(value, Value::Unit) { + println!("{:?}", value); + } + } + Err(e) => eprintln!("Error: {}", e), + } + } + } + Err(_) => break, + } + } + + Ok(()) +} +``` + +## Phase 9: Testing Strategy + +### 9.1 Unit Tests +- Test each component in isolation +- Property-based testing for parser/lexer +- Comprehensive built-in function tests + +### 9.2 Integration Tests +- Port existing JavaScript test cases +- Performance regression tests +- Memory usage tests + +### 9.3 Compatibility Tests +- Ensure identical behavior to JavaScript version +- Cross-platform compatibility +- Host integration tests + +## Phase 10: Deployment and Distribution + +### 10.1 Build Configuration + +**File: `Cargo.toml`** +```toml +[package] +name = "baba-yaga-rust" +version = "2.0.0" +edition = "2021" + +[dependencies] +im = "15.1" # Persistent data structures +clap = "3.0" # CLI parsing +rustyline = "9.0" # REPL readline +criterion = "0.4" # Benchmarking + +[profile.release] +opt-level = 3 +lto = true +codegen-units = 1 +panic = "abort" + +[[bin]] +name = "baba" +path = "src/main.rs" + +[[bench]] +name = "interpreter" +harness = false +``` + +### 10.2 Cross-Compilation Targets +- Linux x86_64 +- macOS (Intel + Apple Silicon) +- Windows x86_64 +- WebAssembly (for browser embedding) + +## Expected Performance Improvements + +Based on typical JavaScript to Rust ports: + +- **Startup time**: 10-50x faster (no JIT warmup) +- **Execution speed**: 2-10x faster for compute-heavy workloads +- **Memory usage**: 2-5x less memory consumption +- **Binary size**: Much smaller self-contained executable +- **Predictable performance**: No garbage collection pauses + +## Migration Path + +1. **Phase 1-3**: Core infrastructure (2-3 weeks) +2. **Phase 4-5**: Parser and basic interpreter (2-3 weeks) +3. **Phase 6**: Built-in functions (1-2 weeks) +4. **Phase 7-8**: Optimization and CLI (1-2 weeks) +5. **Phase 9-10**: Testing and deployment (1-2 weeks) + +**Total estimated time**: 7-12 weeks for a complete reimplementation + +This approach provides a systematic path to a high-performance native Baba Yaga implementation while maintaining full compatibility with the existing JavaScript version. |