diff options
-rw-r--r-- | awk/scheme/scheme/TODO.txt | 95 | ||||
-rw-r--r-- | awk/scheme/scheme/scratch/arch-notes.md | 115 | ||||
-rw-r--r-- | awk/scheme/scheme/scratch/random.txt | 112 |
3 files changed, 275 insertions, 47 deletions
diff --git a/awk/scheme/scheme/TODO.txt b/awk/scheme/scheme/TODO.txt index 31723a4..f576fba 100644 --- a/awk/scheme/scheme/TODO.txt +++ b/awk/scheme/scheme/TODO.txt @@ -5,65 +5,66 @@ Current State: ------------- - Basic arithmetic operations (+,-,*,/) are working - Let expressions with simple bindings are working -- Function definitions (define) and lambda expressions are partially implemented +- Function definitions (define) and lambda expressions are implemented (no nested lambdas) - Stack-based virtual machine with environment for variable bindings +- Closures supported for top-level lambdas (not nested) +- State persistence for globals and functions between REPL sessions Recent Changes: -------------- -1. Fixed function table array naming conflicts (func_name -> func_def_names) -2. Modified vm_get_value to handle function names correctly -3. Attempted to fix argument handling in function calls -4. Modified lambda compilation to avoid pushing function name twice +1. Improved function table and environment handling +2. Enhanced debug logging for function calls and environment +3. Refined lambda compilation and closure creation +4. Updated documentation and architectural notes -Current Issues: --------------- -1. Function Definitions (define): - - Function calls are failing with "ADD requires numeric operands" - - Arguments may not be properly passed to functions - - Function environment may not be properly set up - -2. Lambda Expressions: - - Direct lambda calls are failing - - Lambda bindings in let expressions are failing - - Function environment for lambda parameters may not be correct +Current Issues / Limitations: +---------------------------- +1. **Nested Lambda Support**: + - Lambdas defined inside other lambdas are not supported ("Undefined closure function" errors) +2. **Reference Counting**: + - Reference counting for heap objects (cons cells) is stubbed but not enforced +3. **Error Handling**: + - Minimal error recovery, especially in the REPL +4. **Tail Recursion**: + - No proper tail call optimization +5. **Data Types**: + - No floating point, string, or full numeric tower support +6. **Garbage Collection**: + - No GC; memory for cons cells is not reclaimed +7. **Standard Library**: + - No standard library or macros Next Steps: ----------- -1. Debug Function Calls: - - Add detailed debug logging in vm_call_function - - Verify argument handling and environment setup - - Check if function code is being properly stored and retrieved - -2. Fix Function Environment: - - Review how parameters are stored in the environment - - Ensure environment is properly restored after function calls - - Verify parameter values are correctly passed to functions - -3. Improve Lambda Support: - - Review lambda compilation process - - Ensure lambda functions are properly stored and called - - Fix environment handling for lambda parameters - -4. Testing Strategy: - - Create test cases for each type of function call - - Add debug output to track stack and environment state - - Verify each step of function compilation and execution - -5. Code Cleanup: - - Review and document function handling code - - Ensure consistent naming and error handling - - Add comments explaining complex operations +1. **Implement Nested Lambda Support**: + - Refactor closure environment capture and restoration + - Update compiler and VM to support nested closures +2. **Improve Memory Management**: + - Implement or enforce reference counting + - Plan for future garbage collection +3. **Enhance Error Handling**: + - Add better error messages and recovery in REPL and VM +4. **Tail Call Optimization**: + - Investigate and implement proper tail recursion +5. **Testing and Debugging**: + - Expand test suite for edge cases (especially closures and let/lambda) + - Continue using DEBUG=1 for tracing execution +6. **Documentation and Code Quality**: + - Keep architectural notes and code comments up to date + - Document any new patterns or changes Technical Notes: --------------- -- Function calls use a combination of LOOKUP, GET_VALUE, and CALL instructions -- Environment stack handles variable bindings and function parameters -- Function code is stored in FUNCTIONS table with unique names -- Lambda functions use __lambda_N naming scheme +- Function calls use LOOKUP, GET_VALUE, and CALL instructions +- Environment stack manages variable bindings and function parameters +- Function code is stored in the FUNCTIONS table with unique names +- Lambda functions use __lambda_N naming scheme; closures capture environment at creation +- State is persisted to /tmp/scheme_vm.state and /tmp/scheme_vm.env Debugging Tips: -------------- 1. Enable DEBUG=1 to see detailed execution logs -2. Check stack contents before and after each operation -3. Verify environment state during function calls -4. Monitor function code storage and retrieval \ No newline at end of file +2. Check stack and environment contents before and after each operation +3. Monitor closure creation and environment capture/restoration +4. Verify function code storage and retrieval in the FUNCTIONS table +5. Use and expand test cases for all supported features \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/arch-notes.md b/awk/scheme/scheme/scratch/arch-notes.md new file mode 100644 index 0000000..060ca72 --- /dev/null +++ b/awk/scheme/scheme/scratch/arch-notes.md @@ -0,0 +1,115 @@ +# Awk-Scheme: Architectural Notes + +## Overview +Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a compiler and a stack-based virtual machine (VM). The architecture is modular, with clear separation of concerns between parsing/compilation and execution. The design leverages several classic architectural patterns to achieve extensibility, maintainability, and clarity. + +--- + +## 1. Program Flow: High-Level Pipeline + +1. **Input**: User provides Scheme code (via REPL or file). +2. **Compilation**: The compiler (`bin/compiler.awk`) parses and compiles Scheme code into VM instructions. +3. **Execution**: The VM (`bin/vm.awk`) executes the instructions, managing state, environment, and memory. + +--- + +## 2. Compiler (`bin/compiler.awk`) + +### 2.1. Lexical Analysis (Lexer) +- **Pattern**: *Lexer/Parser Separation* (classic compiler front-end) +- **Why**: Decouples tokenization from parsing, making the code easier to reason about and extend. +- **How**: The compiler tokenizes input into numbers, symbols, and parentheses, handling whitespace and comments. + +### 2.2. Parsing (Recursive Descent) +- **Pattern**: *Recursive Descent Parser* +- **Why**: Simple, direct mapping from grammar to code; easy to debug and extend for a small language. +- **How**: The parser builds an expression tree from tokens, handling nested expressions and validating syntax. + +### 2.3. Code Generation +- **Pattern**: *Visitor/Dispatcher* (for expression types) +- **Why**: Each expression type (number, variable, list, special form) is handled by a dedicated function, improving maintainability. +- **How**: The compiler emits stack-based VM instructions for each expression, handling special forms (define, let, lambda) and function calls. + +### 2.4. Closure Support +- **Pattern**: *Closure Conversion* (from functional programming) +- **Why**: Enables lexical scoping and first-class functions by capturing the environment at lambda creation. +- **How**: The compiler emits instructions to capture the current environment and create closure objects. + +--- + +## 3. Virtual Machine (`bin/vm.awk`) + +### 3.1. Stack-Based Execution +- **Pattern**: *Stack Machine* (classic VM design) +- **Why**: Simplicity and direct mapping from compiled instructions to execution; easy to implement in AWK. +- **How**: The VM maintains an evaluation stack for operands and results, executing instructions sequentially. + +### 3.2. Typed Value System +- **Pattern**: *Tagged Union* (Algebraic Data Type) +- **Why**: Enables runtime type checking and safe operations on heterogeneous values. +- **How**: All values are tagged (e.g., `N:`, `B:`, `P:`, `F:`, `CLOSURE:`) and checked at runtime. + +### 3.3. Environment Model +- **Pattern**: *Environment Chain* (Lexical Scoping) +- **Why**: Supports variable bindings, lexical scope, and closures. +- **How**: The VM maintains an environment stack for variable bindings, with special handling for global and closure environments. + +### 3.4. Function and Closure Handling +- **Pattern**: *Direct Function Execution* (no program array modification) +- **Why**: Simplifies call/return logic and avoids mutation of the instruction stream. +- **How**: Functions are stored as code blocks; calls execute code directly, with environment management for parameters and closures. + +### 3.5. Heap and Memory Management +- **Pattern**: *Manual Heap with Reference Counting (partial)* +- **Why**: Enables cons cell allocation and basic memory management. +- **How**: The VM allocates cons cells on a heap array, with a placeholder for reference counting (not fully implemented). + +### 3.6. State Persistence +- **Pattern**: *State Serialization* +- **Why**: Allows global state and functions to persist across REPL sessions. +- **How**: The VM serializes global variables and function definitions to files, loading them on startup. + +--- + +## 4. Extensibility and Maintainability +- **Pattern**: *Separation of Concerns* +- **Why**: Compiler and VM are independent, making it easy to extend the language or change the execution model. +- **Pattern**: *Table-Driven Dispatch* (for built-in functions) +- **Why**: Adding new primitives or special forms is straightforward. + +--- + +## 5. Notable Limitations +- No support for nested lambdas (yet), proper tail recursion, or garbage collection. +- Reference counting is stubbed but not enforced. +- Error handling is minimal. + +--- + +## 6. Summary Table: Patterns Used + +| Area | Pattern(s) Used | Why? | +|---------------------|----------------------------------|--------------------------------------| +| Lexing/Parsing | Lexer/Parser, Recursive Descent | Simplicity, extensibility | +| Code Generation | Visitor/Dispatcher | Maintainability, clarity | +| VM Execution | Stack Machine, Tagged Union | Simplicity, type safety | +| Environment | Environment Chain, Closure Conv. | Lexical scoping, closures | +| Function Dispatch | Table-Driven Dispatch | Extensibility | +| State Persistence | State Serialization | REPL continuity | +| Memory Management | Manual Heap, Ref Counting (stub) | List support, future GC | + +--- + +## 7. Architectural Choices: Rationale +- **AWK as Implementation Language**: Chosen for portability and as a challenge; influences the use of arrays and string-based data structures. +- **Stack Machine**: Maps well to AWK's capabilities and keeps the VM simple. +- **Separation of Compiler/VM**: Enables clear boundaries and easier debugging. +- **Explicit Typing**: Reduces runtime errors and clarifies value handling. + +--- + +## 8. Flow Diagram (Textual) + +``` +User Input (Scheme) → [Compiler] → VM Instructions → [VM] → Result/State +``` \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/random.txt b/awk/scheme/scheme/scratch/random.txt new file mode 100644 index 0000000..20b3bcf --- /dev/null +++ b/awk/scheme/scheme/scratch/random.txt @@ -0,0 +1,112 @@ +Other Uses for the Awk-Scheme VM +================================ + +The stack-based VM in Awk-Scheme is surprisingly versatile. Here are some realistic alternatives that could be implemented using the same VM architecture: + +## 1. Forth Implementation +**Feasibility: High** + +Forth is a natural fit since it's already stack-based: +- **Stack Operations**: VM already has PUSH_CONST, POP, DUP, SWAP +- **Arithmetic**: ADD, SUB, MUL, DIV are perfect for Forth +- **Memory**: Could extend heap for Forth's memory model +- **Control Flow**: Would need to add conditional jumps and loops +- **Words**: Could map Forth words to VM functions + +**Required Extensions:** +- JMP, JZ (jump if zero), JNZ instructions +- More stack operations (OVER, ROT, etc.) +- Memory read/write operations +- Input/output operations + +## 2. Simple Calculator Language +**Feasibility: Very High** + +A basic calculator with variables and functions: +- **Syntax**: `x = 5; y = x + 3; print y` +- **Features**: Variables, basic math, simple functions +- **VM Fit**: Perfect - already has arithmetic, variables, functions + +**Minimal Changes Needed:** +- New parser for calculator syntax +- Assignment operator handling +- Print statement + +## 3. Configuration Language +**Feasibility: High** + +A simple config language for key-value pairs and nested structures: +- **Syntax**: `server { port = 8080; host = "localhost"; }` +- **Features**: Nested objects, arrays, basic expressions +- **VM Fit**: Good - can use cons cells for nested structures + +**Required Extensions:** +- String support +- Object/struct creation +- Field access operations + +## 4. Simple Scripting Language +**Feasibility: Medium** + +A basic scripting language with loops and conditionals: +- **Syntax**: `if x > 0 { y = x * 2; } else { y = 0; }` +- **Features**: Variables, conditionals, loops, functions +- **VM Fit**: Good but needs control flow + +**Required Extensions:** +- Conditional jumps +- Loop constructs +- Boolean logic operations + +## 5. Data Processing Language +**Feasibility: Medium** + +A language for simple data transformations: +- **Syntax**: `filter(x > 0) | map(x * 2) | sum()` +- **Features**: Pipeline operations, list processing +- **VM Fit**: Good - can use cons cells for lists + +**Required Extensions:** +- List operations (map, filter, reduce) +- Pipeline operator +- More built-in functions + +## 6. Simple Logic/Constraint Language +**Feasibility: High** + +A language for expressing simple constraints and rules: +- **Syntax**: `rule: if age > 18 then can_vote = true` +- **Features**: Rules, facts, simple inference +- **VM Fit**: Good - boolean operations and variables + +**Required Extensions:** +- Boolean logic (AND, OR, NOT) +- Rule evaluation +- Fact storage + +## VM Strengths for Alternative Languages: +1. **Stack-based**: Natural for many languages +2. **Typed values**: Good foundation for type safety +3. **Environment model**: Supports variables and scoping +4. **Function system**: Reusable for different syntaxes +5. **Extensible**: Easy to add new instructions + +## VM Limitations for Alternative Languages: +1. **Limited data types**: Only numbers, booleans, pairs, symbols +2. **No strings**: Would need to add string support +3. **Limited control flow**: Only function calls, no loops/conditionals +4. **No I/O**: Would need file/console operations +5. **Memory constraints**: Simple heap model + +## Most Realistic Next Steps: +1. **Forth**: Natural progression, leverages existing stack operations +2. **Calculator**: Minimal changes, good learning exercise +3. **Config Language**: Practical use case, moderate complexity + +## Implementation Strategy: +1. Keep the VM unchanged +2. Write new compilers for different syntaxes +3. Add minimal VM extensions as needed +4. Reuse existing VM infrastructure (stack, environment, functions) + +The VM's simplicity is actually a strength - it's easy to understand and extend, making it a good foundation for experimenting with different language designs. \ No newline at end of file |