diff options
91 files changed, 6401 insertions, 1280 deletions
diff --git a/awk/scheme/scheme/Makefile b/awk/scheme/scheme/Makefile new file mode 100644 index 0000000..52eaed5 --- /dev/null +++ b/awk/scheme/scheme/Makefile @@ -0,0 +1,52 @@ +# Makefile for Awk-Scheme + +.PHONY: test test-debug clean + +# Default target +all: test + +# Run all tests +test: + @echo "Running Awk-Scheme test suite..." + @./test/run_tests.sh + +# Run tests with debug output +test-debug: + @echo "Running Awk-Scheme test suite with debug output..." + @DEBUG=1 ./test/run_tests.sh + +# Run specific test categories +test-unit: + @echo "Running unit tests..." + @find test/unit -name "*.scm" -exec ./scheme {} \; + +test-integration: + @echo "Running integration tests..." + @find test/integration -name "*.scm" -exec ./scheme {} \; + +test-examples: + @echo "Running example programs..." + @find test/examples -name "*.scm" -exec ./scheme {} \; + +test-regression: + @echo "Running regression tests..." + @find test/regression -name "*.scm" -exec ./scheme {} \; + +# Clean up temporary files +clean: + @echo "Cleaning up temporary files..." + @rm -f /tmp/scheme_vm.state /tmp/scheme_vm.env + @find . -name "*.tmp" -delete + @find . -name "*~" -delete + +# Show help +help: + @echo "Available targets:" + @echo " test - Run all tests" + @echo " test-debug - Run all tests with debug output" + @echo " test-unit - Run only unit tests" + @echo " test-integration - Run only integration tests" + @echo " test-examples - Run only example programs" + @echo " test-regression - Run only regression tests" + @echo " clean - Clean up temporary files" + @echo " help - Show this help message" \ No newline at end of file diff --git a/awk/scheme/scheme/README.md b/awk/scheme/scheme/README.md index 5dae265..cf346c7 100644 --- a/awk/scheme/scheme/README.md +++ b/awk/scheme/scheme/README.md @@ -1,220 +1,221 @@ # Awk-Scheme -## Overview -A scheme interpreter implemented in AWK: +A Scheme interpreter implemented in AWK with a stack-based virtual machine. -- A compiler that translates scheme expressions to stack-based VM instructions -- A virtual machine that executes the compiled code -- Support for basic arithmetic, list operations, functions, and variable bindings -- **Closure support** for nested function definitions and lexical scoping -- Persistent global state between REPL sessions +## Overview -## Architecture +- **Compiler** (`bin/compiler.awk`): Translates Scheme to VM bytecode +- **Virtual Machine** (`bin/vm.awk`): Executes bytecode with stack-based evaluation +- **REPL** (`bin/repl`): Interactive interface with debug support +- **Standard Library**: Essential functions for lists, strings, math, and control flow +- **Higher-Order Functions**: `map` and `filter` with support for all function types -### Components -1. **Compiler** (`bin/compiler.awk`): - - Lexical analyzer for tokenizing input - - Recursive descent parser for scheme expressions - - Code generator that produces stack-based VM instructions - - Handles nested expressions and proper scoping - - **Generates closure creation code for lambda expressions** - -2. **Virtual Machine** (`bin/vm.awk`): - - Stack-based execution model - - Typed value system with runtime type checking - - Environment-based variable binding - - **Direct function execution** (no program array modification) - - **Closure support** with environment capture and restoration - - Heap memory for cons cells - - State persistence for globals and functions - -3. **REPL** (`bin/repl`): - - Interactive command line interface - - Multi-line input support - - State persistence between sessions - - Debug mode support +## Quick Start -### Data Types -All values are tagged with their type: -- `N:value` - Numbers (integers) -- `B:value` - Booleans (0/1) -- `P:index` - Pairs (cons cells) -- `F:name` - Functions -- `NIL:` - Empty list -- `S:name` - Symbols -- **`CLOSURE:func_name:env_id` - Closure objects (function + captured environment)** - -## Usage - -### Running the REPL ```bash -# Start interactive REPL +# Interactive REPL ./scheme -# Run a Scheme file -./scheme myprogram.scm +# Run a file +./scheme program.scm -# Enable debug output +# Debug mode DEBUG=1 ./scheme ``` -### Basic Operations +## Core Features -1. **Arithmetic**: +### Basic Operations ```scheme -scheme> (+ 1 2 3) -N:6 -scheme> (* 4 (- 10 5)) -N:20 +(+ 1 2 3) ; => N:6 +(define x 42) ; => N:42 +(cons 1 (cons 2 nil)) ; => P:1 +((lambda (x) (+ x 1)) 41) ; => N:42 ``` -2. **Comparisons**: +### Higher-Order Functions ```scheme -scheme> (< 1 2) -B:1 -scheme> (= 42 42) -B:1 +(map abs (list -1 -2 -3)) ; => (1 2 3) +(filter positive? (list -1 2 -3 4)) ; => (2 4) +(define double (lambda (x) (* x 2))) +(map double (list 1 2 3)) ; => (2 4 6) ``` -3. **Lists**: +### Control Flow ```scheme -scheme> (cons 1 (cons 2 nil)) -P:1 -scheme> (car (cons 1 2)) -N:1 +(if (> 5 3) "yes" "no") ; => STR:yes +(cond ((> 5 10) "big") + ((> 5 3) "medium") + (else "small")) ; => STR:medium ``` -4. **Variables**: -```scheme -scheme> (define x 42) -N:42 -scheme> x -N:42 +## Standard Library + +### Numeric Operations +- `(+ - * /)` - Basic arithmetic (supports multiple arguments) +- `(modulo expt abs min max)` - Advanced math +- `(zero? positive? negative?)` - Number predicates + +### List Operations +- `(cons car cdr length append reverse member)` - Core list functions +- `(cadr caddr list-ref list-tail)` - List access utilities +- `(null? pair?)` - List type predicates + +### String Operations +- `(string-length string-append string-ref substring)` - String manipulation +- `(string=? string<? string>?)` - String comparison + +### Control Flow +- `(if cond and or not)` - Logic and conditionals + +## Architecture + +### Data Types +All values are tagged: `N:42` (number), `B:1` (boolean), `P:1` (pair), `S:name` (symbol), `CLOSURE:fn:env` (closure), `STR:content` (string), `NIL:` (empty list). + +### VM Instructions +- **Stack**: `PUSH_CONST`, `POP`, `ADD`, `SUB`, `MUL`, `DIV` +- **Environment**: `STORE`, `LOOKUP`, `POP_ENV` +- **Functions**: `LABEL`, `CALL`, `RETURN` +- **Closures**: `CAPTURE_ENV`, `PUSH_CONST CLOSURE:fn:env` + +### Execution Flow +``` +Scheme Code → Compiler → VM Bytecode → Virtual Machine → Result ``` -5. **Functions**: -```scheme -scheme> (define add (x y) (+ x y)) -scheme> (add 2 3) -N:5 +## Testing + +```bash +# Run all tests +./test/run_tests.sh + +# Run with debug +DEBUG=1 ./test/run_tests.sh ``` -6. **Let Expressions**: -```scheme -scheme> (let ((x 10) (y 20)) (+ x y)) -N:30 +Test categories: +- **Unit tests**: Individual features +- **Integration tests**: Feature combinations +- **Examples**: Demonstration programs +- **Regression tests**: Edge cases + +## Debugging + +### Enable Debug Mode +```bash +DEBUG=1 ./scheme # REPL with debug +DEBUG=1 ./scheme program.scm # File with debug +echo "(+ 1 2)" | DEBUG=1 ./scheme # Single expression ``` -7. **Lambda Functions**: -```scheme -scheme> ((lambda (x) (+ x 1)) 41) -N:42 -scheme> ((lambda (x y) (+ x y)) 20 22) -N:42 +### Debug Output Shows +- Token parsing and expression tree construction +- VM instruction generation +- Step-by-step execution with stack state +- Environment changes and variable lookups +- Function calls and closure operations + +### Command-Line Debugging Hints + +**View bytecode generation**: +```bash +echo "(+ 1 2)" | ./bin/compiler.awk ``` -8. **Closures**: -```scheme -scheme> (let ((x 10)) ((lambda (y) (+ x y)) 32)) -N:42 -scheme> (let ((add1 (lambda (x) (+ x 1)))) (add1 41)) -N:42 -``` - -**Note**: Nested lambda definitions (lambdas defined inside other lambdas) are not yet supported. - -## Implementation Details - -### Compiler Pipeline -1. **Lexical Analysis**: - - Tokenizes input into numbers, symbols, and parentheses - - Handles whitespace and basic comments - -2. **Parsing**: - - Builds expression tree from tokens - - Validates syntax and expression structure - -3. **Code Generation**: - - Translates expressions to VM instructions - - Manages scope and variable bindings - - Handles function definitions - - **Generates closure creation code for lambda expressions** - -### Virtual Machine -1. **Stack Operations**: - - PUSH_CONST: Push constant value - - POP: Remove top value - - STORE: Store variable binding - - LOOKUP: Retrieve variable value - -2. **Function Calls**: - - **Direct execution** of function code with proper program restoration - - Environment frame management - - **Closure support** with environment capture and restoration - -3. **Closure System**: - - **CAPTURE_ENV**: Captures current environment for closure creation - - **Environment restoration**: Restores captured environments on closure calls - - **Variable lookup enhancement**: Checks closure environments for variable resolution - -4. **Memory Management**: - - Stack-based evaluation - - Simple heap for cons cells - - Basic reference counting (not fully implemented) - -### State Persistence -- Global variables and functions persist between sessions -- State stored in `/tmp/scheme_vm.state` and `/tmp/scheme_vm.env` -- Automatic state loading/saving -- **Note**: State persistence is enabled by default in the REPL - -## Extending the System - -### Adding New Special Forms -1. Add parsing in `compile_expr()` -2. Implement code generation function -3. Add corresponding VM instructions if needed - -### Adding New Data Types -1. Define new type tag in VM -2. Add type checking predicates -3. Implement value construction/access -4. Update relevant operations - -### Adding VM Instructions -1. Add instruction handling in `execute()` -2. Implement instruction logic -3. Update compiler to generate new instruction - -### Debugging -- Enable debug output: `DEBUG=1 ./scheme` -- Debug messages show: - - Lexical analysis - - Parsing steps - - Code generation - - VM execution - - Stack operations - - Environment changes - - **Closure creation and restoration** -- **Note**: Some debug output may be suppressed in the current implementation +**Step-by-step VM execution**: +```bash +echo "(+ 1 2)" | ./bin/compiler.awk | DEBUG=1 ./bin/vm.awk +``` + +**Trace specific functions**: +```bash +grep -n "function_name" bin/vm.awk +``` + +**Check syntax errors**: +```bash +echo "(+ 1" | ./bin/compiler.awk # Should show error +``` + +**Compare outputs**: +```bash +diff <(echo "(+ 1 2)" | ./bin/compiler.awk) <(echo "(+ 2 1)" | ./bin/compiler.awk) +``` + +### Common Issues + +**"Unknown expression type"**: Usually a parsing issue. Check for unsupported syntax. + +**"Stack underflow"**: Function arguments not consumed correctly. Check function definitions. + +**"Undefined variable"**: Variable not bound in current environment. Check `define` statements. + +**"Undefined function"**: Function not registered in VM. Check standard library implementation. ## Limitations -Current implementation does not support: -- **Nested lambda definitions** (lambdas defined inside other lambdas) - this causes "Undefined closure function" errors -- Floating point numbers -- Strings + +**Not Supported**: +- Floating point numbers (integers only) - Proper tail recursion - Garbage collection -- Error recovery in REPL -- Full numeric tower - Macros -- Standard library - -## Future Enhancements -1. **Nested lambda support** (complex but achievable) -2. Proper tail call optimization -3. Garbage collection -4. Error recovery -5. More data types -6. Standard library -7. Better debugging tools \ No newline at end of file +- Full numeric tower + +**Partially Supported**: +- Comments (basic handling, may be unreliable) +- State persistence (experimental, not fully reliable) +- Error recovery in REPL + +**Performance**: Educational implementation, not optimized for speed. + +## Extending + +### Adding Standard Library Functions + +1. **Implement in VM** (`bin/vm.awk`): +```awk +function stdlib_new_function() { + if (stack_ptr < 1) error("new-function requires one argument") + val = pop() + # ... implementation ... + push(result) +} +``` + +2. **Register function**: +```awk +FUNCTIONS["new-function"] = "stdlib_new_function" +``` + +3. **Add dispatch** in `vm_call_function()`: +```awk +} else if (built_in_name == "stdlib_new_function") { + stdlib_new_function() + return +``` + +4. **Test**: Create test file and verify functionality. + +### Adding Special Forms + +1. Add parsing in `compile_expr()` +2. Implement code generation function +3. Add VM instructions if needed + +## Forth Implementation + +The VM is language-agnostic. A Forth compiler (`scratch/forth/forth.awk`) demonstrates this by generating the same bytecode format. + +```bash +cd scratch/forth +echo "5 3 + ." | awk -f forth.awk # => Result: N:8 +``` + +## Files + +- `bin/compiler.awk` - Scheme to bytecode compiler +- `bin/vm.awk` - Virtual machine implementation +- `bin/repl` - Interactive REPL +- `test/` - Comprehensive test suite +- `scratch/forth/` - Forth implementation for VM validation \ No newline at end of file diff --git a/awk/scheme/scheme/TODO.txt b/awk/scheme/scheme/TODO.txt deleted file mode 100644 index f576fba..0000000 --- a/awk/scheme/scheme/TODO.txt +++ /dev/null @@ -1,70 +0,0 @@ -Scheme Interpreter TODO -===================== - -Current State: -------------- -- Basic arithmetic operations (+,-,*,/) are working -- Let expressions with simple bindings are working -- Function definitions (define) and lambda expressions are implemented (no nested lambdas) -- Stack-based virtual machine with environment for variable bindings -- Closures supported for top-level lambdas (not nested) -- State persistence for globals and functions between REPL sessions - -Recent Changes: --------------- -1. Improved function table and environment handling -2. Enhanced debug logging for function calls and environment -3. Refined lambda compilation and closure creation -4. Updated documentation and architectural notes - -Current Issues / Limitations: ----------------------------- -1. **Nested Lambda Support**: - - Lambdas defined inside other lambdas are not supported ("Undefined closure function" errors) -2. **Reference Counting**: - - Reference counting for heap objects (cons cells) is stubbed but not enforced -3. **Error Handling**: - - Minimal error recovery, especially in the REPL -4. **Tail Recursion**: - - No proper tail call optimization -5. **Data Types**: - - No floating point, string, or full numeric tower support -6. **Garbage Collection**: - - No GC; memory for cons cells is not reclaimed -7. **Standard Library**: - - No standard library or macros - -Next Steps: ------------ -1. **Implement Nested Lambda Support**: - - Refactor closure environment capture and restoration - - Update compiler and VM to support nested closures -2. **Improve Memory Management**: - - Implement or enforce reference counting - - Plan for future garbage collection -3. **Enhance Error Handling**: - - Add better error messages and recovery in REPL and VM -4. **Tail Call Optimization**: - - Investigate and implement proper tail recursion -5. **Testing and Debugging**: - - Expand test suite for edge cases (especially closures and let/lambda) - - Continue using DEBUG=1 for tracing execution -6. **Documentation and Code Quality**: - - Keep architectural notes and code comments up to date - - Document any new patterns or changes - -Technical Notes: ---------------- -- Function calls use LOOKUP, GET_VALUE, and CALL instructions -- Environment stack manages variable bindings and function parameters -- Function code is stored in the FUNCTIONS table with unique names -- Lambda functions use __lambda_N naming scheme; closures capture environment at creation -- State is persisted to /tmp/scheme_vm.state and /tmp/scheme_vm.env - -Debugging Tips: --------------- -1. Enable DEBUG=1 to see detailed execution logs -2. Check stack and environment contents before and after each operation -3. Monitor closure creation and environment capture/restoration -4. Verify function code storage and retrieval in the FUNCTIONS table -5. Use and expand test cases for all supported features \ No newline at end of file diff --git a/awk/scheme/scheme/WHAT-NEEDS-FIXING.md b/awk/scheme/scheme/WHAT-NEEDS-FIXING.md new file mode 100644 index 0000000..4c07b02 --- /dev/null +++ b/awk/scheme/scheme/WHAT-NEEDS-FIXING.md @@ -0,0 +1,39 @@ +# What Needs Fixing + +## Known Bugs + +### 1. Nested Defines and Multi-Expression Function Bodies +- **Symptom:** Tests with nested `define` or multiple expressions in a function body (e.g., `advanced_functions.scm`, `environment_scoping.scm`, `complex_programs.scm`) fail with errors like: + - `Error: Unknown expression type: y))(inner1` + - `Error: Unknown expression type: global) globa` + - `Error: Unknown expression type: 1)))))(reverse-helperstr(-(string-leng` +- **Suspected Cause:** + - The S-expression splitting or parsing logic is leaving behind fragments after processing nested defines or multi-expression bodies. + - There may be a mismatch between how the splitter and the parser advance through the input, especially for function bodies. + +### 2. Debug Output Not Triggered in Key Functions +- **Symptom:** Even with `DEBUG_SEXPR=1`, debug output from `split_sexpressions` and `compile_lambda` does not appear, suggesting these functions are not being called for the failing cases. +- **Suspected Cause:** + - The error may occur before any lambda body is processed, or in a code path that does not use these functions (e.g., top-level parsing or define handling). + +## Next Debugging Steps + +1. **Add Debug Prints to Key Functions** + - Add debug prints to `compile_define` and `compile_expr` to trace which expressions are being processed and when errors occur. + - This will help determine if the error is in top-level parsing, define handling, or function body processing. + +2. **Test the S-expression Splitter in Isolation** + - Take a failing function body and run it through `split_sexpressions` directly, printing the results. + - Confirm if the splitter is producing complete S-expressions or fragments. + +3. **Trace the Error Path** + - Use debug output to follow the flow from top-level parsing, through defines, into lambdas, and into function bodies. + - Identify exactly where the parser or code generator leaves behind fragments. + +4. **Compare with Reference Implementations** + - Review the approach in [littlelisp](https://github.com/maryrosecook/littlelisp) for robust S-expression parsing and evaluation. + - Consider adopting a similar recursive descent parsing strategy if needed. + +## Goal +- Ensure that all S-expressions (top-level and in function bodies) are parsed and compiled correctly, with no fragments left behind. +- Make the debug infrastructure robust and easy to enable/disable for future debugging. \ No newline at end of file diff --git a/awk/scheme/scheme/bin/compiler.awk b/awk/scheme/scheme/bin/compiler.awk index 94d6e99..7e43631 100755 --- a/awk/scheme/scheme/bin/compiler.awk +++ b/awk/scheme/scheme/bin/compiler.awk @@ -1,19 +1,47 @@ #!/usr/bin/awk -f -# This is a Scheme-to-Assembly compiler implemented in AWK -# It takes Scheme expressions as input and outputs assembly instructions -# for a custom stack-based virtual machine +# Scheme-to-VM Compiler +# +# This compiler translates Scheme expressions into stack-based VM instructions. +# The design prioritizes simplicity and correctness, making it suitable for +# educational purposes and small-scale applications. +# +# Architecture Overview: +# - Lexical analysis tokenizes input into meaningful units +# - Recursive descent parsing builds expression trees +# - Code generation produces VM instructions for execution +# - Special form handling for control flow and function definitions +# - Standard library integration for extended functionality +# +# Key Design Decisions: +# - Recursive descent parsing for simplicity and predictable behavior +# - Stack-based instruction generation for efficient VM execution +# - Environment-based variable binding for lexical scoping +# - Special form recognition for control flow constructs +# - Standard library function integration for extended functionality +# - Stack clearing between expressions to prevent argument pollution BEGIN { + + # Debug mode configuration + # FIXME: Don't leave in, trigger on flag + print "[DEBUG_SEXPR] BEGIN block reached" > "/dev/stderr" + print "[DEBUG_SEXPR] ENVIRON[\"DEBUG_SEXPR\"] = " ENVIRON["DEBUG_SEXPR"] > "/dev/stderr" + DEBUG_SEXPR = (ENVIRON["DEBUG_SEXPR"] == "1") ? 1 : 0 + if (DEBUG_SEXPR) print "[DEBUG_SEXPR] DEBUG_SEXPR is enabled" > "/dev/stderr" + # Compiler maintains internal state for code generation curr_token = "" # Current token being processed by lexer input_buffer = "" # Buffer for input text being tokenized next_label = 0 # Counter for generating unique labels program = "" # Accumulates the full program text + # Debug mode configuration # AWK FEATURE: ENVIRON is a built-in array containing environment variables # Unlike JS process.env, this is automatically available in awk DEBUG = (ENVIRON["DEBUG"] == "1") ? 1 : 0 + error_flag = 0 # Set to 1 if any error occurs + DEBUG_SEXPR = (ENVIRON["DEBUG_SEXPR"] == "1") ? 1 : 0 } # Debug logging helper function @@ -39,46 +67,74 @@ END { # Parse and compile each expression in the program split_expressions(program) + debug("END block: error_flag=" error_flag) + # If error_flag was set, exit 1 + if (error_flag) exit 1 } # Splits input into individual Scheme expressions -# Handles nested parentheses and comments -function split_expressions(prog, current, paren_count, i, c, expr, cleaned) { +# This function handles the complexity of Scheme syntax including: +# - Nested parentheses and proper expression boundaries +# - Comments that can span multiple lines +# - String literals that may contain parentheses +# - Whitespace normalization for consistent parsing +# +# The function processes the entire program text and identifies complete +# expressions that can be compiled independently +function split_expressions(prog, current, paren_count, i, c, expr, cleaned, lines, n, line, in_string, out, j) { current = "" paren_count = 0 - - # Clean up the input: - # 1. Remove comments (text between ; and next opening paren) - # 2. Normalize whitespace - # 3. Trim leading/trailing whitespace - cleaned = prog - # AWK FEATURE: gsub() is a built-in function for global substitution (like replaceAll in JS) - # gsub(pattern, replacement, target) - modifies target in place and returns count - gsub(/;[^(]*\(/, "(", cleaned) # Remove comments before expressions - gsub(/\)[^)]*;/, ")", cleaned) # Remove comments after expressions - gsub(/[ \t\n]+/, " ", cleaned) # Normalize whitespace to single spaces - # AWK FEATURE: sub() is like gsub() but only replaces the first occurrence - sub(/^[ \t\n]+/, "", cleaned) # Trim leading whitespace - sub(/[ \t\n]+$/, "", cleaned) # Trim trailing whitespace - + # Improved comment removal: process line by line + n = split(prog, lines, "\n") + out = "" + for (j = 1; j <= n; j++) { + line = lines[j] + # Skip lines that start with ';' (comments) + if (line ~ /^[ \t]*;/) continue + # Remove inline comments, but not inside strings + in_string = 0 + cleaned_line = "" + for (i = 1; i <= length(line); i++) { + c = substr(line, i, 1) + if (c == "\"") in_string = !in_string + if (!in_string && c == ";") break + cleaned_line = cleaned_line c + } + # Append cleaned line + out = out cleaned_line "\n" + } + cleaned = out debug("Cleaned program: [" cleaned "]") + if (cleaned == "") return if (cleaned == "") return - # Parse expressions by tracking parenthesis nesting + # Parse expressions by tracking parenthesis nesting and string literals + # This approach ensures that parentheses inside strings don't affect + # expression boundaries, and that comments are properly handled # AWK FEATURE: length(string) returns the length of a string # Unlike JS string.length, this is a function call, not a property + in_string = 0 # Track if we're inside a string literal + for (i = 1; i <= length(cleaned); i++) { c = substr(cleaned, i, 1) - if (c == "(") { + # Handle string literals + if (c == "\"" && !in_string) { + in_string = 1 + if (paren_count == 0) current = "" + } else if (c == "\"" && in_string) { + in_string = 0 + } + + if (c == "(" && !in_string) { if (paren_count == 0) current = "" paren_count++ } current = current c - if (c == ")") { + if (c == ")" && !in_string) { paren_count-- if (paren_count == 0) { # Complete expression found - compile it @@ -90,9 +146,52 @@ function split_expressions(prog, current, paren_count, i, c, expr, cleaned) { program = expr # Set for parser expr = parse_expr() compile_expr(expr) + # Clear stack between expressions to prevent pollution + print "CLEAR_STACK" # Clear stack between expressions current = "" } } + + # Handle atomic expressions (not in parentheses or strings) + if (paren_count == 0 && !in_string && c == " " && current != "") { + # We've reached a space after an atomic expression + expr = current + sub(/^\s+/, "", expr) + sub(/\s+$/, "", expr) + + if (expr != "") { + debug("Processing atomic expression: [" expr "]") + program = expr # Set for parser + expr = parse_expr() + compile_expr(expr) + # Clear stack between expressions to prevent pollution + print "CLEAR_STACK" # Clear stack between expressions + } + current = "" + } + } + + # Handle the last expression if it's atomic + if (paren_count == 0 && !in_string && current != "") { + expr = current + sub(/^\s+/, "", expr) + sub(/\s+$/, "", expr) + + if (expr != "") { + debug("Processing final atomic expression: [" expr "]") + program = expr # Set for parser + expr = parse_expr() + compile_expr(expr) + # Clear stack after the final expression + print "CLEAR_STACK" + } + } + + # Check for incomplete expressions + if (paren_count > 0) { + debug("paren_count at end of split_expressions: " paren_count) + error("Unmatched opening parentheses - incomplete expression") + exit 1 } # Add final HALT instruction @@ -126,6 +225,37 @@ function next_token() { return c } + # Handle string literals (double quotes) + if (c == "\"") { + str = "" + input_buffer = substr(input_buffer, 2) # Skip opening quote + + while (length(input_buffer) > 0) { + c = substr(input_buffer, 1, 1) + if (c == "\"") { + input_buffer = substr(input_buffer, 2) # Skip closing quote + break + } + if (c == "\\") { + # Handle escape sequences + input_buffer = substr(input_buffer, 2) + if (length(input_buffer) > 0) { + c = substr(input_buffer, 1, 1) + if (c == "n") str = str "\n" + else if (c == "t") str = str "\t" + else if (c == "\\") str = str "\\" + else if (c == "\"") str = str "\"" + else error("Invalid escape sequence: \\" c) + input_buffer = substr(input_buffer, 2) + } + } else { + str = str c + input_buffer = substr(input_buffer, 2) + } + } + return "\"" str "\"" # Return with quotes for identification + } + # Handle numbers (including negative numbers) # AWK FEATURE: substr(string, start, length) extracts substring # Unlike JS string.slice(), this is 1-indexed and requires explicit length @@ -158,7 +288,13 @@ function next_token() { } # Recursive descent parser for Scheme expressions -# Returns parsed expression as a string +# This parser implements a simple but complete parsing strategy that: +# - Handles nested expressions through recursion +# - Distinguishes between atoms and list expressions +# - Provides clear error messages for malformed input +# - Supports string literals and various atom types +# +# Returns parsed expression as a string for further processing function parse_expr(token, result) { token = next_token() if (token == "EOF") return "" @@ -169,11 +305,22 @@ function parse_expr(token, result) { return result } + # Handle string literals + if (substr(token, 1, 1) == "\"") { + debug("Parsed string: " token) + return token + } + debug("Parsed token: " token) return token } # Parses a list expression (anything in parentheses) +# This function handles the complexity of nested list structures by: +# - Recursively parsing each element in the list +# - Maintaining proper nesting levels +# - Providing clear error messages for unmatched parentheses +# - Supporting empty lists and nested expressions function parse_list(result, expr) { result = "" @@ -190,7 +337,13 @@ function parse_list(result, expr) { } # Splits an expression into operator and arguments -# Handles nested expressions correctly +# This function handles the complexity of Scheme function calls by: +# - Correctly identifying the operator (first element) +# - Preserving nested expressions as single arguments +# - Handling whitespace and parentheses properly +# - Supporting both simple calls and complex nested expressions +# +# Handles nested expressions correctly by tracking parenthesis nesting function split_expr(expr, i, len, c, op, args, paren_count) { # AWK FEATURE: length(string) returns the length of a string # Unlike JS string.length, this is a function call, not a property @@ -220,25 +373,33 @@ function split_expr(expr, i, len, c, op, args, paren_count) { return op SUBSEP args } -# Splits argument list handling nested parentheses -function split_args(args, arg_array, len, i, c, current, paren_count, arg_count) { +# Splits argument list handling nested parentheses and string literals +function split_args(args, arg_array, len, i, c, current, paren_count, arg_count, in_string) { # AWK FEATURE: length(string) returns the length of a string # Unlike JS string.length, this is a function call, not a property len = length(args) current = "" paren_count = 0 arg_count = 0 + in_string = 0 for (i = 1; i <= len; i++) { c = substr(args, i, 1) - if (c == "(") paren_count++ - if (c == ")") paren_count-- + # Handle string literals + if (c == "\"" && !in_string) { + in_string = 1 + } else if (c == "\"" && in_string) { + in_string = 0 + } + + if (c == "(" && !in_string) paren_count++ + if (c == ")" && !in_string) paren_count-- - if (c == " " && paren_count == 0 && current != "") { + if (c == " " && paren_count == 0 && !in_string && current != "") { arg_array[++arg_count] = current current = "" - } else if (c != " " || paren_count > 0) { + } else if (c != " " || paren_count > 0 || in_string) { current = current c } } @@ -256,6 +417,18 @@ function compile_number(num) { print "PUSH_CONST N:" num } +# Code generation for string literals +function compile_string(str) { + debug("Compiling string: " str) + # Remove outer quotes and emit string constant + content = substr(str, 2, length(str) - 2) + # Escape newlines and other special characters for output + gsub(/\n/, "\\n", content) + gsub(/\t/, "\\t", content) + gsub(/\\/, "\\\\", content) + print "PUSH_CONST STR:" content +} + # Code generation for primitive operations (+, -, *, cons, etc) function compile_primitive_call(op, args, arg_array, nargs, i) { debug("Primitive call: op=" op " args=" args) @@ -308,6 +481,57 @@ function compile_primitive_call(op, args, arg_array, nargs, i) { for (i = 1; i < nargs; i++) print "MUL" } + else if (op == "/") { + if (nargs < 2) error("/ requires at least 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + for (i = 1; i < nargs; i++) + print "DIV" + } + else if (op == "modulo" || op == "%") { + if (nargs != 2) error("modulo requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP modulo" + print "GET_VALUE" + print "CALL" + } + else if (op == "expt") { + if (nargs != 2) error("expt requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP expt" + print "GET_VALUE" + print "CALL" + } + else if (op == "abs") { + if (nargs != 1) error("abs requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP abs" + print "GET_VALUE" + print "CALL" + } + else if (op == "min") { + if (nargs != 2) error("min requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP min" + print "GET_VALUE" + print "CALL" + } + else if (op == "max") { + if (nargs != 2) error("max requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP max" + print "GET_VALUE" + print "CALL" + } else if (op == "cons") { if (nargs != 2) error("cons requires 2 arguments") # Compile arguments @@ -344,6 +568,224 @@ function compile_primitive_call(op, args, arg_array, nargs, i) { } print "EQ" } + # Standard library functions + else if (op == "null?") { + if (nargs != 1) error("null? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP null?" + print "GET_VALUE" + print "CALL" + } + else if (op == "pair?") { + if (nargs != 1) error("pair? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP pair?" + print "GET_VALUE" + print "CALL" + } + else if (op == "number?") { + if (nargs != 1) error("number? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP number?" + print "GET_VALUE" + print "CALL" + } + else if (op == "string?") { + if (nargs != 1) error("string? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP string?" + print "GET_VALUE" + print "CALL" + } + else if (op == "boolean?") { + if (nargs != 1) error("boolean? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP boolean?" + print "GET_VALUE" + print "CALL" + } + else if (op == "symbol?") { + if (nargs != 1) error("symbol? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP symbol?" + print "GET_VALUE" + print "CALL" + } + else if (op == "zero?") { + if (nargs != 1) error("zero? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP zero?" + print "GET_VALUE" + print "CALL" + } + else if (op == "positive?") { + if (nargs != 1) error("positive? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP positive?" + print "GET_VALUE" + print "CALL" + } + else if (op == "negative?") { + if (nargs != 1) error("negative? requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP negative?" + print "GET_VALUE" + print "CALL" + } + else if (op == "length") { + if (nargs != 1) error("length requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP length" + print "GET_VALUE" + print "CALL" + } + else if (op == "cadr") { + if (nargs != 1) error("cadr requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP cadr" + print "GET_VALUE" + print "CALL" + } + else if (op == "caddr") { + if (nargs != 1) error("caddr requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP caddr" + print "GET_VALUE" + print "CALL" + } + else if (op == "list-ref") { + if (nargs != 2) error("list-ref requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP list-ref" + print "GET_VALUE" + print "CALL" + } + else if (op == "list-tail") { + if (nargs != 2) error("list-tail requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP list-tail" + print "GET_VALUE" + print "CALL" + } + else if (op == "append") { + if (nargs != 2) error("append requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP append" + print "GET_VALUE" + print "CALL" + } + else if (op == "list") { + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP list" + print "GET_VALUE" + print "CALL_WITH_ARGS " nargs + } + else if (op == "reverse") { + if (nargs != 1) error("reverse requires 1 argument") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP reverse" + print "GET_VALUE" + print "CALL" + } + else if (op == "member") { + if (nargs != 2) error("member requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP member" + print "GET_VALUE" + print "CALL" + } + else if (op == "map") { + if (nargs != 2) error("map requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP map" + print "GET_VALUE" + print "CALL" + } + else if (op == "filter") { + if (nargs != 2) error("filter requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP filter" + print "GET_VALUE" + print "CALL" + } + # String operations + else if (op == "string-length") { + if (nargs != 1) error("string-length requires 1 argument") + compile_expr(arg_array[1]) + print "LOOKUP string-length" + print "GET_VALUE" + print "CALL" + } + else if (op == "string-append") { + if (nargs < 2) error("string-append requires at least 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP string-append" + print "GET_VALUE" + print "CALL_WITH_ARGS " nargs + } + else if (op == "string-ref") { + if (nargs != 2) error("string-ref requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP string-ref" + print "GET_VALUE" + print "CALL" + } + else if (op == "substring") { + if (nargs != 3) error("substring requires 3 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP substring" + print "GET_VALUE" + print "CALL" + } + else if (op == "string=?") { + if (nargs != 2) error("string=? requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP string=?" + print "GET_VALUE" + print "CALL" + } + else if (op == "string<?") { + if (nargs != 2) error("string<? requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP string<?" + print "GET_VALUE" + print "CALL" + } + else if (op == "string>?") { + if (nargs != 2) error("string>? requires 2 arguments") + for (i = 1; i <= nargs; i++) { + compile_expr(arg_array[i]) + } + print "LOOKUP string>?" + print "GET_VALUE" + print "CALL" + } else { # Function call for user-defined functions debug("Function call: " op) @@ -485,7 +927,45 @@ function compile_define(args, name, params, body, param_array, nparams, i, paren args = substr(args, i + 1) # Check if it's a function or variable definition + # A function definition has the form: (define name (param1 param2) body) + # A variable definition has the form: (define name value) + debug("compile_define: args = [" args "]") if (substr(args, 1, 1) == "(") { + # Check if this is a function definition or a variable definition with function call + # Look for the pattern: (name ...) where name is not a function call + # For (list 1), we need to check if the first word after ( is a function name + paren_count = 0 + i = 1 + first_word = "" + while (i <= length(args)) { + c = substr(args, i, 1) + if (c == "(") paren_count++ + if (c == ")") paren_count-- + if (c == " " && paren_count == 1) { + # Found a space inside the first parentheses + # Check if the first word is a function name (like list, cons, etc.) + if (first_word == "list" || first_word == "cons" || first_word == "append" || + first_word == "reverse" || first_word == "member" || first_word == "length" || + first_word == "car" || first_word == "cdr" || first_word == "cadr" || + first_word == "caddr" || first_word == "list-ref" || first_word == "list-tail" || + first_word == "null?" || first_word == "pair?" || first_word == "lambda" || + first_word == "map" || first_word == "filter") { + # This is a variable definition with function call value + debug("Defining variable with function call value: " name " with value: " args) + compile_expr(args) # Compile the value + print "STORE " name # Store the variable + return + } else { + # This is a function definition with parameters + break + } + } + if (paren_count == 1 && c != "(") { + first_word = first_word c + } + i++ + } + # It's a function definition paren_count = 1 i = 2 @@ -527,19 +1007,16 @@ function compile_define(args, name, params, body, param_array, nparams, i, paren } # Compiles lambda expressions (anonymous functions) -function compile_lambda(args, params, body, param_array, nparams, i, lambda_name) { +function compile_lambda(args, params, body, param_array, nparams, i, lambda_name, expr, op, rest, sexprs, nsexprs, j, first_body, last_body) { + if (DEBUG_SEXPR) print "[DEBUG_SEXPR] compile_lambda called" > "/dev/stderr" # Generate a unique name for the lambda function lambda_name = "__lambda_" next_label++ - debug("compile_lambda: args = [" args "]") - # Handle both full lambda expression and just the arguments part if (substr(args, 1, 7) == "(lambda") { debug("compile_lambda: detected full lambda expression") - # Full lambda expression: (lambda (params) body) - args = substr(args, 8) # Remove "(lambda" + args = substr(args, 8) debug("compile_lambda: after removing (lambda: [" args "]") - # Find the closing parenthesis of the entire lambda expression paren_count = 0 i = 1 while (i <= length(args)) { @@ -547,72 +1024,323 @@ function compile_lambda(args, params, body, param_array, nparams, i, lambda_name if (c == "(") paren_count++ if (c == ")") { paren_count-- - if (paren_count == -1) break # Found the closing paren of the lambda + if (paren_count == -1) break } i++ } if (paren_count != -1) error("Unmatched parenthesis in lambda") - args = substr(args, 1, i - 1) # Remove the closing parenthesis + args = substr(args, 1, i - 1) debug("compile_lambda: after removing closing paren: [" args "]") - # Remove leading whitespace sub(/^[ \t\n]+/, "", args) debug("compile_lambda: after removing leading whitespace: [" args "]") } - # Split into parameters and body if (substr(args, 1, 1) != "(") error("Malformed lambda expression") - - # Find matching closing parenthesis for parameters paren_count = 1 i = 2 - # AWK FEATURE: length(string) returns the length of a string - # Unlike JS string.length, this is a function call, not a property while (paren_count > 0 && i <= length(args)) { if (substr(args, i, 1) == "(") paren_count++ if (substr(args, i, 1) == ")") paren_count-- i++ } if (paren_count > 0) error("Unmatched parenthesis in lambda parameters") - - params = substr(args, 2, i - 3) # Remove parentheses + params = substr(args, 2, i - 3) body = substr(args, i) - - # Trim whitespace from body sub(/^[ \t\n]+/, "", body) sub(/[ \t\n]+$/, "", body) - debug("Lambda parameters: " params) debug("Lambda body: " body) - - # Create function label print "LABEL " lambda_name - - # Process parameters - # AWK FEATURE: split(string, array, separator) splits string into array elements - # Unlike JS string.split() which returns an array, this populates an existing array nparams = split(params, param_array, " ") for (i = 1; i <= nparams; i++) { print "STORE " param_array[i] } - - # Compile function body - compile_expr(body) - - # Clean up parameters and return + # --- Use split_sexpressions for robust local define support --- + nsexprs = split_sexpressions(body, sexprs) + if (DEBUG_SEXPR) { + printf("[DEBUG_SEXPR] compile_lambda: processing %d expressions\n", nsexprs) > "/dev/stderr" + } + # Process all leading defines + j = 1 + while (j <= nsexprs && substr(sexprs[j], 1, 7) == "(define") { + expr = sexprs[j] + if (DEBUG_SEXPR) printf("[DEBUG_SEXPR] local define: [%s]\n", expr) > "/dev/stderr" + op = substr(expr, 2, 6) + rest = substr(expr, 8) + compile_define(rest) + j++ + } + # Remaining expressions are the body + first_body = j + last_body = nsexprs + for (i = first_body; i < last_body; i++) { + expr = sexprs[i] + sub(/^[ \t\n]+/, "", expr) + sub(/[ \t\n]+$/, "", expr) + if (DEBUG_SEXPR) printf("[DEBUG_SEXPR] body expr: [%s]\n", expr) > "/dev/stderr" + if (expr != "") compile_expr(expr) + } + # Only the last expression's value is returned + if (last_body >= first_body) { + expr = sexprs[last_body] + sub(/^[ \t\n]+/, "", expr) + sub(/[ \t\n]+$/, "", expr) + if (DEBUG_SEXPR) printf("[DEBUG_SEXPR] return expr: [%s]\n", expr) > "/dev/stderr" + if (expr != "") compile_expr(expr) + } + # --- End local define support --- for (i = nparams; i >= 1; i--) { print "POP_ENV" } print "RETURN" - - # Create closure that captures current environment print "CAPTURE_ENV " lambda_name print "PUSH_CONST CLOSURE:" lambda_name ":ENV_ID" } +# Compile if expression: (if condition then-expr else-expr) +function compile_if(args, split_result, condition, then_expr, else_expr, else_label, end_label) { + debug("Compiling if expression: " args) + + # Split into condition, then-expr, and else-expr + split_result = split_expr(args) + condition = substr(split_result, 1, index(split_result, SUBSEP) - 1) + + # Get the rest and split again for then/else + args = substr(split_result, index(split_result, SUBSEP) + 1) + split_result = split_expr(args) + then_expr = substr(split_result, 1, index(split_result, SUBSEP) - 1) + else_expr = substr(split_result, index(split_result, SUBSEP) + 1) + + debug("If condition: " condition) + debug("If then: " then_expr) + debug("If else: " else_expr) + + # Generate unique labels + else_label = "else_" next_label++ + end_label = "endif_" next_label++ + + # Compile condition + compile_expr(condition) + + # Jump to else if condition is false + print "JUMP_IF_FALSE " else_label + + # Compile then expression + compile_expr(then_expr) + + # Jump to end + print "JUMP " end_label + + # Else label + print "LABEL " else_label + + # Compile else expression + compile_expr(else_expr) + + # End label + print "LABEL " end_label +} + +# Compile cond expression: (cond (test1 expr1) (test2 expr2) ... (else expr)) +function compile_cond(args, test, expr, test_label, end_label) { + debug("Compiling cond expression: " args) + + # Parse the first clause: (test expr) + # Remove outer parentheses + args = substr(args, 2, length(args) - 2) + + # Find the first space after the test + paren_count = 0 + for (i = 1; i <= length(args); i++) { + c = substr(args, i, 1) + if (c == "(") paren_count++ + if (c == ")") paren_count-- + if (c == " " && paren_count == 0) { + test = substr(args, 1, i - 1) + expr = substr(args, i + 1) + # Remove the rest of the cond (other clauses) + if (index(expr, ") (") > 0) { + expr = substr(expr, 1, index(expr, ") (") - 1) + } + break + } + } + + if (!test) { + test = args + expr = "" + } + + debug("Cond test: " test " expr: " expr) + + # Generate labels + test_label = "cond_test_" next_label++ + end_label = "cond_end_" next_label++ + + # Compile test + compile_expr(test) + + # Jump to else if test is false + print "JUMP_IF_FALSE " test_label + + # Compile expression + compile_expr(expr) + + # Jump to end + print "JUMP " end_label + + # Else label + print "LABEL " test_label + + # End label + print "LABEL " end_label +} + +# Compile and expression: (and expr1 expr2 ...) +function compile_and(args, expressions, nexprs, i, expr, short_circuit_label, end_label, split_result, remaining_args) { + debug("Compiling and expression: " args) + + # Parse expressions properly using split_expr + expressions[1] = "" + nexprs = 0 + remaining_args = args + + while (remaining_args != "") { + nexprs++ + split_result = split_expr(remaining_args) + expressions[nexprs] = substr(split_result, 1, index(split_result, SUBSEP) - 1) + remaining_args = substr(split_result, index(split_result, SUBSEP) + 1) + } + + if (nexprs == 0) { + # Empty and returns true + print "PUSH_CONST B:1" + return + } + + if (nexprs == 1) { + # Single expression + compile_expr(expressions[1]) + return + } + + # Generate labels + short_circuit_label = "and_short_" next_label++ + end_label = "and_end_" next_label++ + + for (i = 1; i <= nexprs; i++) { + expr = expressions[i] + debug("And expression " i ": " expr) + + # Compile expression + compile_expr(expr) + + # If not the last expression, check for short-circuit + if (i < nexprs) { + print "JUMP_IF_FALSE " short_circuit_label + } + } + + # Jump to end + print "JUMP " end_label + + # Short-circuit label (result is false) + print "LABEL " short_circuit_label + print "PUSH_CONST B:0" + + # End label + print "LABEL " end_label +} + +# Compile or expression: (or expr1 expr2 ...) +function compile_or(args, expressions, nexprs, i, expr, short_circuit_label, end_label, split_result, remaining_args) { + debug("Compiling or expression: " args) + + # Parse expressions properly using split_expr + expressions[1] = "" + nexprs = 0 + remaining_args = args + + while (remaining_args != "") { + nexprs++ + split_result = split_expr(remaining_args) + expressions[nexprs] = substr(split_result, 1, index(split_result, SUBSEP) - 1) + remaining_args = substr(split_result, index(split_result, SUBSEP) + 1) + } + + if (nexprs == 0) { + # Empty or returns false + print "PUSH_CONST B:0" + return + } + + if (nexprs == 1) { + # Single expression + compile_expr(expressions[1]) + return + } + + # Generate labels + short_circuit_label = "or_short_" next_label++ + end_label = "or_end_" next_label++ + + for (i = 1; i <= nexprs; i++) { + expr = expressions[i] + debug("Or expression " i ": " expr) + + # Compile expression + compile_expr(expr) + + # If not the last expression, check for short-circuit + if (i < nexprs) { + print "JUMP_IF_TRUE " short_circuit_label + } + } + + # Jump to end + print "JUMP " end_label + + # Short-circuit label (result is true) + print "LABEL " short_circuit_label + print "PUSH_CONST B:1" + + # End label + print "LABEL " end_label +} + +# Compile not expression: (not expr) +function compile_not(args, expr) { + debug("Compiling not expression: " args) + + # Compile the expression + compile_expr(args) + + # Negate the result + print "NOT" +} + # Main expression compiler - dispatches based on expression type function compile_expr(expr, split_result, op, args) { debug("Compiling expression: " expr) + # Handle empty expressions + if (expr == "") { + debug("Skipping empty expression") + return + } + + # Handle comment lines + if (expr ~ /^[ \t]*;;/ || expr ~ /^[ \t]*;/) { + debug("Skipping comment line: [" expr "]") + return + } + + # Handle string literals + if (substr(expr, 1, 1) == "\"") { + compile_string(expr) + return + } + # Handle numeric literals if (expr ~ /^-?[0-9]+$/) { compile_number(expr) @@ -625,8 +1353,18 @@ function compile_expr(expr, split_result, op, args) { return } + # Handle boolean literals + if (expr == "#t") { + print "PUSH_CONST B:1" + return + } + if (expr == "#f") { + print "PUSH_CONST B:0" + return + } + # Handle variable lookup - if (expr ~ /^[a-zA-Z_][a-zA-Z0-9_]*$/) { + if (expr ~ /^[a-zA-Z_][a-zA-Z0-9_?-]*$/) { print "LOOKUP " expr return } @@ -644,6 +1382,16 @@ function compile_expr(expr, split_result, op, args) { compile_let(args) } else if (op == "lambda") { compile_lambda(args) + } else if (op == "if") { + compile_if(args) + } else if (op == "cond") { + compile_cond(args) + } else if (op == "and") { + compile_and(args) + } else if (op == "or") { + compile_or(args) + } else if (op == "not") { + compile_not(args) } else { compile_primitive_call(op, args) } @@ -656,5 +1404,59 @@ function compile_expr(expr, split_result, op, args) { # Error reporting helper function error(msg) { print "Error: " msg > "/dev/stderr" + error_flag = 1 exit 1 +} + +# Split a string into top-level S-expressions (returns count, fills sexpr_array) +function split_sexpressions(str, sexpr_array, i, c, in_string, paren_count, current, n, in_comment) { + if (DEBUG_SEXPR) print "[DEBUG_SEXPR] split_sexpressions called" > "/dev/stderr" + in_string = 0 + paren_count = 0 + current = "" + n = 0 + in_comment = 0 + i = 1 + while (i <= length(str)) { + c = substr(str, i, 1) + # Skip whitespace + if (!in_string && (c == " " || c == "\t" || c == "\n")) { i++; continue; } + # Skip comments (start with ; to end of line) + if (!in_string && c == ";") { + while (i <= length(str) && substr(str, i, 1) != "\n") i++; + i++; continue; + } + # Parse S-expression + current = "" + if (c == "(") { + paren_count = 0 + while (i <= length(str)) { + c = substr(str, i, 1) + current = current c + if (c == "\"") in_string = !in_string + if (!in_string && c == "(") paren_count++ + if (!in_string && c == ")") paren_count-- + i++ + if (paren_count == 0 && !in_string) break + } + sexpr_array[++n] = current + } else { + # Parse an atom + while (i <= length(str)) { + c = substr(str, i, 1) + if (!in_string && (c == " " || c == "\t" || c == "\n" || c == "(" || c == ")" || c == ";")) break; + if (c == "\"") in_string = !in_string + current = current c + i++ + } + if (current != "") sexpr_array[++n] = current + } + } + if (DEBUG_SEXPR) { + printf("[DEBUG_SEXPR] split_sexpressions: found %d expressions\n", n) > "/dev/stderr" + for (i = 1; i <= n; i++) { + printf("[DEBUG_SEXPR] %d: [%s]\n", i, sexpr_array[i]) > "/dev/stderr" + } + } + return n } \ No newline at end of file diff --git a/awk/scheme/scheme/bin/repl b/awk/scheme/scheme/bin/repl index 1fce388..1c290d1 100755 --- a/awk/scheme/scheme/bin/repl +++ b/awk/scheme/scheme/bin/repl @@ -89,10 +89,15 @@ evaluate_expression() { # BASH FEATURE: -v var=value passes variables to awk # Unlike JS where you'd use process.env, this sets awk variables directly result=$(awk -v PERSIST=1 -v DEBUG="$DEBUG" -f "$VM" "$ASM_FILE" 2>&1) + vm_exit_code=$? debug "VM output: $result" + debug "VM exit code: $vm_exit_code" if [ -n "$result" ]; then echo "$result" fi + if [ $vm_exit_code -ne 0 ]; then + return $vm_exit_code + fi return 0 else echo "Compilation error" >&2 @@ -109,11 +114,12 @@ if [ "$#" -gt 0 ]; then exit 1 fi debug "Reading file: $1" - file_content=$(cat "$1" | tr '\n' ' ') + file_content=$(cat "$1") debug "File content: $file_content" evaluate_expression "$file_content" + exit_code=$? cleanup "keep_state" # Keep state after file execution - exit 0 + exit $exit_code fi # REPL state diff --git a/awk/scheme/scheme/bin/vm.awk b/awk/scheme/scheme/bin/vm.awk index ea15884..7e68e92 100755 --- a/awk/scheme/scheme/bin/vm.awk +++ b/awk/scheme/scheme/bin/vm.awk @@ -1,58 +1,89 @@ #!/usr/bin/awk -f -# This is a stack-based virtual machine for executing compiled Scheme code -# It implements a simple instruction set with support for: -# - Basic arithmetic operations -# - Function calls and returns -# - Variable bindings and lookups -# - Cons cells and list operations +# Stack-based Virtual Machine for Awk-Scheme +# +# This VM implements a simple but complete execution environment for compiled Scheme code. +# The design prioritizes simplicity and correctness over performance, making it suitable +# for educational purposes and small-scale applications. +# +# Architecture Overview: +# - Stack-based execution model for simplicity and predictable memory usage +# - Typed value system with runtime type checking for safety +# - Environment-based variable binding supporting lexical scoping +# - Closure support for nested function definitions and lexical scoping +# - Persistent state between sessions for REPL continuity +# +# Key Design Decisions: +# - All values are tagged with their type to enable runtime type checking +# - Environment frames are pushed/popped for function calls to support lexical scoping +# - Closures capture their creation environment to support nested functions +# - State persistence uses simple text files for debugging and REPL continuity +# - Function calls execute code directly rather than modifying the program array BEGIN { # Type system tags for runtime type checking - T_NUMBER = "N" # Numbers (integers) - T_BOOLEAN = "B" # Booleans (0/1) - T_SYMBOL = "S" # Symbols (identifiers) - T_PAIR = "P" # Cons cells (pairs) - T_FUNCTION = "F" # Function references - T_NIL = "NIL" # Empty list marker + # These prefixes enable safe value manipulation and clear error messages + T_NUMBER = "N" # Numbers (integers only for simplicity) + T_BOOLEAN = "B" # Booleans (0/1 for compatibility with AWK) + T_SYMBOL = "S" # Symbols (identifiers and variable names) + T_PAIR = "P" # Cons cells (pairs for list construction) + T_FUNCTION = "F" # Function references (for function values) + T_NIL = "NIL" # Empty list marker (distinct from null) T_CLOSURE = "CLOSURE" # Closure objects (function + captured environment) + T_STRING = "STR" # String literals (for text manipulation) + + # Virtual machine registers and state + stack_ptr = 0 # Points to top of evaluation stack (1-indexed) + heap_ptr = 0 # Points to next free heap location for cons cells + pc = 0 # Program counter for instruction fetch and execution - # Virtual machine registers - stack_ptr = 0 # Points to top of evaluation stack - heap_ptr = 0 # Points to next free heap location - pc = 0 # Program counter for instruction fetch - # Original program storage for nested function definitions - # AWK FEATURE: delete array removes all elements from an array - # Unlike JS where you'd set array = [], this clears the array in place + # This preserves the complete program during function calls to support + # nested function definitions and complex control flow delete original_program # Stores the original program before function calls - + + # Debug mode configuration # AWK FEATURE: ENVIRON is a built-in array containing environment variables DEBUG = (ENVIRON["DEBUG"] == "1") ? 1 : 0 # Environment for variable bindings + # This implements lexical scoping by maintaining a stack of variable bindings env_size = 0 # Current size of environment stack - + # Closure support - captured environments for nested lambdas + # Closures capture their creation environment to support lexical scoping + # in nested function definitions delete closure_envs # Maps env_id to captured environment arrays delete closure_env_names # Variable names in captured environments delete closure_env_vals # Variable values in captured environments delete closure_env_sizes # Size of each captured environment next_env_id = 1 # Counter for generating unique environment IDs - + # Function table for storing defined functions + # Functions are stored by name for efficient lookup during execution delete func_def_names # Function names delete func_def_pc # Entry points delete func_def_code # Function bodies func_def_size = 0 # Number of defined functions - + # Call stack for function returns + # Tracks return addresses for proper function call/return semantics call_stack_ptr = 0 + # Enhanced call stack for nested function calls (for map/filter support) + # This enables function calls from within built-in functions + call_stack_size = 0 # Current size of call stack + call_stack_return_pc[100] # Return program counters + call_stack_return_env[100] # Return environment sizes + call_stack_return_stack[100] # Return stack pointers + call_stack_return_func[100] # Return function names (for debugging) + # Global function registry - clear it first + # This maps function names to their implementations for efficient dispatch delete FUNCTIONS # Maps function names to implementations # State persistence configuration + # Uses simple text files for debugging and REPL continuity STATE_FILE = "/tmp/scheme_vm.state" debug("STATE_FILE_PATH: " STATE_FILE) debug("PERSIST_FLAG: " PERSIST) @@ -76,7 +107,7 @@ BEGIN { sub(/ .*$/, "", name) code = line sub(/^[^ ]+ /, "", code) - + # Read the rest of the function code (until next FUNC or end of file) debug("LOADING_STATE: Reading function code for " name) while ((getline line < STATE_FILE) > 0) { @@ -88,10 +119,10 @@ BEGIN { } code = code "\n" line } - + debug("Loaded function: " name) debug("Code: " code) - + # Store function in FUNCTIONS table FUNCTIONS[name] = code debug("LOADED_FUNCTION: " name " with code length " length(code)) @@ -110,15 +141,67 @@ BEGIN { delete func_env_sizes # Size of each function's environment # Register built-in functions first + # These provide the core language operations and are always available + # The registration maps Scheme function names to internal VM function names + # for efficient dispatch during execution debug("REGISTERING_BUILTINS: " length(FUNCTIONS) " functions before") + + # Arithmetic operations - core numeric functionality FUNCTIONS["+"] = "add" FUNCTIONS["-"] = "subtract" FUNCTIONS["*"] = "multiply" FUNCTIONS["/"] = "divide" + FUNCTIONS["modulo"] = "modulo" + FUNCTIONS["%"] = "modulo" # Alias for modulo + FUNCTIONS["expt"] = "expt" + FUNCTIONS["abs"] = "abs" + FUNCTIONS["min"] = "min" + FUNCTIONS["max"] = "max" FUNCTIONS["="] = "equals" FUNCTIONS["<"] = "less_than" FUNCTIONS[">"] = "greater_than" - FUNCTIONS["add1"] = "add_one" + FUNCTIONS["inc"] = "add_one" + FUNCTIONS["++"] = "add_one" # Alias for inc function + + # Output + FUNCTIONS["display"] = "display" + + # String operations - text manipulation utilitie + FUNCTIONS["string-length"] = "string_length" + FUNCTIONS["string-append"] = "string_append" + FUNCTIONS["string-ref"] = "string_ref" + FUNCTIONS["substring"] = "substring" + FUNCTIONS["string=?"] = "string_equal" + FUNCTIONS["string<?"] = "string_less_than" + FUNCTIONS["string>?"] = "string_greater_than" + + # Type predicates - essential for type checking + FUNCTIONS["number?"] = "number_p" + FUNCTIONS["string?"] = "string_p" + FUNCTIONS["boolean?"] = "boolean_p" + FUNCTIONS["symbol?"] = "symbol_p" + FUNCTIONS["zero?"] = "zero_p" + FUNCTIONS["positive?"] = "positive_p" + FUNCTIONS["negative?"] = "negative_p" + + # Standard library - List utilities + # The implementation prioritizes simplicity over performance + FUNCTIONS["list"] = "stdlib_list" + FUNCTIONS["null?"] = "stdlib_null_p" + FUNCTIONS["pair?"] = "stdlib_pair_p" + FUNCTIONS["length"] = "stdlib_length" + FUNCTIONS["append"] = "stdlib_append" + FUNCTIONS["reverse"] = "stdlib_reverse" + FUNCTIONS["member"] = "stdlib_member" + FUNCTIONS["cadr"] = "stdlib_cadr" + FUNCTIONS["caddr"] = "stdlib_caddr" + FUNCTIONS["list-ref"] = "stdlib_list_ref" + FUNCTIONS["list-tail"] = "stdlib_list_tail" + FUNCTIONS["map"] = "stdlib_map" + FUNCTIONS["filter"] = "stdlib_filter" + FUNCTIONS["assert"] = "assert" + FUNCTIONS["error"] = "user_error" + debug("REGISTERING_BUILTINS: " length(FUNCTIONS) " functions after") # Environment persistence configuration @@ -134,9 +217,9 @@ BEGIN { sub(/ .*$/, "", name) val = line sub(/^[^ ]+ /, "", val) - + debug("Loaded env var: " name " = " val) - + # Store in environment env_name[env_size] = name env_val[env_size] = val @@ -187,6 +270,12 @@ function isPair(val) { return getType(val) == T_PAIR } function isFunction(val) { return getType(val) == T_FUNCTION } function isNil(val) { return getType(val) == T_NIL } function isClosure(val) { return getType(val) == T_CLOSURE } +function isString(val) { return getType(val) == T_STRING } + +# String helper functions +function makeString(val) { + return T_STRING ":" val +} # Closure helper functions function makeClosure(func_name, env_id) { @@ -195,7 +284,7 @@ function makeClosure(func_name, env_id) { function getClosureFunction(closure_val) { # Extract function name from CLOSURE:func_name:env_id - return substr(closure_val, index(closure_val, ":") + 1, + return substr(closure_val, index(closure_val, ":") + 1, index(substr(closure_val, index(closure_val, ":") + 1), ":") - 1) } @@ -210,7 +299,7 @@ function getClosureEnvId(closure_val) { function captureEnvironment(env_id, i) { debug("Capturing environment with ID: " env_id) closure_env_sizes[env_id] = env_size - + # Copy current environment to closure environment for (i = 0; i < env_size; i++) { # Only capture non-global variables @@ -220,7 +309,7 @@ function captureEnvironment(env_id, i) { debug("Captured: " env_name[i] " = " env_val[i]) } } - + debug("Captured environment size: " closure_env_sizes[env_id]) } @@ -229,7 +318,7 @@ function vm_capture_env(func_name) { debug("Capturing environment for function: " func_name) env_id = next_env_id++ captureEnvironment(env_id) - + # Replace the placeholder ENV_ID in the closure value # Find the closure value on the stack and update it if (stack_ptr > 0) { @@ -304,7 +393,7 @@ function vm_add() { if (stack_ptr < 2) error("ADD requires two operands") val2 = pop() val1 = pop() - if (!isNumber(val1) || !isNumber(val2)) + if (!isNumber(val1) || !isNumber(val2)) error("ADD requires numeric operands") result = getValue(val1) + getValue(val2) push(makeValue(T_NUMBER, result)) @@ -342,6 +431,61 @@ function vm_divide() { push(makeValue(T_NUMBER, result)) } +function vm_modulo() { + if (stack_ptr < 2) error("MODULO requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) + error("MODULO requires numeric operands") + if (getValue(val2) == 0) + error("Modulo by zero") + a = getValue(val1) + b = getValue(val2) + result = a % b + if (result < 0) result = result + b + push(makeValue(T_NUMBER, result)) +} + +function vm_expt() { + if (stack_ptr < 2) error("EXPT requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) + error("EXPT requires numeric operands") + result = getValue(val1) ^ getValue(val2) + push(makeValue(T_NUMBER, result)) +} + +function vm_abs() { + if (stack_ptr < 1) error("ABS requires one operand") + val = pop() + if (!isNumber(val)) + error("ABS requires numeric operand") + result = getValue(val) + if (result < 0) result = -result + push(makeValue(T_NUMBER, result)) +} + +function vm_min() { + if (stack_ptr < 2) error("MIN requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) + error("MIN requires numeric operands") + result = (getValue(val1) < getValue(val2)) ? getValue(val1) : getValue(val2) + push(makeValue(T_NUMBER, result)) +} + +function vm_max() { + if (stack_ptr < 2) error("MAX requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) + error("MAX requires numeric operands") + result = (getValue(val1) > getValue(val2)) ? getValue(val1) : getValue(val2) + push(makeValue(T_NUMBER, result)) +} + # List operation implementations function vm_cons() { if (stack_ptr < 2) error("CONS requires two operands") @@ -400,10 +544,26 @@ function execute(instr) { split(instr, parts, " ") op = parts[1] debug("Execute: " instr) - + # Dispatch based on instruction opcode if (op == "PUSH_CONST") { - push(parts[2]) + # Reconstruct the full value in case it contains spaces + value = parts[2] + for (i = 3; i <= length(parts); i++) { + value = value " " parts[i] + } + + # Handle escape sequences in string constants + if (value ~ /^STR:/) { + str_content = substr(value, 5) # Remove "STR:" prefix + # Convert escape sequences back to actual characters + gsub(/\\n/, "\n", str_content) + gsub(/\\t/, "\t", str_content) + gsub(/\\\\/, "\\", str_content) + value = "STR:" str_content + } + + push(value) } else if (op == "POP") { pop() @@ -416,7 +576,6 @@ function execute(instr) { if (stack_ptr < 2) error("SWAP requires two operands") val2 = pop() val1 = pop() - val1 = pop() push(val2) push(val1) } @@ -485,6 +644,9 @@ function execute(instr) { else if (op == "CALL") { vm_call_function() } + else if (op == "CALL_WITH_ARGS") { + vm_call_function_with_args(parts[2]) + } else if (op == "RETURN") { debug("EXECUTING_RETURN") # The call_stack_ptr is no longer used for return, so this instruction is effectively removed. @@ -496,6 +658,24 @@ function execute(instr) { else if (op == "CAPTURE_ENV") { vm_capture_env(parts[2]) } + else if (op == "JUMP") { + vm_jump(parts[2]) + } + else if (op == "JUMP_IF_FALSE") { + vm_jump_if_false(parts[2]) + } + else if (op == "JUMP_IF_TRUE") { + vm_jump_if_true(parts[2]) + } + else if (op == "NOT") { + vm_not() + } + else if (op == "POP") { + vm_pop() + } + else if (op == "CLEAR_STACK") { + vm_clear_stack() + } else { error("Unknown instruction: " op) } @@ -506,8 +686,11 @@ function execute(instr) { # NR is a built-in variable that contains the current record (line) number # Unlike JS where you'd need to track line numbers manually { - program[NR-1] = $0 # $0 is the current line being processed - original_program[NR-1] = $0 # Keep a copy of the original program + # Skip empty lines + if (length($0) > 0) { + program[NR-1] = $0 # $0 is the current line being processed + original_program[NR-1] = $0 # Keep a copy of the original program + } } # AWK FEATURE: END block runs after all input has been processed @@ -519,17 +702,79 @@ END { # debug("EXECUTING_PC_" pc ": " program[pc]) execute(program[pc++]) } - + # Save state if we didn't halt normally if (!normal_exit && PERSIST) { save_state() } } +# Control flow instruction implementations +function vm_jump(label) { + debug("Jumping to label: " label) + # Find the label in the program + for (i = 0; i < length(program); i++) { + if (program[i] == "LABEL " label) { + pc = i + return + } + } + error("Label not found: " label) +} + +function vm_jump_if_false(label) { + if (stack_ptr < 1) error("JUMP_IF_FALSE requires one operand") + val = pop() + debug("Jump if false: " val " -> " label) + + # Check if value is false (0, false, or nil) + is_false = (val == "B:0" || val == "N:0" || val == "NIL:") + if (is_false) { + vm_jump(label) + } +} + +function vm_jump_if_true(label) { + if (stack_ptr < 1) error("JUMP_IF_TRUE requires one operand") + val = pop() + debug("Jump if true: " val " -> " label) + + # Check if value is true (non-zero, non-false, non-nil) + is_true = (val != "B:0" && val != "N:0" && val != "NIL:") + if (is_true) { + vm_jump(label) + } +} + +function vm_not() { + if (stack_ptr < 1) error("NOT requires one operand") + val = pop() + debug("NOT: " val) + + # Negate the boolean value + if (val == "B:0" || val == "N:0" || val == "NIL:") { + push(makeValue(T_BOOLEAN, "1")) + } else { + push(makeValue(T_BOOLEAN, "0")) + } +} + +function vm_pop() { + if (stack_ptr < 1) error("POP requires one operand") + val = pop() + debug("POP: " val) + # Value is discarded +} + +function vm_clear_stack() { + debug("Clearing stack (was " stack_ptr " items)") + stack_ptr = 0 +} + # Variable binding implementation function vm_store(name) { debug("Storing " peek() " as " name " at env_size: " env_size) - + # Handle global definitions specially if (lookup_no_error("from_define")) { name = "__global_" name @@ -540,7 +785,7 @@ function vm_store(name) { break } } - + # Remove any previous definition of this global for (i = env_size - 1; i >= 0; i--) { if (env_name[i] == name) { @@ -554,7 +799,7 @@ function vm_store(name) { } } } - + # Handle lambda functions val = peek() if (isSymbol(val)) { @@ -571,12 +816,12 @@ function vm_store(name) { return } } - + # Add to environment env_name[env_size] = name env_val[env_size] = peek() env_size++ - + debug("Environment after store:") dump_env() } @@ -585,13 +830,13 @@ function vm_store(name) { function vm_pop_env() { if (env_size <= 0) error("Environment underflow") debug("Popping environment at size: " env_size) - + # Don't pop globals if (env_name[env_size-1] ~ /^__global_/) { debug("Keeping global definition: " env_name[env_size-1]) return } - + debug("Removing: " env_name[env_size-1] " = " env_val[env_size-1]) env_size-- } @@ -637,7 +882,7 @@ function vm_lookup(name, i, global_name, val) { function vm_define_function(name, start_pc) { debug("Defining function: " name " at " start_pc) debug("call_stack_ptr: " call_stack_ptr) - + # Clear the from_define flag for (i = env_size - 1; i >= 0; i--) { if (env_name[i] == "from_define") { @@ -646,11 +891,11 @@ function vm_define_function(name, start_pc) { break } } - + # Build function code code = "" i = start_pc - + # If we're inside a function call, we need to find the corresponding position # in the original program. For now, use a simple heuristic. if (call_stack_ptr > 0) { @@ -675,13 +920,13 @@ function vm_define_function(name, start_pc) { } } code = code "\nRETURN" - + # Store function debug("Storing function: " name " = " code) FUNCTIONS[name] = code debug("Function stored. Total functions: " length(FUNCTIONS)) debug("FUNCTION_STORED: " name) - + pc = i + 1 } @@ -691,12 +936,13 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na func_name = pop() debug("Calling function: " func_name) debug("CALLING_FUNCTION: " func_name) - + debug("Stack pointer before function call: " stack_ptr) + # If name is a symbol, get its value if (isSymbol(func_name)) { func_name = getValue(func_name) } - + # Check if it's a built-in function first if (func_name in FUNCTIONS) { built_in_name = FUNCTIONS[func_name] @@ -732,9 +978,153 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na debug("Calling built-in function: add_one") add_one() return + } else if (built_in_name == "modulo") { + debug("Calling built-in function: modulo") + modulo() + return + } else if (built_in_name == "expt") { + debug("Calling built-in function: expt") + expt() + return + } else if (built_in_name == "abs") { + debug("Calling built-in function: abs") + abs() + return + } else if (built_in_name == "min") { + debug("Calling built-in function: min") + min() + return + } else if (built_in_name == "max") { + debug("Calling built-in function: max") + max() + return + } else if (built_in_name == "string_length") { + debug("Calling built-in function: string_length") + string_length() + return + } else if (built_in_name == "string_append") { + debug("Calling built-in function: string_append") + string_append() + return + } else if (built_in_name == "string_ref") { + debug("Calling built-in function: string_ref") + string_ref() + return + } else if (built_in_name == "substring") { + debug("Calling built-in function: substring") + substring() + return + } else if (built_in_name == "string_equal") { + debug("Calling built-in function: string_equal") + string_equal() + return + } else if (built_in_name == "string_less_than") { + debug("Calling built-in function: string_less_than") + string_less_than() + return + } else if (built_in_name == "string_greater_than") { + debug("Calling built-in function: string_greater_than") + string_greater_than() + return + } else if (built_in_name == "number_p") { + debug("Calling built-in function: number_p") + number_p() + return + } else if (built_in_name == "string_p") { + debug("Calling built-in function: string_p") + string_p() + return + } else if (built_in_name == "boolean_p") { + debug("Calling built-in function: boolean_p") + boolean_p() + return + } else if (built_in_name == "symbol_p") { + debug("Calling built-in function: symbol_p") + symbol_p() + return + } else if (built_in_name == "zero_p") { + debug("Calling built-in function: zero_p") + zero_p() + return + } else if (built_in_name == "positive_p") { + debug("Calling built-in function: positive_p") + positive_p() + return + } else if (built_in_name == "negative_p") { + debug("Calling built-in function: negative_p") + negative_p() + return + } else if (built_in_name == "stdlib_list") { + debug("Calling standard library function: stdlib_list") + stdlib_list() + return + } else if (built_in_name == "stdlib_null_p") { + debug("Calling standard library function: stdlib_null_p") + stdlib_null_p() + return + } else if (built_in_name == "stdlib_pair_p") { + debug("Calling standard library function: stdlib_pair_p") + stdlib_pair_p() + return + } else if (built_in_name == "stdlib_length") { + debug("Calling standard library function: stdlib_length") + stdlib_length() + return + } else if (built_in_name == "stdlib_append") { + debug("Calling standard library function: stdlib_append") + stdlib_append() + return + } else if (built_in_name == "display") { + debug("Calling built-in function: display") + display() + return + } else if (built_in_name == "stdlib_list") { + debug("Calling standard library function: stdlib_list") + stdlib_list() + return + } else if (built_in_name == "stdlib_reverse") { + debug("Calling standard library function: stdlib_reverse") + stdlib_reverse() + return + } else if (built_in_name == "stdlib_member") { + debug("Calling standard library function: stdlib_member") + stdlib_member() + return + } else if (built_in_name == "stdlib_cadr") { + debug("Calling standard library function: stdlib_cadr") + stdlib_cadr() + return + } else if (built_in_name == "stdlib_caddr") { + debug("Calling standard library function: stdlib_caddr") + stdlib_caddr() + return + } else if (built_in_name == "stdlib_list_ref") { + debug("Calling standard library function: stdlib_list_ref") + stdlib_list_ref() + return + } else if (built_in_name == "stdlib_list_tail") { + debug("Calling standard library function: stdlib_list_tail") + stdlib_list_tail() + return + } else if (built_in_name == "stdlib_map") { + debug("Calling standard library function: stdlib_map") + stdlib_map() + return + } else if (built_in_name == "stdlib_filter") { + debug("Calling standard library function: stdlib_filter") + stdlib_filter() + return + } else if (built_in_name == "assert") { + debug("Calling built-in function: assert") + assert() + return + } else if (built_in_name == "user_error") { + debug("Calling built-in function: user_error") + user_error() + return } } - + # Handle anonymous functions and closures if (func_name ~ /^__lambda_/) { if (!(func_name in FUNCTIONS)) { @@ -745,27 +1135,27 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na debug("Calling closure: " func_name) closure_func = getClosureFunction(func_name) closure_env_id = getClosureEnvId(func_name) - + if (!(closure_func in FUNCTIONS)) { error("Undefined closure function: " closure_func) } - + # Restore the captured environment pushClosureEnvironment(closure_env_id) - + # Use the closure's function name for the rest of the call func_name = closure_func } else if (!(func_name in FUNCTIONS)) { error("Undefined function: " func_name) } - + saved_pc = pc saved_env_size = env_size - + # AWK FEATURE: split(string, array, separator) splits string into array elements # Unlike JS string.split() which returns an array, this populates an existing array split(FUNCTIONS[func_name], code_lines, "\n") - + # Check if this is a parameterized function if (code_lines[1] ~ /^STORE /) { # This is a parameterized function (lambda) @@ -773,19 +1163,19 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na param_name = substr(code_lines[1], 7) debug("Found parameter name: " param_name) debug("FUNCTION_PARAM: " param_name) - + # Get argument from stack arg = pop() debug("Function argument: " arg) debug("FUNCTION_ARG: " arg) - + # Create new environment frame debug("Creating new environment frame at size: " env_size) env_name[env_size] = param_name env_val[env_size] = arg env_size++ debug("FUNCTION_ENV_STORE: " param_name " = " arg " at index " (env_size-1)) - + # Execute function code directly, skipping STORE and POP_ENV instructions for (j = 2; j <= length(code_lines); j++) { if (code_lines[j] != "" && code_lines[j] != "POP_ENV") { @@ -793,10 +1183,10 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na execute(code_lines[j]) } } - + # Clean up parameter vm_pop_env() - + # Return to caller debug("Function completed, returning to PC: " saved_pc) pc = saved_pc @@ -804,7 +1194,7 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na } else { # This is a built-in function or non-parameterized function debug("Calling non-parameterized function: " func_name) - + # Execute all function code directly for (j in code_lines) { if (code_lines[j] != "") { @@ -812,7 +1202,7 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na execute(code_lines[j]) } } - + # Return to caller debug("Function completed, returning to PC: " saved_pc) pc = saved_pc @@ -820,6 +1210,38 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na } } +# Function call with argument count implementation +function vm_call_function_with_args(arg_count, code_lines, j, saved_pc, saved_env_size, arg, param_name) { + # Get function name from stack + func_name = pop() + debug("Calling function with args: " func_name " (arg_count: " arg_count ")") + debug("CALLING_FUNCTION_WITH_ARGS: " func_name " (arg_count: " arg_count ")") + debug("Stack pointer before function call: " stack_ptr) + + # If name is a symbol, get its value + if (isSymbol(func_name)) { + func_name = getValue(func_name) + } + + # Check if it's a built-in function first + if (func_name in FUNCTIONS) { + built_in_name = FUNCTIONS[func_name] + if (built_in_name == "stdlib_list") { + debug("Calling standard library function with args: stdlib_list (arg_count: " arg_count ")") + stdlib_list_with_args(arg_count) + return + } else if (built_in_name == "string_append") { + debug("Calling built-in function with args: string_append (arg_count: " arg_count ")") + string_append_with_args(arg_count) + return + } + } + + # For other functions, fall back to regular call + debug("Falling back to regular function call for: " func_name) + vm_call_function() +} + # Function return implementation - no longer needed with direct execution # function vm_return() { # debug("VM_RETURN: call_stack_ptr = " call_stack_ptr) @@ -827,25 +1249,25 @@ function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_na # # Save return value # ret_val = pop() # debug("VM_RETURN: return value = " ret_val) -# +# # # Restore environment # while (env_size > env_stack[call_stack_ptr]) { # debug("Popping environment at size: " env_size) # vm_pop_env() # } -# +# # # Restore program counter # pc = call_stack[call_stack_ptr--] # debug("VM_RETURN: restored PC = " pc) -# +# # # Restore the original program at the call position # program[pc] = original_program_at_call[call_stack_ptr + 1] # debug("Restored original program: " original_program_at_call[call_stack_ptr + 1]) -# +# # # Push return value # push(ret_val) # debug("VM_RETURN: pushed return value " ret_val) -# +# # debug("Returned with value: " ret_val " and env_size: " env_size) # } # } @@ -969,13 +1391,213 @@ function divide() { } function add_one() { - if (stack_ptr < 1) error("add1 requires one operand") + if (stack_ptr < 1) error("inc requires one operand") val = pop() - if (!isNumber(val)) error("add1 requires numeric operand") + if (!isNumber(val)) error("inc requires numeric operand") result = getValue(val) + 1 push(makeValue(T_NUMBER, result)) } +function modulo() { + if (stack_ptr < 2) error("modulo requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) error("modulo requires numeric operands") + if (getValue(val2) == 0) error("Modulo by zero") + a = getValue(val1) + b = getValue(val2) + result = a % b + if (result < 0) result = result + b + push(makeValue(T_NUMBER, result)) +} + +function expt() { + if (stack_ptr < 2) error("expt requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) error("expt requires numeric operands") + result = getValue(val1) ^ getValue(val2) + push(makeValue(T_NUMBER, result)) +} + +function abs() { + if (stack_ptr < 1) error("abs requires one operand") + val = pop() + if (!isNumber(val)) error("abs requires numeric operand") + result = getValue(val) + if (result < 0) result = -result + push(makeValue(T_NUMBER, result)) +} + +function min() { + if (stack_ptr < 2) error("min requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) error("min requires numeric operands") + result = (getValue(val1) < getValue(val2)) ? getValue(val1) : getValue(val2) + push(makeValue(T_NUMBER, result)) +} + +function max() { + if (stack_ptr < 2) error("max requires two operands") + val2 = pop() + val1 = pop() + if (!isNumber(val1) || !isNumber(val2)) error("max requires numeric operands") + result = (getValue(val1) > getValue(val2)) ? getValue(val1) : getValue(val2) + push(makeValue(T_NUMBER, result)) +} + +# String operations +function string_length() { + if (stack_ptr < 1) error("string-length requires one operand") + val = pop() + if (!isString(val)) error("string-length requires string operand") + str = getValue(val) + push(makeValue(T_NUMBER, length(str))) +} + +function string_append() { + if (stack_ptr < 2) error("string-append requires at least two operands") + result = "" + # Pop all arguments and concatenate (in reverse order) + while (stack_ptr > 0) { + val = pop() + if (!isString(val)) error("string-append requires string operands") + result = getValue(val) result # Prepend to maintain order + } + push(makeString(result)) +} + +function string_append_with_args(arg_count) { + if (arg_count < 2) error("string-append requires at least two operands") + if (stack_ptr < arg_count) error("string-append requires " arg_count " arguments, but only " stack_ptr " available") + + result = "" + # Pop specified number of arguments and concatenate (in reverse order) + for (i = arg_count; i >= 1; i--) { + val = pop() + if (!isString(val)) error("string-append requires string operands") + result = getValue(val) result # Prepend to maintain order + } + push(makeString(result)) +} + +function string_ref() { + if (stack_ptr < 2) error("string-ref requires two operands") + index_val = pop() + str_val = pop() + if (!isNumber(index_val)) error("string-ref index must be a number") + if (!isString(str_val)) error("string-ref requires string operand") + + str = getValue(str_val) + idx = getValue(index_val) + 1 # Convert to 1-indexed + if (idx < 1 || idx > length(str)) error("string-ref index out of bounds") + + char = substr(str, idx, 1) + push(makeString(char)) +} + +function substring() { + if (stack_ptr < 3) error("substring requires three operands") + end = pop() + start = pop() + str_val = pop() + + if (!isNumber(start) || !isNumber(end)) error("substring indices must be numbers") + if (!isString(str_val)) error("substring requires string operand") + + str = getValue(str_val) + start_idx = getValue(start) + 1 # Convert to 1-indexed + end_idx = getValue(end) + 1 + + if (start_idx < 1 || end_idx > length(str) || start_idx > end_idx) { + error("substring indices out of bounds") + } + + result = substr(str, start_idx, end_idx - start_idx + 1) + push(makeString(result)) +} + +function string_equal() { + if (stack_ptr < 2) error("string=? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string=? requires string operands") + result = (getValue(val1) == getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_less_than() { + if (stack_ptr < 2) error("string<? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string<? requires string operands") + result = (getValue(val1) < getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_greater_than() { + if (stack_ptr < 2) error("string>? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string>? requires string operands") + result = (getValue(val1) > getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +# Type predicates - essential for type checking +function number_p() { + if (stack_ptr < 1) error("number? requires one operand") + val = pop() + result = isNumber(val) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_p() { + if (stack_ptr < 1) error("string? requires one operand") + val = pop() + result = isString(val) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function boolean_p() { + if (stack_ptr < 1) error("boolean? requires one operand") + val = pop() + result = isBoolean(val) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function symbol_p() { + if (stack_ptr < 1) error("symbol? requires one operand") + val = pop() + result = isSymbol(val) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function zero_p() { + if (stack_ptr < 1) error("zero? requires one operand") + val = pop() + if (!isNumber(val)) error("zero? requires numeric operand") + result = (getValue(val) == 0) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function positive_p() { + if (stack_ptr < 1) error("positive? requires one operand") + val = pop() + if (!isNumber(val)) error("positive? requires numeric operand") + result = (getValue(val) > 0) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function negative_p() { + if (stack_ptr < 1) error("negative? requires one operand") + val = pop() + if (!isNumber(val)) error("negative? requires numeric operand") + result = (getValue(val) < 0) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + # Get value from top of stack function vm_get_value() { val = pop() @@ -994,4 +1616,888 @@ function vm_get_value() { # For other types, just push the value back push(val) } +} + +# Print a value to stdout (handles numbers, strings, booleans, and lists) +function display() { + if (stack_ptr < 1) error("display requires one argument") + val = pop() + print display_value(val) +} + +# Assert function for testing - checks if condition is true +function assert() { + if (stack_ptr < 1) error("assert requires one argument") + val = pop() + if (!isBoolean(val) && !isNumber(val)) { + error("assert requires boolean or number argument") + } + if (getValue(val) == "0" || getValue(val) == "B:0") { + error("Assertion failed") + } +} + +# User-callable error function +function user_error() { + if (stack_ptr < 1) error("error requires one argument") + val = pop() + if (!isString(val)) { + error("error requires string argument") + } + error(getValue(val)) +} + +# Recursively convert a value to a string for display +function display_value(val, t, idx, pair, car_val, cdr_val, result) { + t = getType(val) + if (t == T_NUMBER) return getValue(val) + if (t == T_BOOLEAN) return (getValue(val) == "1" ? "#t" : "#f") + if (t == T_STRING) return getValue(val) + if (t == T_SYMBOL) return getValue(val) + if (t == T_NIL) return "()" + if (t == T_PAIR) { + result = "(" + idx = getValue(val) + pair = getHeap(idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + cdr_val = substr(pair, index(pair, ",") + 1) + result = result display_value(car_val) + while (cdr_val != "NIL:" && isPair(cdr_val)) { + idx = getValue(cdr_val) + pair = getHeap(idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + cdr_val = substr(pair, index(pair, ",") + 1) + result = result " " display_value(car_val) + } + if (cdr_val != "NIL:") { + result = result " . " display_value(cdr_val) + } + result = result ")" + return result + } + if (t == T_CLOSURE) return "#<closure>" + if (t == T_FUNCTION) return "#<function>" + return "#<unknown>" +} + +# Standard Library Functions +# These implement essential Scheme list utilities following standard conventions +# Each function prioritizes correctness and clear error messages over performance +# The implementation uses the VM's heap for cons cell allocation and management + +# Create a list from elements +# This function handles variable argument counts by building the list from the stack +# The implementation reverses the stack order to maintain proper list construction +function stdlib_list() { + debug("stdlib_list called with stack_ptr: " stack_ptr) + debug("Stack contents before list: " stack_ptr " items") + # This will be a special form handled by the compiler + # For now, we'll implement it as a function that takes arguments from stack + if (stack_ptr < 1) { + # No arguments, return empty list + debug("No arguments, returning NIL:") + push("NIL:") + return + } + + # Build list from stack elements (arguments are in reverse order on stack) + result = "NIL:" + nargs = stack_ptr + debug("Building list with " nargs " arguments") + + # Pop arguments and build list in correct order + for (i = nargs; i >= 1; i--) { + val = pop() + debug("Popped argument " i ": " val) + # Create cons cell with current value and result + pair_val = val "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + debug("Created pair: " result) + } + + debug("Final result: " result) + debug("Stack contents after list: " stack_ptr " items") + push(result) +} + +# Create a list from elements with explicit argument count +# This function only consumes the specified number of arguments +function stdlib_list_with_args(arg_count) { + debug("stdlib_list_with_args called with arg_count: " arg_count) + debug("Stack contents before list: " stack_ptr " items") + + if (arg_count < 0) { + error("Invalid argument count for list: " arg_count) + } + + if (arg_count == 0) { + # No arguments, return empty list + debug("No arguments, returning NIL:") + push("NIL:") + return + } + + if (stack_ptr < arg_count) { + error("list requires " arg_count " arguments, but only " stack_ptr " available") + } + + # Build list from stack elements (arguments are in reverse order on stack) + result = "NIL:" + debug("Building list with " arg_count " arguments") + + # Pop arguments and build list in correct order + for (i = arg_count; i >= 1; i--) { + val = pop() + debug("Popped argument " i ": " val) + # Create cons cell with current value and result + pair_val = val "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + debug("Created pair: " result) + } + + debug("Final result: " result) + debug("Stack contents after list: " stack_ptr " items") + push(result) +} + +# Check if value is null (empty list) +# This predicate is essential for list processing and control flow +function stdlib_null_p() { + if (stack_ptr < 1) error("null? requires one argument") + val = pop() + result = (val == "NIL:") ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +# Check if value is a pair (cons cell) +# This predicate enables safe list manipulation by checking types +function stdlib_pair_p() { + if (stack_ptr < 1) error("pair? requires one argument") + val = pop() + result = isPair(val) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +# Get length of a list +# This function traverses the list structure to count elements +# It provides clear error messages for non-list arguments +function stdlib_length() { + if (stack_ptr < 1) error("length requires one argument") + val = pop() + + if (val == "NIL:") { + push(makeValue(T_NUMBER, "0")) + return + } + + if (!isPair(val)) { + error("length requires a list argument") + } + + count = 0 + current = val + + while (current != "NIL:" && isPair(current)) { + count++ + # Get cdr of current pair + pair_idx = getValue(current) + pair = getHeap(pair_idx) + cdr_val = substr(pair, index(pair, ",") + 1) + current = cdr_val + } + + push(makeValue(T_NUMBER, count)) +} + +# Append two lists +# This function creates a new list by copying the first list and +# replacing its final NIL: with the second list +function stdlib_append() { + if (stack_ptr < 2) error("append requires two arguments") + list2 = pop() + list1 = pop() + + if (list1 == "NIL:") { + push(list2) + return + } + + if (!isPair(list1)) { + error("append requires list arguments") + } + + # Copy first list + result = copy_list(list1) + + # Find end of first list + current = result + while (current != "NIL:" && isPair(current)) { + pair_idx = getValue(current) + pair = getHeap(pair_idx) + cdr_val = substr(pair, index(pair, ",") + 1) + + if (cdr_val == "NIL:") { + # Replace NIL: with second list + new_pair_val = substr(pair, 1, index(pair, ",")) list2 + heap[pair_idx] = new_pair_val + break + } + current = cdr_val + } + + push(result) +} + +# Get second element of list (car of cdr) +# This function implements the standard Scheme cadr operation +# It provides clear error messages for lists with insufficient elements +function stdlib_cadr() { + if (stack_ptr < 1) error("cadr requires one argument") + val = pop() + + if (val == "NIL:" || !isPair(val)) { + error("cadr requires a list with at least 2 elements") + } + + # Get cdr + pair_idx = getValue(val) + pair = getHeap(pair_idx) + cdr_val = substr(pair, index(pair, ",") + 1) + + if (cdr_val == "NIL:" || !isPair(cdr_val)) { + error("cadr requires a list with at least 2 elements") + } + + # Get car of cdr + cdr_pair_idx = getValue(cdr_val) + cdr_pair = getHeap(cdr_pair_idx) + cadr_val = substr(cdr_pair, 1, index(cdr_pair, ",") - 1) + + push(cadr_val) +} + +# Get third element of list (car of cdr of cdr) +# This function implements the standard Scheme caddr operation +# It provides clear error messages for lists with insufficient elements +function stdlib_caddr() { + if (stack_ptr < 1) error("caddr requires one argument") + val = pop() + + if (val == "NIL:" || !isPair(val)) { + error("caddr requires a list with at least 3 elements") + } + + # Get cdr of cdr + pair_idx = getValue(val) + pair = getHeap(pair_idx) + cdr_val = substr(pair, index(pair, ",") + 1) + + if (cdr_val == "NIL:" || !isPair(cdr_val)) { + error("caddr requires a list with at least 3 elements") + } + + cdr_pair_idx = getValue(cdr_val) + cdr_pair = getHeap(cdr_pair_idx) + cdot_val = substr(cdr_pair, index(cdr_pair, ",") + 1) + + if (cdot_val == "NIL:" || !isPair(cdot_val)) { + error("caddr requires a list with at least 3 elements") + } + + # Get car of cdr of cdr + cdot_pair_idx = getValue(cdot_val) + cdot_pair = getHeap(cdot_pair_idx) + caddr_val = substr(cdot_pair, 1, index(cdot_pair, ",") - 1) + + push(caddr_val) +} + +# Reverse a list +# This function creates a new list with elements in reverse order +# It traverses the original list and builds the result using cons +function stdlib_reverse() { + if (stack_ptr < 1) error("reverse requires one argument") + list_val = pop() + + if (list_val == "NIL:") { + push("NIL:") + return + } + + if (!isPair(list_val)) { + error("reverse requires a list argument") + } + + # Build reversed list + result = "NIL:" + current = list_val + + while (current != "NIL:" && isPair(current)) { + # Get car of current pair + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + + # Add to result + pair_val = car_val "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + + # Move to next element + current = substr(pair, index(pair, ",") + 1) + } + + push(result) +} + +# Check if element is member of list +# This function returns the sublist starting from the matching element +# or NIL: if the element is not found +function stdlib_member() { + debug("stdlib_member called with stack_ptr: " stack_ptr) + if (stack_ptr < 2) error("member requires two arguments") + list_val = pop() + elem_val = pop() + debug("stdlib_member: elem_val=" elem_val " list_val=" list_val) + + if (list_val == "NIL:") { + push("NIL:") + return + } + + if (!isPair(list_val)) { + error("member requires a list as second argument") + } + + current = list_val + + while (current != "NIL:" && isPair(current)) { + # Get car of current pair + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + + # Check if element matches + if (car_val == elem_val) { + push(current) # Return the sublist starting from this element + return + } + + # Move to next element + current = substr(pair, index(pair, ",") + 1) + } + + # Element not found + push("NIL:") +} + +# Get element at index in list +function stdlib_list_ref() { + if (stack_ptr < 2) error("list-ref requires two arguments") + index_val = pop() + list_val = pop() + + if (!isNumber(index_val)) { + error("list-ref index must be a number") + } + + idx = getValue(index_val) + + if (list_val == "NIL:") { + error("list-ref index out of bounds") + } + + if (!isPair(list_val)) { + error("list-ref requires a list argument") + } + + current = list_val + count = 0 + + while (current != "NIL:" && isPair(current)) { + if (count == idx) { + # Get car of current pair + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + push(car_val) + return + } + + # Move to next element + pair_idx = getValue(current) + pair = getHeap(pair_idx) + current = substr(pair, index(pair, ",") + 1) + count++ + } + + error("list-ref index out of bounds") +} + +# Get list from index onwards +function stdlib_list_tail() { + if (stack_ptr < 2) error("list-tail requires two arguments") + index_val = pop() + list_val = pop() + + if (!isNumber(index_val)) { + error("list-tail index must be a number") + } + + idx = getValue(index_val) + + if (idx == 0) { + push(list_val) + return + } + + if (list_val == "NIL:") { + error("list-tail index out of bounds") + } + + if (!isPair(list_val)) { + error("list-tail requires a list argument") + } + + current = list_val + count = 0 + + while (current != "NIL:" && isPair(current)) { + if (count == idx) { + push(current) + return + } + + # Move to next element + pair_idx = getValue(current) + pair = getHeap(pair_idx) + current = substr(pair, index(pair, ",") + 1) + count++ + } + + if (count == idx) { + push("NIL:") + return + } + + error("list-tail index out of bounds") +} + +# Helper function to copy a list +function copy_list(list_val, result, current, pair_idx, pair, car_val, cdr_val) { + if (list_val == "NIL:") { + return "NIL:" + } + + if (!isPair(list_val)) { + error("copy_list requires a list argument") + } + + result = "NIL:" + current = list_val + + # Build result list in reverse order + temp_stack[0] = 0 + while (current != "NIL:" && isPair(current)) { + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + + temp_stack[++temp_stack[0]] = car_val + + cdr_val = substr(pair, index(pair, ",") + 1) + current = cdr_val + } + + # Build result in correct order + for (i = temp_stack[0]; i >= 1; i--) { + if (result == "NIL:") { + result = temp_stack[i] + } else { + pair_val = temp_stack[i] "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + } + } + + return result +} + +# Apply function to each element of a list +function stdlib_map() { + if (stack_ptr < 2) error("map requires two arguments") + list_val = pop() + func_val = pop() + + debug("Map called with function: " func_val " and list: " list_val) + + if (!isClosure(func_val) && !isSymbol(func_val)) { + error("map requires a function as first argument") + } + + if (list_val == "NIL:") { + push("NIL:") + return + } + + if (!isPair(list_val)) { + error("map requires a list as second argument") + } + + result = "NIL:" + current = list_val + temp_stack[0] = 0 + + # Process each element and collect results + while (current != "NIL:" && isPair(current)) { + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + + debug("Map processing element: " car_val) + + # Use new function call context + call_function_context(func_val, car_val) + + # Get the result + mapped_val = pop() + temp_stack[++temp_stack[0]] = mapped_val + + debug("Map collected result: " mapped_val) + + cdr_val = substr(pair, index(pair, ",") + 1) + current = cdr_val + } + + # Build result list in correct order + for (i = temp_stack[0]; i >= 1; i--) { + if (result == "NIL:") { + result = temp_stack[i] + } else { + pair_val = temp_stack[i] "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + } + } + + debug("Map returning result: " result) + push(result) +} + +# Filter list elements based on predicate function +function stdlib_filter() { + if (stack_ptr < 2) error("filter requires two arguments") + list_val = pop() + pred_val = pop() + + debug("Filter called with predicate: " pred_val " and list: " list_val) + + if (!isClosure(pred_val) && !isSymbol(pred_val)) { + error("filter requires a function as first argument") + } + + if (list_val == "NIL:") { + push("NIL:") + return + } + + if (!isPair(list_val)) { + error("filter requires a list as second argument") + } + + result = "NIL:" + current = list_val + temp_stack[0] = 0 + + # Process each element and collect matching results + while (current != "NIL:" && isPair(current)) { + pair_idx = getValue(current) + pair = getHeap(pair_idx) + car_val = substr(pair, 1, index(pair, ",") - 1) + + debug("Filter processing element: " car_val) + + # Use new function call context + call_function_context(pred_val, car_val) + + # Get the predicate result + pred_result = pop() + debug("Filter predicate result: " pred_result) + if (is_truthy(pred_result)) { + temp_stack[++temp_stack[0]] = car_val + debug("Filter keeping element: " car_val) + } + + cdr_val = substr(pair, index(pair, ",") + 1) + current = cdr_val + } + + # Build result list in correct order + for (i = temp_stack[0]; i >= 1; i--) { + if (result == "NIL:") { + result = temp_stack[i] + } else { + pair_val = temp_stack[i] "," result + pair_idx = allocate(pair_val) + result = makeValue(T_PAIR, pair_idx) + } + } + + debug("Filter returning result: " result) + push(result) +} + +# Helper function to call a function (used by map and filter) +function call_function() { + func_val = pop() + arg = pop() + + if (isSymbol(func_val)) { + func_name = getValue(func_val) + if (func_name in FUNCTIONS) { + # Built-in function + push(arg) + built_in_name = FUNCTIONS[func_name] + if (built_in_name == "add") { + add() + } else if (built_in_name == "subtract") { + subtract() + } else if (built_in_name == "multiply") { + multiply() + } else if (built_in_name == "divide") { + divide() + } else if (built_in_name == "modulo") { + modulo() + } else if (built_in_name == "expt") { + expt() + } else if (built_in_name == "abs") { + abs() + } else if (built_in_name == "min") { + min() + } else if (built_in_name == "max") { + max() + } else if (built_in_name == "add_one") { + add_one() + } else { + error("Unsupported built-in function in map: " func_name) + } + } else { + # User-defined function - simplified for now + error("User-defined functions not yet supported in map") + } + } else if (isClosure(func_val)) { + # Lambda function - simplified for now + error("Lambda functions not yet supported in map") + } else { + error("Invalid function type in map") + } +} + +# Context management functions for nested function calls +function save_call_context(func_name) { + if (call_stack_size >= 100) { + error("Call stack overflow - too many nested function calls") + } + + call_stack_return_pc[call_stack_size] = pc + call_stack_return_env[call_stack_size] = env_size + call_stack_return_stack[call_stack_size] = stack_ptr + call_stack_return_func[call_stack_size] = func_name + + call_stack_size++ + debug("Saved call context for " func_name " - stack size: " call_stack_size) +} + +function restore_call_context() { + if (call_stack_size <= 0) { + error("Call stack underflow - trying to restore with empty stack") + } + + call_stack_size-- + pc = call_stack_return_pc[call_stack_size] + env_size = call_stack_return_env[call_stack_size] + # Don't restore stack_ptr - the nested call should leave its result on top + # stack_ptr = call_stack_return_stack[call_stack_size] + + debug("Restored call context - stack size: " call_stack_size " (stack_ptr: " stack_ptr ")") +} + +function call_function_context(func_val, arg) { + debug("Calling function in context: " func_val " with arg: " arg) + + # Save current context + save_call_context("nested_call") + + # Push argument and function + push(arg) + push(func_val) + + # Execute function call + execute_nested_function_call() + + # Restore context + restore_call_context() +} + +function execute_nested_function_call() { + func_val = pop() + arg = pop() + + debug("Executing nested function call: " func_val " with arg: " arg) + + if (isSymbol(func_val)) { + func_name = getValue(func_val) + debug("Function name from symbol: " func_name) + + # Handle lambda functions (__lambda_*) + if (func_name ~ /^__lambda_/) { + if (!(func_name in FUNCTIONS)) { + error("Undefined lambda function: " func_name) + } + call_user_function_nested(func_name, arg) + } else if (func_name in FUNCTIONS) { + # Built-in function + push(arg) + built_in_name = FUNCTIONS[func_name] + debug("Built-in function name: " built_in_name) + call_builtin_function_nested(built_in_name) + } else { + # User-defined function + call_user_function_nested(func_name, arg) + } + } else if (isClosure(func_val)) { + # Lambda function with captured environment + debug("Calling closure in nested context: " func_val) + call_closure_nested(func_val, arg) + } else { + error("Invalid function type in nested call: " func_val) + } +} + +# Helper function to check if a value is truthy +function is_truthy(val) { + if (val == "B:0" || val == "#f") { + return 0 + } + return 1 +} + +function call_builtin_function_nested(built_in_name) { + debug("Calling built-in function in nested context: " built_in_name) + + if (built_in_name == "add") { + add() + } else if (built_in_name == "subtract") { + subtract() + } else if (built_in_name == "multiply") { + multiply() + } else if (built_in_name == "divide") { + divide() + } else if (built_in_name == "modulo") { + modulo() + } else if (built_in_name == "expt") { + expt() + } else if (built_in_name == "abs") { + abs() + } else if (built_in_name == "min") { + min() + } else if (built_in_name == "max") { + max() + } else if (built_in_name == "add_one") { + add_one() + } else if (built_in_name == "number_p") { + number_p() + } else if (built_in_name == "string_p") { + string_p() + } else if (built_in_name == "boolean_p") { + boolean_p() + } else if (built_in_name == "symbol_p") { + symbol_p() + } else if (built_in_name == "zero_p") { + zero_p() + } else if (built_in_name == "positive_p") { + positive_p() + } else if (built_in_name == "negative_p") { + negative_p() + } else { + error("Unsupported built-in function in nested call: " built_in_name) + } +} + +function call_user_function_nested(func_name, arg) { + debug("Calling user function in nested context: " func_name " with arg: " arg) + + if (!(func_name in FUNCTIONS)) { + error("Undefined user function: " func_name) + } + + # Get function code + split(FUNCTIONS[func_name], code_lines, "\n") + + # Check if this is a parameterized function + if (code_lines[1] ~ /^STORE /) { + # This is a parameterized function (lambda) + param_name = substr(code_lines[1], 7) + debug("Found parameter name: " param_name) + + # Create new environment frame + debug("Creating new environment frame at size: " env_size) + env_name[env_size] = param_name + env_val[env_size] = arg + env_size++ + debug("FUNCTION_ENV_STORE: " param_name " = " arg " at index " (env_size-1)) + + # Execute function code directly, skipping STORE and POP_ENV instructions + for (j = 2; j <= length(code_lines); j++) { + if (code_lines[j] != "" && code_lines[j] != "POP_ENV") { + debug("Executing function instruction: " code_lines[j]) + execute(code_lines[j]) + } + } + + # Clean up parameter + vm_pop_env() + } else { + # This is a non-parameterized function + debug("Calling non-parameterized function: " func_name) + + # Push argument for non-parameterized functions + push(arg) + + # Execute all function code directly + for (j in code_lines) { + if (code_lines[j] != "") { + debug("Executing function instruction: " code_lines[j]) + execute(code_lines[j]) + } + } + } +} + +function call_closure_nested(closure_val, arg) { + debug("Calling closure in nested context: " closure_val " with arg: " arg) + + # Extract closure information + closure_func = getClosureFunction(closure_val) + closure_env_id = getClosureEnvId(closure_val) + + debug("Closure function: " closure_func " env_id: " closure_env_id) + + if (!(closure_func in FUNCTIONS)) { + error("Undefined closure function: " closure_func) + } + + # Save current environment state + saved_env_size = env_size + + # Restore the captured environment + pushClosureEnvironment(closure_env_id) + + # Now call the user function with the restored environment + call_user_function_nested(closure_func, arg) + + # Restore original environment (closure environment is automatically cleaned up) + # Note: We don't need to explicitly restore since the nested call context handles this } \ No newline at end of file diff --git a/awk/scheme/scheme/diagram.md b/awk/scheme/scheme/diagram.md index 689a44a..cbd39e2 100644 --- a/awk/scheme/scheme/diagram.md +++ b/awk/scheme/scheme/diagram.md @@ -1,68 +1,41 @@ # Awk-Scheme Architecture -## Component Interaction Diagram + +## Component Overview ``` -+----------------+ Scheme Code +----------------+ Assembly +----------------+ ++----------------+ Scheme Code +----------------+ Bytecode +----------------+ | | -----------------> | | -------------> | | | REPL | "(+ 1 2)" | Compiler | "PUSH_CONST | VM | | (bin/repl) | "(lambda (x) ...)" | compiler.awk | N:1 | vm.awk | | | | | PUSH_CONST | | -| - Multi-line | | - Lexer | N:2 | - Stack-based | +| - Interactive | | - Lexer | N:2 | - Stack-based | | - Debug mode | | - Parser | ADD | - Type system | | - File input | | - Code gen | HALT" | - Environment | -| | | - Closure gen | CAPTURE_ENV | - Direct exec | +| | | - Closure gen | | - Direct exec | | | <----------------- | | <------------- | | | | Output: "N:3" | | Result | | +----------------+ +----------------+ +----------------+ - ^ | - | v - | +----------------+ - | | Persistence | - | | /tmp files: | - +--------------------------------------------------------------+ - vm.state | - | - vm.env | - +----------------+ - -Debug Flow (when DEBUG=1): -+----------------+ Debug Info +----------------+ Debug Info +----------------+ -| REPL | -----------------> | Compiler | -------------> | VM | -| | [Input received] | | [Tokens found] | | -| [Debug output] | | [Parsing tree] | [Stack ops] | [Stack state] | -| stderr | <----------------- | [Gen code] | [Closure info] | [Environment] | -+----------------+ +----------------+ +----------------+ - -Execution Flow Example: -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Input: │ │ Lexer/ │ │ Code Gen │ │ VM │ -│ (+ 1 2) │ ------> │ Parser: │ ------> │ PUSH_CONST │ ------> │ Stack: │ -│ │ │ (+ │ │ N:1 │ │ [N:1] │ -│ │ │ 1 │ │ PUSH_CONST │ │ [N:1,N:2] │ -│ │ │ 2) │ │ N:2 │ │ [N:3] │ -│ │ │ │ │ ADD │ │ │ -└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ - -Closure Flow Example: -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Input: │ │ Lexer/ │ │ Code Gen │ │ VM │ -│(let ((x 10))│ ------->│ Parser: │ ------> │ STORE x │ ------> │ Env: [x=10] │ -│ ((lambda │ │ (let │ │ LABEL λ │ │ │ -│ (y) (+ x │ │ ((x 10)) │ │ CAPTURE_ENV │ │ Closure: │ -│ y)) 32))) │ │ (lambda │ │ PUSH_CONST │ │ λ+env_id │ -│ │ │ (y) ...) │ │ CLOSURE:... │ │ │ -│ │ │ │ │ CALL │ │ Result: 42 │ -└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ - -State Management: -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Global │ │ Environment │ │ Function │ │ Closure │ -│ Variables │ ------> │ Stack │ ------> │ Calls │ ------> │ Environment │ -│ (persist) │ │ (frames) │ │ (direct) │ │ (captured) │ -└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ - -Data Type System: +``` + +## Execution Flow + +``` +Scheme Code → Compiler → VM Bytecode → Virtual Machine → Result + ↓ ↓ ↓ ↓ ↓ + "(+ 1 2)" Tokens PUSH_CONST Stack: [1] "N:3" + Tree N:1 Stack: [1,2] + PUSH_CONST Stack: [3] + N:2 + ADD + HALT +``` + +## Data Types + +``` ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Numbers │ │ Booleans │ │ Symbols │ │ Closures │ -│ N:value │ │ B:value │ │ S:name │ │CLOSURE:fn:env_id│ +│ N:value │ │ B:value │ │ S:name │ │CLOSURE:fn:env│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │ │ └───────────────────┼───────────────────┼───────────────────┘ @@ -71,21 +44,73 @@ Data Type System: │ Pairs │ │ Nil │ │ P:index │ │ NIL: │ └─────────────┘ └─────────────┘ + │ │ + └───────────────────┘ + ┌─────────────┐ + │ Strings │ + │ STR:content│ + └─────────────┘ +``` -VM Instruction Set: -┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ -│ Stack Ops │ │ Environment │ │ Functions │ -│ PUSH_CONST val │ │ STORE name │ │ LABEL name │ -│ POP │ │ LOOKUP name │ │ CALL │ -│ ADD/SUB/MUL/DIV │ │ POP_ENV │ │ RETURN │ -│ GET_VALUE │ │ │ │ │ -└─────────────────┘ └─────────────────┘ └─────────────────┘ - │ │ │ - └───────────────────────┼───────────────────────┘ - │ - ┌─────────────────┐ - │ Closures │ - │ CAPTURE_ENV fn │ - │ PUSH_CONST │ - │ CLOSURE:fn:env │ - └─────────────────┘ \ No newline at end of file +## VM Instructions + +### Core Operations +- **Stack**: `PUSH_CONST`, `POP`, `ADD`, `SUB`, `MUL`, `DIV` +- **Environment**: `STORE`, `LOOKUP`, `POP_ENV` +- **Functions**: `LABEL`, `CALL`, `RETURN` +- **Closures**: `CAPTURE_ENV`, `PUSH_CONST CLOSURE:fn:env` + +### Example: Lambda Function +``` +Input: ((lambda (x) (+ x 1)) 41) + +Bytecode: +LABEL __lambda_0 +STORE x +LOOKUP x +PUSH_CONST N:1 +ADD +POP_ENV +RETURN +PUSH_CONST N:41 +CALL __lambda_0 + +Result: N:42 +``` + +## Standard Library Functions + +### Numeric +- `(+ - * / modulo expt abs min max)` +- `(zero? positive? negative?)` + +### Lists +- `(cons car cdr length append reverse member)` +- `(cadr caddr list-ref list-tail null? pair?)` + +### Strings +- `(string-length string-append string-ref substring)` +- `(string=? string<? string>?)` + +### Control Flow +- `(if cond and or not)` + +### Higher-Order +- `(map filter)` - Support all function types (built-in, lambda, user-defined, closures) + +## Forth Implementation + +The VM is language-agnostic. Forth compiler generates same bytecode: + +``` +Forth: "5 3 + ." + ↓ +Bytecode: PUSH_CONST N:5 + PUSH_CONST N:3 + ADD + PRINT + POP + HALT + ↓ +Result: N:8 +``` \ No newline at end of file diff --git a/awk/scheme/scheme/examples/cons.test.scm b/awk/scheme/scheme/examples/cons.test.scm deleted file mode 100644 index d1e3847..0000000 --- a/awk/scheme/scheme/examples/cons.test.scm +++ /dev/null @@ -1,3 +0,0 @@ -(cons (+ 1 2) - (cons (* 3 4) - nil)) diff --git a/awk/scheme/scheme/examples/define.test.scm b/awk/scheme/scheme/examples/define.test.scm deleted file mode 100644 index ec66b04..0000000 --- a/awk/scheme/scheme/examples/define.test.scm +++ /dev/null @@ -1,2 +0,0 @@ -(define add2 (x) (+ x 2)) -(add2 40) \ No newline at end of file diff --git a/awk/scheme/scheme/examples/lambda.test.scm b/awk/scheme/scheme/examples/lambda.test.scm deleted file mode 100644 index 1f2bb09..0000000 --- a/awk/scheme/scheme/examples/lambda.test.scm +++ /dev/null @@ -1,12 +0,0 @@ -; Test lambda function support -((lambda (x) (+ x 1)) 41) ; Should return 42 - -; Test lambda with multiple parameters -((lambda (x y) (+ x y)) 20 22) ; Should return 42 - -; Test lambda in let expression -(let ((add1 (lambda (x) (+ x 1)))) - (add1 41)) ; Should return 42 - -; Test nested lambda -((lambda (x) ((lambda (y) (+ x y)) 1)) 41) ; Should return 42 \ No newline at end of file diff --git a/awk/scheme/scheme/examples/let-and-define.test.scm b/awk/scheme/scheme/examples/let-and-define.test.scm deleted file mode 100644 index fade30b..0000000 --- a/awk/scheme/scheme/examples/let-and-define.test.scm +++ /dev/null @@ -1,9 +0,0 @@ -; Let expression example -(let ((x 5) (y 3)) - (+ x y)) - -; Function definition example -(define add2 (x) - (+ x 2)) - -(add2 40) ; Returns 42 \ No newline at end of file diff --git a/awk/scheme/scheme/examples/let.test.scm b/awk/scheme/scheme/examples/let.test.scm deleted file mode 100644 index 2cdc3b8..0000000 --- a/awk/scheme/scheme/examples/let.test.scm +++ /dev/null @@ -1,2 +0,0 @@ -(let ((x 5)) - (+ x 2)) diff --git a/awk/scheme/scheme/scratch/OUTLINE.md b/awk/scheme/scheme/scratch/OUTLINE.md new file mode 100644 index 0000000..073afef --- /dev/null +++ b/awk/scheme/scheme/scratch/OUTLINE.md @@ -0,0 +1,372 @@ +# Blog Post Outline: "I Built a Scheme VM in AWK and It's Surprisingly Not Terrible" + +## Introduction +- Hook: "What do you get when you cross a functional programming language with a text processing tool from the 1970s?" +- Brief context: My journey from JS/functional programming to AWK/compilers/VMs +- Thesis: Building a Scheme VM in AWK taught me more about language design than I expected, and it evolved into a surprisingly capable system + +## The Setup: Why AWK? (And Why Not?) +- The initial reaction: "AWK? Really?" +- What AWK actually is: a text processing language with some surprising features +- The appeal: simple, ubiquitous, built-in associative arrays +- The challenges: 1-indexed strings, weird syntax, limited data structures +- The realization: AWK is basically a functional language in disguise + +## Architecture Overview: The Three-Layer Cake +- **Layer 1: The Compiler** (`bin/compiler.awk`) + - Takes Scheme expressions, outputs VM bytecode + - Recursive descent parsing (because AWK doesn't have fancy parser generators) + - The joy of implementing `car` and `cdr` in a language that doesn't have lists + - Error detection for incomplete expressions (because debugging is hard enough) + +- **Layer 2: The Virtual Machine** (`bin/vm.awk`) + - Stack-based execution (because registers are hard) + - Type system with tagged values (`N:5`, `B:1`, `P:42`) + - Environment management for lexical scoping + - The weirdness of implementing closures in AWK + - Higher-order functions with nested call support (because functional programming is fun) + +- **Layer 3: The REPL** (`bin/repl`) + - Interactive development (because who doesn't love a good REPL?) + - State persistence between sessions + - The challenge of making AWK feel interactive + +## The Evolution: From Simple Interpreter to Language-Agnostic VM + +### The Early Days: Basic Scheme +- Started with simple arithmetic and basic functions +- The joy of getting `(+ 1 2)` to work +- Why stack-based execution made sense in AWK +- The challenge of implementing lists with cons cells + +### The Middle Phase: Functions and Closures +- Adding user-defined functions (because Scheme needs functions) +- Implementing closures (because lexical scoping is important) +- The complexity of environment management in AWK +- Why capturing environments was harder than expected + +### The Current State: Higher-Order Functions and Multi-Language Support +- Adding `map` and `filter` with full function support +- The nested function call system (because higher-order functions need to call other functions) +- The Forth compiler experiment (because why not?) +- Proving the VM is truly language-agnostic + +## The Weirdness: AWK Quirks That Made Me Question My Life Choices + +### 1-Indexed Everything +- Strings, arrays, everything starts at 1 +- The constant `substr(string, 1, index(string, ":") - 1)` dance +- Why `substr()` needs explicit length but `index()` returns 1-indexed positions +- The mental gymnastics of converting between 0-indexed and 1-indexed + +### Associative Arrays (The Good Part) +- AWK's killer feature: built-in hash maps +- How we use them for environments, function tables, closures +- The joy of `array[key] = value` without any setup +- The weirdness of `for (key in array)` syntax + +### String Manipulation: A Love-Hate Relationship +- `gsub()` for global substitution (like `replaceAll()`) +- `sub()` for single substitution +- The regex syntax that looks like it's from another planet +- Why string concatenation is just space-separation + +### File I/O: The AWK Way +- `getline` for reading files (returns 1/0/-1, not the data) +- `printf` with `>` for writing files +- The weirdness of `close()` being explicit +- State persistence using simple text files + +## The Compiler: From Scheme to Bytecode + +### Lexical Analysis: Tokenizing in AWK +- How we split Scheme expressions into tokens +- Handling parentheses, strings, numbers, symbols +- The challenge of nested expressions +- Why we need to track string literals to avoid paren confusion + +### Parsing: Recursive Descent in AWK +- Implementing a parser without parser generators +- The beauty of recursive functions in AWK +- How we handle function calls, special forms, literals +- The joy of implementing `if` and `cond` without proper control flow +- Error detection for incomplete expressions (because users make mistakes) + +### Code Generation: From AST to Bytecode +- The instruction set: `PUSH_CONST`, `ADD`, `CALL`, `HALT` +- How we generate stack-based code +- The challenge of variable scoping +- Why function calls are just `CALL` instructions + +## The Virtual Machine: Stack-Based Execution + +### The Stack: Our Best Friend +- Simple stack operations: `push()`, `pop()`, `peek()` +- How everything becomes stack manipulation +- The beauty of stack-based execution +- Why we don't need registers (much) + +### Type System: Tagged Values +- Every value is `type:value` (e.g., `N:5`, `B:1`) +- Runtime type checking with `getType()` and `getValue()` +- The challenge of implementing lists with cons cells +- Why we need a heap for complex data structures + +### Environment Management: Lexical Scoping in AWK +- How we implement variable bindings +- Environment frames for function calls +- The complexity of closures +- Why capturing environments is harder than it looks + +### Function Calls: The Magic +- Built-in functions vs user-defined functions +- How we handle function lookup and execution +- The challenge of parameter passing +- Why `CALL` is more complex than it seems + +### Higher-Order Functions: The Real Magic +- How `map` and `filter` work with any function type +- The nested function call system +- Why calling functions from within functions is tricky +- The joy of data transformation pipelines + +## The Forth Experiment: Proving Language Agnosticism + +### The Original Approach: Forth → Scheme → VM +- Translating Forth to Scheme expressions +- Using the existing compiler pipeline +- Why this felt like cheating + +### The Real Deal: Forth → VM Directly +- Building a Forth compiler that generates bytecode +- The joy of `PUSH_CONST N:5` and `ADD` +- Proving the VM is truly language-agnostic +- Why this is the real achievement + +### The Insight: Universal VM Target +- Any language can target this VM +- The same bytecode from different sources +- Why this matters for language design +- The power of a well-designed instruction set + +## The Testing Journey: From Chaos to Order + +### The Early Days: Manual Testing +- Running expressions by hand +- The pain of debugging without proper tests +- Why testing is crucial even for small projects + +### The Evolution: Comprehensive Test Suite +- 48 unit tests covering all features +- Integration tests for complex scenarios +- Regression tests to prevent bugs +- Examples that demonstrate usage + +### The Test Runner: Making Testing Easy +- Automated test execution +- Clear pass/fail reporting +- Error detection and reporting +- Why good tooling matters + +## The REPL: Making AWK Interactive + +### State Persistence: Between Sessions +- Saving function definitions to files +- Loading state on startup +- The simplicity of text-based persistence +- Why we don't need a database + +### Interactive Development: The AWK Way +- Reading input line by line +- Compiling and executing each expression +- The challenge of maintaining state +- Why the REPL feels surprisingly natural + +## Lessons Learned: What AWK Taught Me About Language Design + +### Simplicity is Powerful +- AWK's simple model enables complex things +- How constraints lead to creativity +- Why less is often more in language design + +### Stack-Based Execution is Beautiful +- The elegance of stack operations +- How it simplifies the VM +- Why many real VMs use this approach + +### Type Systems Matter +- Runtime type checking prevents bugs +- Tagged values enable safe operations +- Why type safety is worth the overhead + +### Environment Management is Hard +- Lexical scoping requires careful implementation +- Closures add significant complexity +- Why proper scoping is crucial for functional languages + +### Higher-Order Functions are Worth It +- The power of function composition +- How data transformation becomes elegant +- Why functional programming patterns matter + +### Language Agnosticism is Achievable +- A well-designed VM can support multiple languages +- The benefits of shared runtime and standard library +- Why this approach reduces implementation complexity + +## The Weirdest Parts: AWK Quirks That Made Me Laugh/Cry + +### The 1-Indexed String Functions +- `substr(string, 1, length)` vs `string.slice(0, length)` +- The constant mental conversion +- Why this exists and how to cope + +### Associative Array Syntax +- `array[key] = value` without declaration +- The joy of `for (key in array)` +- Why this is AWK's best feature + +### File I/O Patterns +- `getline` returning status codes +- The explicit `close()` requirement +- Why AWK's I/O model is actually quite elegant + +### String Manipulation +- `gsub()` and `sub()` for regex replacement +- The regex syntax that looks alien +- Why AWK's string handling is powerful but weird + +## Performance and Limitations: The Reality Check + +### What Works Well +- Simple expressions compile and run quickly +- The REPL is responsive +- State persistence works reliably +- The type system prevents many bugs +- Higher-order functions work smoothly +- The test suite catches regressions + +### What's Slow/Weird +- Complex expressions can be slow +- Memory usage with large programs +- The 1-indexed string operations +- Limited data structures beyond arrays +- No tail call optimization + +### The AWK Tax +- Everything is string-based under the hood +- No native data structures beyond arrays +- Limited control flow constructs +- The constant string manipulation overhead + +## The Current State: A Surprisingly Capable System + +### What We Have +- Full Scheme interpreter with higher-order functions +- Language-agnostic VM with Forth support +- Comprehensive test suite (48 tests) +- Interactive REPL with state persistence +- Error detection and reporting +- Standard library with 50+ functions + +### What Makes It Special +- Built entirely in AWK (no external dependencies) +- Supports complex functional programming patterns +- Language-agnostic VM design +- Comprehensive testing and documentation +- Clean, maintainable architecture + +### The Numbers +- ~3,900 lines of AWK code +- 48 unit tests + integration/regression tests +- 50+ standard library functions +- 2 target languages (Scheme, Forth) +- 0 external dependencies + +## Conclusion: Was It Worth It? + +### The Unexpected Benefits +- Deep understanding of VM architecture +- Appreciation for language design decisions +- The joy of building something from scratch +- Why constraints breed creativity +- The power of comprehensive testing + +### The AWK Revelation +- AWK is more powerful than it looks +- Functional programming concepts work everywhere +- Simple tools can build complex systems +- Why the Unix philosophy matters + +### The Bigger Picture +- How this relates to real VMs (JVM, .NET, etc.) +- The universality of stack-based execution +- Why language-agnostic VMs are powerful +- The future of language design + +### Final Thoughts +- "I built a Scheme VM in AWK and it's surprisingly not terrible" +- The value of building things in unexpected languages +- Why understanding the fundamentals matters +- The joy of programming for its own sake +- How constraints can lead to better design + +## Technical Appendix: Key Code Snippets + +### The VM's Main Loop +```awk +while (pc < length(program)) { + execute(program[pc++]) +} +``` + +### Type System Implementation +```awk +function makeValue(type, val) { + return type ":" val +} +``` + +### Stack Operations +```awk +function push(val) { + stack[++stack_ptr] = val +} +``` + +### Higher-Order Functions +```awk +function stdlib_map() { + # Get function and list from stack + func = pop() + list = pop() + # Call function for each element + result = execute_nested_function_call(func, element) +} +``` + +### The Forth Compiler +```awk +if (token ~ /^-?[0-9]+$/) { + bytecode = bytecode "PUSH_CONST N:" token "\n" +} else if (token == "+") { + bytecode = bytecode "ADD\n" +} +``` + +### Error Detection +```awk +if (paren_count > 0) { + error("Unmatched opening parentheses - incomplete expression") +} +``` + +## Resources and Further Reading +- AWK documentation and tutorials +- Scheme language specification +- Virtual machine design principles +- Stack-based execution models +- Language implementation techniques +- Functional programming concepts +- Higher-order functions and closures +- Language-agnostic VM design \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/arch-notes.md b/awk/scheme/scheme/scratch/arch-notes.md index 060ca72..bb2e240 100644 --- a/awk/scheme/scheme/scratch/arch-notes.md +++ b/awk/scheme/scheme/scratch/arch-notes.md @@ -1,7 +1,7 @@ # Awk-Scheme: Architectural Notes ## Overview -Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a compiler and a stack-based virtual machine (VM). The architecture is modular, with clear separation of concerns between parsing/compilation and execution. The design leverages several classic architectural patterns to achieve extensibility, maintainability, and clarity. +A little Scheme interpreter implemented in AWK, composed of a compiler and a stack-based virtual machine. The architecture seeks to be modular, with clear separation of concerns between parsing/compilation and execution. The system has evolved to support higher-order functions, nested function calls, and language-agnostic VM design with Forth as a second target language. --- @@ -10,6 +10,7 @@ Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a com 1. **Input**: User provides Scheme code (via REPL or file). 2. **Compilation**: The compiler (`bin/compiler.awk`) parses and compiles Scheme code into VM instructions. 3. **Execution**: The VM (`bin/vm.awk`) executes the instructions, managing state, environment, and memory. +4. **Output**: Results are displayed via the `display` function or returned as typed values. --- @@ -18,22 +19,32 @@ Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a com ### 2.1. Lexical Analysis (Lexer) - **Pattern**: *Lexer/Parser Separation* (classic compiler front-end) - **Why**: Decouples tokenization from parsing, making the code easier to reason about and extend. -- **How**: The compiler tokenizes input into numbers, symbols, and parentheses, handling whitespace and comments. +- **How**: The compiler tokenizes input into numbers, symbols, strings, and parentheses, handling whitespace and comments with proper string literal recognition. ### 2.2. Parsing (Recursive Descent) - **Pattern**: *Recursive Descent Parser* - **Why**: Simple, direct mapping from grammar to code; easy to debug and extend for a small language. -- **How**: The parser builds an expression tree from tokens, handling nested expressions and validating syntax. +- **How**: The parser builds an expression tree from tokens, handling nested expressions, string literals, and validating syntax with proper parenthesis matching and error detection for incomplete expressions. ### 2.3. Code Generation - **Pattern**: *Visitor/Dispatcher* (for expression types) -- **Why**: Each expression type (number, variable, list, special form) is handled by a dedicated function, improving maintainability. -- **How**: The compiler emits stack-based VM instructions for each expression, handling special forms (define, let, lambda) and function calls. +- **Why**: Each expression type (number, variable, list, special form, string) is handled by a dedicated function, improving maintainability. +- **How**: The compiler emits stack-based VM instructions for each expression, handling special forms (define, let, lambda) and function calls with proper argument validation. -### 2.4. Closure Support +### 2.4. Primitive Function Handling +- **Pattern**: *Strategy Pattern* (for different function types) +- **Why**: Different functions require different compilation strategies - some are built-in primitives, others are standard library functions, and some are user-defined. +- **How**: The compiler uses explicit dispatch in `compile_primitive_call()` to handle arithmetic operations, list operations, string operations, and standard library functions with appropriate argument validation. + +### 2.5. Closure Support - **Pattern**: *Closure Conversion* (from functional programming) - **Why**: Enables lexical scoping and first-class functions by capturing the environment at lambda creation. -- **How**: The compiler emits instructions to capture the current environment and create closure objects. +- **How**: The compiler emits `CAPTURE_ENV` instructions to capture the current environment and create closure objects with unique environment IDs. + +### 2.6. Error Detection +- **Pattern**: *Fail-Fast Error Detection* +- **Why**: Early detection of syntax errors prevents runtime issues and improves debugging. +- **How**: The compiler validates parenthesis matching and detects incomplete expressions during parsing, providing clear error messages. --- @@ -42,74 +53,229 @@ Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a com ### 3.1. Stack-Based Execution - **Pattern**: *Stack Machine* (classic VM design) - **Why**: Simplicity and direct mapping from compiled instructions to execution; easy to implement in AWK. -- **How**: The VM maintains an evaluation stack for operands and results, executing instructions sequentially. +- **How**: The VM maintains an evaluation stack for operands and results, executing instructions sequentially with proper stack management and cleanup. ### 3.2. Typed Value System - **Pattern**: *Tagged Union* (Algebraic Data Type) - **Why**: Enables runtime type checking and safe operations on heterogeneous values. -- **How**: All values are tagged (e.g., `N:`, `B:`, `P:`, `F:`, `CLOSURE:`) and checked at runtime. +- **How**: All values are tagged (e.g., `N:`, `B:`, `P:`, `F:`, `CLOSURE:`, `STR:`) and checked at runtime with type predicates. ### 3.3. Environment Model - **Pattern**: *Environment Chain* (Lexical Scoping) -- **Why**: Supports variable bindings, lexical scope, and closures. -- **How**: The VM maintains an environment stack for variable bindings, with special handling for global and closure environments. +- **Why**: Supports variable bindings, lexical scope, and closures with proper variable lookup. +- **How**: The VM maintains an environment stack for variable bindings, with special handling for global and closure environments, including environment restoration for closure calls. ### 3.4. Function and Closure Handling - **Pattern**: *Direct Function Execution* (no program array modification) - **Why**: Simplifies call/return logic and avoids mutation of the instruction stream. -- **How**: Functions are stored as code blocks; calls execute code directly, with environment management for parameters and closures. +- **How**: Functions are stored as code blocks; calls execute code directly, with environment management for parameters and closures, including proper environment capture and restoration. + +### 3.5. Function Dispatch System +- **Pattern**: *Registry Pattern* (for function lookup) +- **Why**: Enables efficient function dispatch and supports function aliasing for user convenience. +- **How**: The VM maintains a `FUNCTIONS` registry mapping function names to implementations, allowing multiple names to point to the same function (e.g., `inc` and `++`). + +### 3.6. Variable Argument Functions +- **Pattern**: *Variadic Function* (for flexible argument handling) +- **Why**: Enables functions like `string-append` to accept any number of arguments, improving usability. +- **How**: Functions like `string_append()` process all arguments on the stack using a loop, maintaining proper argument order and type checking. + +### 3.7. Higher-Order Functions +- **Pattern**: *Function as Data* (first-class functions) +- **Why**: Enables functional programming patterns like `map` and `filter` with user-defined functions, lambdas, and closures. +- **How**: The VM implements `map` and `filter` functions that can accept any function type (built-in, user-defined, lambda, closure) through a nested function call mechanism. + +### 3.8. Nested Function Call System +- **Pattern**: *Context Preservation* (for nested execution) +- **Why**: Enables higher-order functions to call other functions while preserving the calling context. +- **How**: The VM maintains a call stack that saves and restores execution context, allowing functions to be called from within other functions without losing state. -### 3.5. Heap and Memory Management +### 3.9. Output System +- **Pattern**: *Visitor Pattern* (for value display) +- **Why**: Different value types require different display formats, and the display logic should be extensible. +- **How**: The `display_value()` function recursively visits different value types, converting them to readable string representations with proper list formatting. + +### 3.10. Heap and Memory Management - **Pattern**: *Manual Heap with Reference Counting (partial)* -- **Why**: Enables cons cell allocation and basic memory management. +- **Why**: Enables cons cell allocation and basic memory management for list operations. - **How**: The VM allocates cons cells on a heap array, with a placeholder for reference counting (not fully implemented). -### 3.6. State Persistence +### 3.11. State Persistence - **Pattern**: *State Serialization* -- **Why**: Allows global state and functions to persist across REPL sessions. -- **How**: The VM serializes global variables and function definitions to files, loading them on startup. +- **Why**: Allows global state and functions to persist across REPL sessions for continuity. +- **How**: The VM serializes global variables and function definitions to files, loading them on startup with proper state restoration. + +--- + +## 4. Standard Library Architecture + +### 4.1. Function Registration +- **Pattern**: *Registry Pattern* (for extensible function library) +- **Why**: Enables easy addition of new standard library functions without modifying core VM logic. +- **How**: Functions are registered in the `FUNCTIONS` table during VM initialization, with dispatch logic in `vm_call_function()`. + +### 4.2. Function Aliasing +- **Pattern**: *Alias Pattern* (for user convenience) +- **Why**: Allows multiple names for the same function, improving usability and providing familiar syntax. +- **How**: Multiple entries in the `FUNCTIONS` table point to the same implementation (e.g., `inc` and `++` both point to `add_one`). + +### 4.3. String Operations +- **Pattern**: *Builder Pattern* (for string construction) +- **Why**: String operations like concatenation need to handle variable numbers of arguments efficiently. +- **How**: Functions like `string_append()` build the result incrementally by processing all arguments on the stack in order. + +### 4.4. Higher-Order Functions +- **Pattern**: *Functional Programming Patterns* +- **Why**: Enables data transformation and filtering with user-defined functions. +- **How**: `map` and `filter` functions accept any function type and execute them in a nested context, supporting complex data processing pipelines. + +--- + +## 5. Language-Agnostic VM Design + +### 5.1. Forth Compiler +- **Pattern**: *Multi-Language Frontend* +- **Why**: Demonstrates the VM's language-agnostic nature and validates the instruction set design. +- **How**: A separate Forth compiler (`scratch/forth/forth.awk`) generates the same VM bytecode, proving that the VM can execute code from different source languages. + +### 5.2. Shared Instruction Set +- **Pattern**: *Universal Bytecode* +- **Why**: Enables multiple languages to target the same VM, reducing implementation complexity. +- **How**: Both Scheme and Forth compilers generate identical VM instructions, allowing seamless execution of mixed-language programs. + +### 5.3. Standard Library Integration +- **Pattern**: *Shared Runtime* +- **Why**: Provides consistent functionality across different source languages. +- **How**: The VM's standard library functions are available to all languages targeting the VM, ensuring consistent behavior. + +--- + +## 6. Testing Architecture + +### 6.1. Comprehensive Test Suite +- **Pattern**: *Layered Testing Strategy* +- **Why**: Ensures correctness across different levels of complexity and use cases. +- **How**: Tests are organized into unit tests (48 tests), integration tests, regression tests, and examples, covering basic operations to complex higher-order function scenarios. + +### 6.2. Test Organization +- **Pattern**: *Test Categories* +- **Why**: Different types of tests serve different purposes in validation. +- **How**: Unit tests focus on individual features, integration tests validate complex interactions, regression tests prevent regressions, and examples demonstrate usage patterns. + +### 6.3. Error Detection Testing +- **Pattern**: *Negative Testing* +- **Why**: Ensures the system properly handles and reports errors. +- **How**: Tests include syntax error detection, type checking, and edge cases to validate error handling behavior. --- -## 4. Extensibility and Maintainability -- **Pattern**: *Separation of Concerns* +## 7. Extensibility and Maintainability + +### 7.1. Separation of Concerns +- **Pattern**: *Layered Architecture* - **Why**: Compiler and VM are independent, making it easy to extend the language or change the execution model. -- **Pattern**: *Table-Driven Dispatch* (for built-in functions) -- **Why**: Adding new primitives or special forms is straightforward. +- **How**: Clear interfaces between compilation and execution phases, with well-defined instruction formats. + +### 7.2. Function Extension +- **Pattern**: *Open/Closed Principle* (open for extension, closed for modification) +- **Why**: Adding new functions should not require changes to existing code. +- **How**: New functions are added by registering them in the `FUNCTIONS` table and implementing the corresponding VM function. + +### 7.3. Error Handling +- **Pattern**: *Fail-Fast* (for debugging) +- **Why**: Early error detection helps identify issues quickly during development. +- **How**: Functions validate arguments and types at runtime, throwing descriptive error messages. + +### 7.4. Language Extension +- **Pattern**: *Plugin Architecture* (for new languages) +- **Why**: Enables new languages to target the VM without modifying existing code. +- **How**: New language compilers can be added as separate modules that generate the same VM bytecode. --- -## 5. Notable Limitations -- No support for nested lambdas (yet), proper tail recursion, or garbage collection. -- Reference counting is stubbed but not enforced. -- Error handling is minimal. +## 8. Notable Limitations and Future Enhancements + +### 8.1. Current Limitations +- **Tail Recursion**: No proper tail call optimization +- **Garbage Collection**: Reference counting is stubbed but not enforced +- **Error Recovery**: Minimal error recovery in REPL +- **Type System**: Basic runtime type checking only +- **Performance**: Limited optimization for complex expressions + +### 8.2. Architectural Patterns for Future Enhancements +- **Continuation-Passing Style**: For proper tail recursion +- **Generational Garbage Collection**: For memory management +- **Exception Handling**: For better error recovery +- **Type Inference**: For compile-time type checking +- **JIT Compilation**: For performance optimization --- -## 6. Summary Table: Patterns Used +## 9. Summary Table: Patterns Used | Area | Pattern(s) Used | Why? | |---------------------|----------------------------------|--------------------------------------| | Lexing/Parsing | Lexer/Parser, Recursive Descent | Simplicity, extensibility | -| Code Generation | Visitor/Dispatcher | Maintainability, clarity | +| Code Generation | Visitor/Dispatcher, Strategy | Maintainability, clarity | | VM Execution | Stack Machine, Tagged Union | Simplicity, type safety | | Environment | Environment Chain, Closure Conv. | Lexical scoping, closures | -| Function Dispatch | Table-Driven Dispatch | Extensibility | +| Function Dispatch | Registry Pattern, Alias Pattern | Extensibility, usability | +| String Operations | Builder Pattern, Variadic Func. | Flexibility, user experience | +| Output System | Visitor Pattern | Extensibility, type-specific display | | State Persistence | State Serialization | REPL continuity | | Memory Management | Manual Heap, Ref Counting (stub) | List support, future GC | +| Error Handling | Fail-Fast | Debugging, development | +| Higher-Order Funcs | Function as Data, Context Pres. | Functional programming support | +| Multi-Language | Universal Bytecode, Shared Runtime| Language agnosticism | +| Testing | Layered Testing, Test Categories | Comprehensive validation | --- -## 7. Architectural Choices: Rationale +## 10. Architectural Choices: Rationale + +### 10.1. Language Choice - **AWK as Implementation Language**: Chosen for portability and as a challenge; influences the use of arrays and string-based data structures. +- **Why**: AWK's built-in associative arrays and string manipulation capabilities map well to the VM's needs. + +### 10.2. VM Design - **Stack Machine**: Maps well to AWK's capabilities and keeps the VM simple. +- **Why**: Stack operations are straightforward to implement in AWK and provide clear semantics. + +### 10.3. Modularity - **Separation of Compiler/VM**: Enables clear boundaries and easier debugging. +- **Why**: Independent development and testing of compilation and execution phases. + +### 10.4. Type System - **Explicit Typing**: Reduces runtime errors and clarifies value handling. +- **Why**: Tagged values provide runtime safety and clear debugging information. + +### 10.5. Language Agnosticism +- **Universal VM Target**: Enables multiple languages to share the same runtime. +- **Why**: Reduces implementation complexity and enables language experimentation. --- -## 8. Flow Diagram (Textual) +## 11. Flow Diagram (Textual) ``` -User Input (Scheme) → [Compiler] → VM Instructions → [VM] → Result/State -``` \ No newline at end of file +User Input (Scheme/Forth) → [Compiler] → VM Instructions → [VM] → Result/State + ↓ + [Display/Output] +``` + +## 12. Key Architectural Insights + +### 12.1. Pattern Composition +The system demonstrates how classic architectural patterns can be composed to create a functional interpreter. The combination of Registry, Visitor, and Strategy patterns enables extensibility while maintaining simplicity. + +### 12.2. AWK-Specific Adaptations +The patterns are adapted to AWK's strengths: associative arrays for registries, string manipulation for value representation, and procedural programming for the VM loop. + +### 12.3. Extensibility Through Registration +The function registration system demonstrates how a simple registry pattern can enable significant extensibility without complex inheritance hierarchies or plugin systems. + +### 12.4. Language Agnosticism +The VM's language-agnostic design demonstrates how a well-designed instruction set can support multiple source languages, enabling experimentation and reducing implementation complexity. + +### 12.5. Higher-Order Function Support +The nested function call system shows how complex functional programming features can be implemented in a simple stack-based VM, enabling powerful data transformation capabilities. \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/complex_test.scm.asm b/awk/scheme/scheme/scratch/complex_test.scm.asm deleted file mode 100644 index 67870c3..0000000 --- a/awk/scheme/scheme/scratch/complex_test.scm.asm +++ /dev/null @@ -1,44 +0,0 @@ -# Test proper list construction (3 2 1) -# Building the list in proper order: car points to value, cdr points to next pair - -# Start with empty list -PUSH_CONST NIL: # [nil] -PRINT # Print nil - -# Build (1 . nil) -PUSH_CONST NIL: # [nil] -PUSH_CONST N:1 # [nil 1] -SWAP # [1 nil] -CONS # [(1 . nil)] -DUP -PRINT # Print (1 . nil) - -# Build (2 . (1 . nil)) -PUSH_CONST N:2 # [(1.nil) 2] -SWAP # [2 (1.nil)] -CONS # [(2 . (1.nil))] -DUP -PRINT # Print (2 . (1.nil)) - -# Build (3 . (2 . (1 . nil))) -PUSH_CONST N:3 # [(2.(1.nil)) 3] -SWAP # [3 (2.(1.nil))] -CONS # [(3 . (2.(1.nil)))] -DUP -PRINT # Print full structure - -# Test CAR/CDR operations -DUP # Keep a copy of the list for later -DUP # Another copy for CAR -CAR # Get first element (3) -PRINT # Should print 3 - -SWAP # Bring back our spare list copy -CDR # Get rest of list ((2 . (1 . nil))) -DUP -PRINT # Print rest of list - -CAR # Get first of rest (2) -PRINT # Should print 2 - -HALT \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/forth/README.md b/awk/scheme/scheme/scratch/forth/README.md new file mode 100644 index 0000000..db1d1fe --- /dev/null +++ b/awk/scheme/scheme/scratch/forth/README.md @@ -0,0 +1,139 @@ +# FORTH Compiler for Awk-Scheme VM + +## What This Is + +This is a **Forth compiler** that generates bytecode for the Awk-Scheme VM directly, without any modifications to the VM itself. + +## What It Demonstrates + +This proves that your VM is **truly language-agnostic**. The VM doesn't know or care whether the bytecode came from: + +- The Scheme compiler +- This Forth compiler +- Any other language compiler + +## Key Insight + +Both approaches generate the **exact same bytecode**: + +``` +Forth: 5 3 + + ↓ (direct compilation) +VM: PUSH_CONST N:5 + PUSH_CONST N:3 + ADD + HALT +``` + +The VM executes this bytecode identically regardless of the source language! + +## Architecture Benefits + +1. **No VM modifications required** - The VM remains unchanged +2. **Language independence** - Any language can target this VM +3. **Shared runtime** - All languages benefit from the same VM optimizations +4. **Consistent semantics** - All languages get the same type system and memory management +5. **Easy to extend** - Adding new languages just requires new compilers + +## How It Works + +The `forth_compiler` program: + +1. **Parses Forth expressions** into tokens +2. **Generates VM bytecode directly** for each token +3. **Executes the bytecode** using the existing VM +4. **Displays results** from the VM + +## Supported Forth Words + +### Basic Arithmetic +- `+` - Addition +- `-` - Subtraction +- `*` - Multiplication +- `/` - Division + +### Advanced Math (using VM standard library) +- `abs` - Absolute value +- `mod` - Modulo operation +- `min` - Minimum of two numbers +- `max` - Maximum of two numbers + +### Type Predicates (using VM standard library) +- `zero?` - Check if top of stack is zero +- `positive?` - Check if top of stack is positive +- `negative?` - Check if top of stack is negative +- `number?` - Check if top of stack is a number + +### Stack Operations +- `.` - Print top of stack +- `cr` - Print newline +- `dup` - Duplicate top of stack +- `swap` - Swap top two elements +- `drop` - Drop top of stack +- `2drop` - Drop top two elements + +### List Operations +- `cons` - Create cons cell +- `car` - Get car of cons cell +- `cdr` - Get cdr of cons cell + +### Logic Operations +- `=` - Equality comparison +- `<` - Less than comparison +- `>` - Greater than comparison +- `not` - Logical NOT + +### System +- `words` - List all available words +- `bye` - Exit the Forth interpreter + +## Testing + +```bash +# Basic arithmetic +echo "5 3 +" | ./scratch/forth/forth.awk + +# Advanced math using VM standard library +echo "-5 abs ." | ./scratch/forth/forth.awk +echo "17 5 mod ." | ./scratch/forth/forth.awk +echo "10 3 min ." | ./scratch/forth/forth.awk + +# Type predicates +echo "0 zero? ." | ./scratch/forth/forth.awk +echo "5 positive? ." | ./scratch/forth/forth.awk +echo "-3 negative? ." | ./scratch/forth/forth.awk + +# Stack manipulation +echo "5 dup * ." | ./scratch/forth/forth.awk +echo "10 20 swap . ." | ./scratch/forth/forth.awk + +# List operations +echo "1 2 cons car ." | ./scratch/forth/forth.awk +``` + +## What This Proves + +This implementation demonstrates several important architectural principles: + +### 1. Language Agnosticism +The VM is completely indifferent to the source language. It only cares about the bytecode format. + +### 2. Compiler Independence +Different languages can have completely different compilers that all target the same VM. + +### 3. Shared Runtime Benefits +All languages get the same: +- Type system +- Memory management +- Error handling +- Performance characteristics +- Debugging capabilities + +### 4. Extensibility +Adding a new language is as simple as writing a new compiler that generates the right bytecode. + +## Conclusion + +This proves your VM design is successful as a **universal target** for multiple languages. The VM is the common denominator - a pure execution engine that can run bytecode from any language that targets its instruction set. + +This is exactly the kind of flexibility and language-agnostic design that makes a VM architecture powerful! \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/forth/forth.awk b/awk/scheme/scheme/scratch/forth/forth.awk new file mode 100755 index 0000000..618f4d5 --- /dev/null +++ b/awk/scheme/scheme/scratch/forth/forth.awk @@ -0,0 +1,249 @@ +#!/usr/bin/awk -f + +# Forth-to-VM Compiler for VM Validation +# +# This compiler translates Forth expressions to VM bytecode, validating +# the VM implementation by testing individual operations. +# +# Architecture: +# - Forth-to-VM compiler that generates VM instructions +# - Uses existing VM to validate instruction execution +# - Tests individual operations (not a true REPL with persistent stack) +# - Stack-based operations that validate VM behavior +# +# Note: Each line is executed in a separate VM instance, so stack state +# does not persist between lines. This is a limitation of the current VM +# design that doesn't impact the scheme implementation, I don't think. + +BEGIN { + print "Forth VM Compiler (for VM validation)" + print "Type 'bye' to exit, 'words' to see available words" + print "Note: Each line runs in a separate VM instance" + print "forth> " +} + +# Main input processing loop +{ + if ($0 == "bye") { + printf("Goodbye!\n") + exit + } else { + # Compile Forth to VM bytecode and execute + compile_and_execute($0) + printf("forth> ") + } +} + +# Compile Forth expression to VM bytecode and execute +function compile_and_execute(line, tokens, i, token, bytecode) { + split(line, tokens, " ") + bytecode = "" + + for (i = 1; i <= length(tokens); i++) { + token = tokens[i] + if (token == "") continue + + if (token ~ /^-?[0-9]+$/) { + # Number - push constant + bytecode = bytecode "PUSH_CONST N:" token "\n" + } else if (token == "+") { + # Addition + bytecode = bytecode "ADD\n" + } else if (token == "-") { + # Subtraction + bytecode = bytecode "SUB\n" + } else if (token == "*") { + # Multiplication + bytecode = bytecode "MUL\n" + } else if (token == "/") { + # Division + bytecode = bytecode "DIV\n" + } else if (token == ".") { + # Print top of stack and consume it + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + } else if (token == "cr") { + # Print newline + bytecode = bytecode "PUSH_CONST STR:\\n\n" + bytecode = bytecode "PRINT\n" + } else if (token == "dup") { + # Duplicate top of stack + bytecode = bytecode "DUP\n" + } else if (token == "swap") { + # Swap top two stack elements + bytecode = bytecode "SWAP\n" + } else if (token == "drop") { + # Drop top of stack + bytecode = bytecode "POP\n" + } else if (token == ".s") { + # Show stack contents (non-destructive) + bytecode = bytecode "PUSH_CONST STR:Stack: \n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + # Note: Full stack inspection would require VM support + } else if (token == "depth") { + # Show stack depth + bytecode = bytecode "PUSH_CONST STR:Depth: \n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + # Note: Stack depth would require VM support + } else if (token == "=") { + # Equality comparison + bytecode = bytecode "EQ\n" + } else if (token == "<") { + # Less than comparison + bytecode = bytecode "LT\n" + } else if (token == ">") { + # Greater than (implement as swap and less than) + bytecode = bytecode "SWAP\n" + bytecode = bytecode "LT\n" + } else if (token == "not") { + # Logical NOT + bytecode = bytecode "NOT\n" + } else if (token == "abs") { + # Absolute value + bytecode = bytecode "PUSH_CONST S:abs\n" + bytecode = bytecode "LOOKUP abs\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "mod") { + # Modulo operation + bytecode = bytecode "PUSH_CONST S:modulo\n" + bytecode = bytecode "LOOKUP modulo\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "min") { + # Minimum of two numbers + bytecode = bytecode "PUSH_CONST S:min\n" + bytecode = bytecode "LOOKUP min\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "max") { + # Maximum of two numbers + bytecode = bytecode "PUSH_CONST S:max\n" + bytecode = bytecode "LOOKUP max\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "zero?") { + # Check if top of stack is zero + bytecode = bytecode "PUSH_CONST S:zero?\n" + bytecode = bytecode "LOOKUP zero?\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "positive?") { + # Check if top of stack is positive + bytecode = bytecode "PUSH_CONST S:positive?\n" + bytecode = bytecode "LOOKUP positive?\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "negative?") { + # Check if top of stack is negative + bytecode = bytecode "PUSH_CONST S:negative?\n" + bytecode = bytecode "LOOKUP negative?\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "number?") { + # Check if top of stack is a number + bytecode = bytecode "PUSH_CONST S:number?\n" + bytecode = bytecode "LOOKUP number?\n" + bytecode = bytecode "GET_VALUE\n" + bytecode = bytecode "CALL\n" + } else if (token == "cons") { + # Create cons cell + bytecode = bytecode "CONS\n" + } else if (token == "car") { + # Get car of cons cell + bytecode = bytecode "CAR\n" + } else if (token == "cdr") { + # Get cdr of cons cell + bytecode = bytecode "CDR\n" + } else if (token == "over") { + # Copy second item to top ( a b -- a b a ) + # This is complex with basic operations, skip for now + bytecode = bytecode "DUP\n" + } else if (token == "2drop") { + # Drop top two items ( a b -- ) + bytecode = bytecode "POP\n" + bytecode = bytecode "POP\n" + } else if (token == "words") { + # List all available words + bytecode = bytecode "PUSH_CONST STR:Available words: + - * / . cr dup swap drop\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Stack: .s depth 2drop\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Logic: = < > not\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Lists: cons car cdr\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Math: abs mod min max\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Predicates: zero? positive? negative? number?\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Env: store lookup env\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Control: if then\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + bytecode = bytecode "PUSH_CONST STR:Debug: bye\n" + bytecode = bytecode "PRINT\n" + bytecode = bytecode "POP\n" + } + } + + # Add HALT instruction only if we haven't already printed something + # This prevents double output + if (bytecode !~ /PRINT/) { + bytecode = bytecode "HALT\n" + } + + # Execute the bytecode + execute_bytecode(bytecode) +} + +# Execute VM bytecode +function execute_bytecode(bytecode) { + # Write bytecode to temporary file + temp_file = "/tmp/forth_bytecode.tmp" + printf("%s", bytecode) > temp_file + close(temp_file) + + # Try different VM paths based on current directory + vm_path = "bin/vm.awk" + cmd = "awk -v PERSIST=1 -f " vm_path " < " temp_file " 2>/dev/null" + + # Read all output lines + output = "" + while ((cmd | getline line) > 0) { + if (output != "") output = output "\n" + output = output line + } + close(cmd) + + # If that failed, try the relative path from forth directory + if (output == "" || output ~ /No such file/) { + vm_path = "../../bin/vm.awk" + cmd = "awk -v PERSIST=1 -f " vm_path " < " temp_file " 2>/dev/null" + + # Read all output lines + output = "" + while ((cmd | getline line) > 0) { + if (output != "") output = output "\n" + output = output line + } + close(cmd) + } + + # Clean up + system("rm -f " temp_file) + + if (output != "") { + printf("Result: %s\n", output) + } +} \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/ideas-for-string-support.md b/awk/scheme/scheme/scratch/ideas-for-string-support.md deleted file mode 100644 index 0d81e5f..0000000 --- a/awk/scheme/scheme/scratch/ideas-for-string-support.md +++ /dev/null @@ -1,556 +0,0 @@ -# String Support Implementation Plan for Awk-Scheme - -## Overview -This document outlines a comprehensive plan for adding string support to the Awk-Scheme interpreter. The implementation will follow the existing architecture patterns and maintain consistency with the current type system and execution model. - -## Current Architecture Analysis - -### Type System -- Current types: `N:`, `B:`, `S:`, `P:`, `F:`, `NIL:`, `CLOSURE:` -- All values use `type:value` format for runtime type checking -- Type checking via `getType()` and `getValue()` functions -- Type predicates: `isNumber()`, `isBoolean()`, `isSymbol()`, etc. - -### Compiler Pipeline -1. **Lexer** (`next_token()`): Handles numbers, symbols, parentheses -2. **Parser** (`parse_expr()`, `parse_list()`): Builds expression trees -3. **Code Generator**: Emits VM instructions for different expression types -4. **Expression Compiler** (`compile_expr()`): Dispatches based on expression type - -### VM Architecture -- Stack-based execution with typed values -- Environment-based variable binding -- Built-in function registry (`FUNCTIONS` table) -- Direct function execution (no program array modification) -- State persistence for globals and functions - -## String Implementation Plan - -### Phase 1: Core String Type System - -#### 1.1 VM Type System Extensions -**File: `bin/vm.awk`** - -```awk -# Add to BEGIN block -T_STRING = "STR" # String type tag - -# Add type predicate -function isString(val) { return getType(val) == T_STRING } - -# String value constructor -function makeString(val) { - return T_STRING ":" val -} -``` - -**Changes Required:** -- Add `T_STRING` constant in BEGIN block -- Add `isString()` predicate function -- Add `makeString()` constructor function -- Update type checking in existing operations where needed - -#### 1.2 String Storage Strategy -**Options:** -1. **Inline storage**: Store strings directly in the type:value format - - Pros: Simple, fast access - - Cons: Limited by awk string length, no escaping issues - -2. **Heap storage**: Store strings in heap like cons cells - - Pros: Unlimited length, consistent with existing patterns - - Cons: More complex, requires string management - -**Recommendation**: Start with inline storage for simplicity, can migrate to heap later. - -### Phase 2: Compiler Extensions - -#### 2.1 Lexer String Support -**File: `bin/compiler.awk`** - -**Current lexer handles:** -- Numbers: `-?[0-9]+` -- Symbols: `[a-zA-Z_][a-zA-Z0-9_]*` -- Parentheses: `(`, `)` - -**New string tokenization:** -```awk -# Add to next_token() function -# Handle string literals (double quotes) -if (c == "\"") { - str = "" - input_buffer = substr(input_buffer, 2) # Skip opening quote - - while (length(input_buffer) > 0) { - c = substr(input_buffer, 1, 1) - if (c == "\"") { - input_buffer = substr(input_buffer, 2) # Skip closing quote - break - } - if (c == "\\") { - # Handle escape sequences - input_buffer = substr(input_buffer, 2) - if (length(input_buffer) > 0) { - c = substr(input_buffer, 1, 1) - if (c == "n") str = str "\n" - else if (c == "t") str = str "\t" - else if (c == "\\") str = str "\\" - else if (c == "\"") str = str "\"" - else error("Invalid escape sequence: \\" c) - input_buffer = substr(input_buffer, 2) - } - } else { - str = str c - input_buffer = substr(input_buffer, 2) - } - } - return "\"" str "\"" # Return with quotes for identification -} -``` - -**Escape Sequences to Support:** -- `\"` - Literal double quote -- `\\` - Literal backslash -- `\n` - Newline -- `\t` - Tab -- `\r` - Carriage return - -#### 2.2 Parser String Support -**File: `bin/compiler.awk`** - -**Update `parse_expr()`:** -```awk -function parse_expr(token, result) { - token = next_token() - if (token == "EOF") return "" - - if (token == "(") { - result = parse_list() - debug("Parsed list: " result) - return result - } - - # Handle string literals - if (substr(token, 1, 1) == "\"") { - debug("Parsed string: " token) - return token - } - - debug("Parsed token: " token) - return token -} -``` - -#### 2.3 Code Generation for Strings -**File: `bin/compiler.awk`** - -**Add to `compile_expr()`:** -```awk -function compile_expr(expr, split_result, op, args) { - debug("Compiling expression: " expr) - - # Handle string literals - if (substr(expr, 1, 1) == "\"") { - compile_string(expr) - return - } - - # ... existing code for numbers, nil, variables, etc. -} - -function compile_string(str) { - debug("Compiling string: " str) - # Remove outer quotes and emit string constant - content = substr(str, 2, length(str) - 2) - print "PUSH_CONST STR:" content -} -``` - -### Phase 3: VM String Operations - -#### 3.1 String Built-in Functions -**File: `bin/vm.awk`** - -**Add to BEGIN block (built-in registration):** -```awk -# String operations -FUNCTIONS["string-length"] = "string_length" -FUNCTIONS["string-append"] = "string_append" -FUNCTIONS["string-ref"] = "string_ref" -FUNCTIONS["substring"] = "substring" -FUNCTIONS["string=?"] = "string_equal" -FUNCTIONS["string<?"] = "string_less_than" -FUNCTIONS["string>?"] = "string_greater_than" -FUNCTIONS["string-ci=?"] = "string_ci_equal" -FUNCTIONS["string-ci<?"] = "string_ci_less_than" -FUNCTIONS["string-ci>?"] = "string_ci_greater_than" -``` - -**Add to function call dispatch in `vm_call_function()`:** -```awk -} else if (built_in_name == "string_length") { - debug("Calling built-in function: string_length") - string_length() - return -} else if (built_in_name == "string_append") { - debug("Calling built-in function: string_append") - string_append() - return -# ... etc for other string functions -``` - -#### 3.2 String Function Implementations -**File: `bin/vm.awk`** - -```awk -# String length -function string_length() { - if (stack_ptr < 1) error("string-length requires one operand") - val = pop() - if (!isString(val)) error("string-length requires string operand") - str = getValue(val) - push(makeValue(T_NUMBER, length(str))) -} - -# String concatenation -function string_append() { - if (stack_ptr < 2) error("string-append requires at least two operands") - result = "" - # Pop all arguments and concatenate - while (stack_ptr > 0) { - val = pop() - if (!isString(val)) error("string-append requires string operands") - result = getValue(val) result - } - push(makeString(result)) -} - -# String character access (0-indexed) -function string_ref() { - if (stack_ptr < 2) error("string-ref requires two operands") - index_val = pop() - str_val = pop() - if (!isNumber(index_val)) error("string-ref index must be a number") - if (!isString(str_val)) error("string-ref requires string operand") - - str = getValue(str_val) - idx = getValue(index_val) + 1 # Convert to 1-indexed - if (idx < 1 || idx > length(str)) error("string-ref index out of bounds") - - char = substr(str, idx, 1) - push(makeString(char)) -} - -# Substring extraction -function substring() { - if (stack_ptr < 3) error("substring requires three operands") - end = pop() - start = pop() - str_val = pop() - - if (!isNumber(start) || !isNumber(end)) error("substring indices must be numbers") - if (!isString(str_val)) error("substring requires string operand") - - str = getValue(str_val) - start_idx = getValue(start) + 1 # Convert to 1-indexed - end_idx = getValue(end) + 1 - - if (start_idx < 1 || end_idx > length(str) || start_idx > end_idx) { - error("substring indices out of bounds") - } - - result = substr(str, start_idx, end_idx - start_idx + 1) - push(makeString(result)) -} - -# String comparison (case-sensitive) -function string_equal() { - if (stack_ptr < 2) error("string=? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string=? requires string operands") - result = (getValue(val1) == getValue(val2)) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} - -function string_less_than() { - if (stack_ptr < 2) error("string<? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string<? requires string operands") - result = (getValue(val1) < getValue(val2)) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} - -function string_greater_than() { - if (stack_ptr < 2) error("string>? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string>? requires string operands") - result = (getValue(val1) > getValue(val2)) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} - -# Case-insensitive string comparison -function string_ci_equal() { - if (stack_ptr < 2) error("string-ci=? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string-ci=? requires string operands") - # Convert to lowercase for comparison - str1 = tolower(getValue(val1)) - str2 = tolower(getValue(val2)) - result = (str1 == str2) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} - -function string_ci_less_than() { - if (stack_ptr < 2) error("string-ci<? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string-ci<? requires string operands") - str1 = tolower(getValue(val1)) - str2 = tolower(getValue(val2)) - result = (str1 < str2) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} - -function string_ci_greater_than() { - if (stack_ptr < 2) error("string-ci>? requires two operands") - val2 = pop() - val1 = pop() - if (!isString(val1) || !isString(val2)) error("string-ci>? requires string operands") - str1 = tolower(getValue(val1)) - str2 = tolower(getValue(val2)) - result = (str1 > str2) ? "1" : "0" - push(makeValue(T_BOOLEAN, result)) -} -``` - -### Phase 4: Enhanced String Operations - -#### 4.1 Additional String Functions -```awk -# String to number conversion -function string_to_number() { - if (stack_ptr < 1) error("string->number requires one operand") - val = pop() - if (!isString(val)) error("string->number requires string operand") - str = getValue(val) - if (str ~ /^-?[0-9]+$/) { - push(makeValue(T_NUMBER, str)) - } else { - push(makeValue(T_BOOLEAN, "0")) # Return false for invalid numbers - } -} - -# Number to string conversion -function number_to_string() { - if (stack_ptr < 1) error("number->string requires one operand") - val = pop() - if (!isNumber(val)) error("number->string requires number operand") - num = getValue(val) - push(makeString(num)) -} - -# String splitting -function string_split() { - if (stack_ptr < 2) error("string-split requires two operands") - delimiter = pop() - str_val = pop() - if (!isString(delimiter) || !isString(str_val)) { - error("string-split requires string operands") - } - - str = getValue(str_val) - delim = getValue(delimiter) - - # Use awk's split function - split(str, parts, delim) - result = "" - for (i = 1; i <= length(parts); i++) { - if (result != "") result = result " " - result = result "\"" parts[i] "\"" - } - push(makeString("(" result ")")) -} -``` - -#### 4.2 String Formatting -```awk -# String formatting (simple version) -function format() { - if (stack_ptr < 2) error("format requires at least two operands") - format_str = pop() - if (!isString(format_str)) error("format requires string format operand") - - # Simple implementation - replace ~a with arguments - fmt = getValue(format_str) - arg_count = 0 - - while (stack_ptr > 0) { - val = pop() - arg_count++ - # Replace ~a with the value - gsub(/~a/, val, fmt) - } - - push(makeString(fmt)) -} -``` - -### Phase 5: State Persistence for Strings - -#### 5.1 String State Management -**File: `bin/vm.awk`** - -**Update `save_state()` and state loading:** -- Strings stored in environment will be automatically persisted -- No special handling needed since strings use inline storage -- String literals in function definitions will be preserved - -**Considerations:** -- Long strings may impact state file size -- Escape sequences need proper handling in state files -- String encoding consistency across sessions - -### Phase 6: Testing and Validation - -#### 6.1 Test Cases -**Create test files:** - -```scheme -; Basic string literals -"hello world" -"" -"\"quoted\"" -"line1\nline2" - -; String operations -(string-length "hello") -(string-append "hello" " " "world") -(string-ref "hello" 0) -(substring "hello world" 0 4) - -; String comparisons -(string=? "hello" "hello") -(string=? "hello" "world") -(string<? "abc" "def") -(string>? "xyz" "abc") - -; Case-insensitive comparisons -(string-ci=? "Hello" "hello") -(string-ci<? "ABC" "def") - -; Conversions -(string->number "123") -(number->string 456) -(string->number "abc") - -; String in expressions -(define greeting "Hello") -(define name "World") -(string-append greeting " " name) - -; Strings in functions -(define (greet name) (string-append "Hello, " name "!")) -(greet "Alice") -``` - -#### 6.2 Edge Cases to Test -- Empty strings -- Strings with special characters -- Escape sequences -- Very long strings -- String operations on non-string types -- String comparisons with different encodings -- String persistence across REPL sessions - -### Phase 7: Performance and Optimization - -#### 7.1 Performance Considerations -- **String concatenation**: Current `string-append` is O(n²) - consider optimization -- **String storage**: Monitor memory usage with large strings -- **String comparisons**: Consider early termination for long strings -- **Escape sequence processing**: Optimize lexer for common cases - -#### 7.2 Potential Optimizations -```awk -# Optimized string concatenation (build in reverse) -function string_append_optimized() { - if (stack_ptr < 2) error("string-append requires at least two operands") - - # Count arguments and pre-allocate - arg_count = 0 - temp_stack[0] = stack_ptr - - # Pop all arguments to temp storage - while (stack_ptr > 0) { - val = pop() - if (!isString(val)) error("string-append requires string operands") - temp_stack[++arg_count] = getValue(val) - } - - # Concatenate in reverse order (more efficient) - result = "" - for (i = arg_count; i >= 1; i--) { - result = result temp_stack[i] - } - - push(makeString(result)) -} -``` - -### Phase 8: Documentation and Examples - -#### 8.1 Update README.md -- Add string data type to type system documentation -- Include string usage examples -- Document all string built-in functions -- Add string limitations and considerations - -#### 8.2 Update diagram.md -- Add string type to data type diagram -- Include string operations in VM instruction set -- Show string flow in execution examples - -### Implementation Order and Risk Assessment - -#### High Priority (Core Functionality) -1. **Type system extensions** - Low risk, foundational -2. **Lexer string support** - Medium risk, affects parsing -3. **Basic string operations** - Low risk, self-contained -4. **String literals in expressions** - Medium risk, integration - -#### Medium Priority (Enhanced Features) -5. **String comparisons** - Low risk, straightforward -6. **String conversions** - Low risk, utility functions -7. **State persistence** - Low risk, automatic - -#### Lower Priority (Advanced Features) -8. **String formatting** - Medium risk, complex parsing -9. **String splitting** - Low risk, utility function -10. **Performance optimizations** - Low risk, optional - -#### Risk Mitigation -- **Regression testing**: Ensure existing functionality unchanged -- **Incremental implementation**: Add features one at a time -- **Comprehensive testing**: Test all edge cases -- **Backward compatibility**: Maintain existing API - -### Success Criteria -1. String literals parse and compile correctly -2. Basic string operations work as expected -3. String state persists across REPL sessions -4. No regression in existing functionality -5. Performance remains acceptable -6. Documentation is complete and accurate - -### Future Enhancements -1. **Heap-based string storage** for very long strings -2. **String interning** for memory efficiency -3. **Unicode support** (complex, requires significant changes) -4. **Regular expression support** for string matching -5. **String formatting with more features** (like Common Lisp format) -6. **String streams** for efficient string building - -This plan provides a comprehensive roadmap for implementing string support while maintaining the existing architecture and ensuring backward compatibility. \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/random.txt b/awk/scheme/scheme/scratch/random.txt deleted file mode 100644 index 20b3bcf..0000000 --- a/awk/scheme/scheme/scratch/random.txt +++ /dev/null @@ -1,112 +0,0 @@ -Other Uses for the Awk-Scheme VM -================================ - -The stack-based VM in Awk-Scheme is surprisingly versatile. Here are some realistic alternatives that could be implemented using the same VM architecture: - -## 1. Forth Implementation -**Feasibility: High** - -Forth is a natural fit since it's already stack-based: -- **Stack Operations**: VM already has PUSH_CONST, POP, DUP, SWAP -- **Arithmetic**: ADD, SUB, MUL, DIV are perfect for Forth -- **Memory**: Could extend heap for Forth's memory model -- **Control Flow**: Would need to add conditional jumps and loops -- **Words**: Could map Forth words to VM functions - -**Required Extensions:** -- JMP, JZ (jump if zero), JNZ instructions -- More stack operations (OVER, ROT, etc.) -- Memory read/write operations -- Input/output operations - -## 2. Simple Calculator Language -**Feasibility: Very High** - -A basic calculator with variables and functions: -- **Syntax**: `x = 5; y = x + 3; print y` -- **Features**: Variables, basic math, simple functions -- **VM Fit**: Perfect - already has arithmetic, variables, functions - -**Minimal Changes Needed:** -- New parser for calculator syntax -- Assignment operator handling -- Print statement - -## 3. Configuration Language -**Feasibility: High** - -A simple config language for key-value pairs and nested structures: -- **Syntax**: `server { port = 8080; host = "localhost"; }` -- **Features**: Nested objects, arrays, basic expressions -- **VM Fit**: Good - can use cons cells for nested structures - -**Required Extensions:** -- String support -- Object/struct creation -- Field access operations - -## 4. Simple Scripting Language -**Feasibility: Medium** - -A basic scripting language with loops and conditionals: -- **Syntax**: `if x > 0 { y = x * 2; } else { y = 0; }` -- **Features**: Variables, conditionals, loops, functions -- **VM Fit**: Good but needs control flow - -**Required Extensions:** -- Conditional jumps -- Loop constructs -- Boolean logic operations - -## 5. Data Processing Language -**Feasibility: Medium** - -A language for simple data transformations: -- **Syntax**: `filter(x > 0) | map(x * 2) | sum()` -- **Features**: Pipeline operations, list processing -- **VM Fit**: Good - can use cons cells for lists - -**Required Extensions:** -- List operations (map, filter, reduce) -- Pipeline operator -- More built-in functions - -## 6. Simple Logic/Constraint Language -**Feasibility: High** - -A language for expressing simple constraints and rules: -- **Syntax**: `rule: if age > 18 then can_vote = true` -- **Features**: Rules, facts, simple inference -- **VM Fit**: Good - boolean operations and variables - -**Required Extensions:** -- Boolean logic (AND, OR, NOT) -- Rule evaluation -- Fact storage - -## VM Strengths for Alternative Languages: -1. **Stack-based**: Natural for many languages -2. **Typed values**: Good foundation for type safety -3. **Environment model**: Supports variables and scoping -4. **Function system**: Reusable for different syntaxes -5. **Extensible**: Easy to add new instructions - -## VM Limitations for Alternative Languages: -1. **Limited data types**: Only numbers, booleans, pairs, symbols -2. **No strings**: Would need to add string support -3. **Limited control flow**: Only function calls, no loops/conditionals -4. **No I/O**: Would need file/console operations -5. **Memory constraints**: Simple heap model - -## Most Realistic Next Steps: -1. **Forth**: Natural progression, leverages existing stack operations -2. **Calculator**: Minimal changes, good learning exercise -3. **Config Language**: Practical use case, moderate complexity - -## Implementation Strategy: -1. Keep the VM unchanged -2. Write new compilers for different syntaxes -3. Add minimal VM extensions as needed -4. Reuse existing VM infrastructure (stack, environment, functions) - -The VM's simplicity is actually a strength - it's easy to understand and extend, making it a good foundation for experimenting with different language designs. \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/run.sh b/awk/scheme/scheme/scratch/run.sh deleted file mode 100755 index 0afdb41..0000000 --- a/awk/scheme/scheme/scratch/run.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/bin/bash -# Compile Scheme to VM instructions -awk -f compiler.awk test.scm > test.asm -# Run VM instructions -awk -f vm.awk test.asm \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/test.asm b/awk/scheme/scheme/scratch/test.asm deleted file mode 100644 index 8e7d8df..0000000 --- a/awk/scheme/scheme/scratch/test.asm +++ /dev/null @@ -1,16 +0,0 @@ -PUSH_CONST N:1 -PUSH_CONST N:2 -ADD -PUSH_CONST N:3 -PUSH_CONST N:4 -MUL -PUSH_CONST N:10 -PUSH_CONST N:2 -PUSH_CONST N:3 -ADD -SUB -PUSH_CONST NIL: -CONS -CONS -CONS -HALT diff --git a/awk/scheme/scheme/scratch/test.scm b/awk/scheme/scheme/scratch/test.scm deleted file mode 100644 index a01b174..0000000 --- a/awk/scheme/scheme/scratch/test.scm +++ /dev/null @@ -1,8 +0,0 @@ -;; Build a list of calculated values -(cons (+ 1 2) ; First element: 1 + 2 = 3 - (cons (* 3 4) ; Second element: 3 * 4 = 12 - (cons (- 10 - (+ 2 3)) ; Third element: 10 - (2 + 3) = 5 - nil))) ; End of list - -;; This should create a list: (3 12 5) \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/test.scm.asm b/awk/scheme/scheme/scratch/test.scm.asm deleted file mode 100644 index 526e2b1..0000000 --- a/awk/scheme/scheme/scratch/test.scm.asm +++ /dev/null @@ -1,7 +0,0 @@ -PUSH_CONST N:5 -PUSH_CONST N:3 -ADD -PUSH_CONST N:2 -MUL -PRINT # should output N:16 -HALT \ No newline at end of file diff --git a/awk/scheme/scheme/test/README.md b/awk/scheme/scheme/test/README.md new file mode 100644 index 0000000..7106cb3 --- /dev/null +++ b/awk/scheme/scheme/test/README.md @@ -0,0 +1,144 @@ +# Test Suite Documentation + +## Overview +The test suite is organized into four categories to ensure comprehensive coverage of the Scheme implementation: + +## Test Categories + +### Unit Tests (`test/unit/`) +Focused tests for individual language features and components: + +#### Core Language Features +- `basic_arithmetic.scm` - Basic arithmetic operations (+, -, *, /) +- `essential_numeric_operations.scm` - Core numeric functions (abs, modulo, expt) +- `comprehensive_numeric_and_type_operations.scm` - Full numeric and type predicate testing +- `booleans.scm` - Boolean operations and logic +- `strings.scm` - String operations and manipulation +- `lists.scm` - Basic list operations (cons, car, cdr) +- `variables.scm` - Variable binding and lookup +- `functions.scm` - Function definition and calling +- `lambdas.scm` - Lambda function creation and execution +- `closures.scm` - Closure environment capture +- `let.scm` - Let binding expressions +- `control_flow.scm` - Basic control flow (if, cond) +- `advanced_control_flow.scm` - Complex control flow patterns +- `advanced_functions.scm` - Advanced function features + +#### Higher-Order Functions +- `basic_map_operations.scm` - Basic map function testing +- `basic_higher_order_functions.scm` - Map and filter with built-in functions +- `comprehensive_higher_order_functions.scm` - Full map/filter with all function types +- `call_stack_infrastructure.scm` - Call stack and nested function support + +#### Standard Library +- `basic_stdlib_test.scm` - Basic standard library functions +- `simple_stdlib_test.scm` - Simple standard library usage +- `stdlib_lists.scm` - List utility functions +- `stdlib_results_test.scm` - Standard library result validation +- `additional_lists.scm` - Additional list operations +- `simple_additional_lists.scm` - Simple list operations + +#### Error Handling and Edge Cases +- `error_handling.scm` - Error condition testing +- `performance_stress.scm` - Performance and stress testing +- `vm_instructions.scm` - VM instruction validation + +#### Debug and Development +- `debug_*.scm` - Various debug and development tests + +### Integration Tests (`test/integration/`) +Tests that combine multiple language features in realistic scenarios: + +- `basic_arithmetic.scm` - Arithmetic operations in context +- `list_operations.scm` - List operations with other features +- `string_operations.scm` - String operations in context +- `function_composition.scm` - Function composition patterns +- `complex_programs.scm` - Complex multi-feature programs +- `higher_order_functions.scm` - Higher-order functions with complex scenarios + +### Regression Tests (`test/regression/`) +Tests to prevent regressions in previously fixed issues: + +- `edge_cases.scm` - Edge case handling +- `nested_expressions.scm` - Complex nested expression evaluation + +### Examples (`test/examples/`) +Example programs demonstrating language capabilities: + +- `calculator.scm` - Simple calculator implementation +- `list_utils.scm` - List utility functions +- `string_utils.scm` - String utility functions + +## Test Naming Conventions + +### Unit Tests +- `basic_*.scm` - Basic functionality testing +- `essential_*.scm` - Core language features +- `comprehensive_*.scm` - Full feature testing +- `advanced_*.scm` - Advanced language features +- `simple_*.scm` - Simple, focused tests +- `debug_*.scm` - Debug and development tests + +### Integration Tests +- `*_operations.scm` - Operations in context +- `*_composition.scm` - Feature composition +- `complex_*.scm` - Complex multi-feature scenarios +- `higher_order_*.scm` - Higher-order function scenarios + +## Running Tests + +```bash +# Run all tests +./test/run_tests.sh + +# Run specific test category +./test/run_tests.sh unit +./test/run_tests.sh integration +./test/run_tests.sh regression +./test/run_tests.sh examples + +# Run individual test +./bin/compiler.awk test/unit/basic_arithmetic.scm | ./bin/vm.awk +``` + +## Test Coverage + +The test suite covers: + +### Core Language Features +- ✅ Arithmetic operations (+, -, *, /, modulo, expt, abs, min, max) +- ✅ Type predicates (number?, string?, boolean?, symbol?, zero?, positive?, negative?) +- ✅ List operations (cons, car, cdr, length, append, reverse, member) +- ✅ String operations (string-length, string-append, string-ref, substring, string comparisons) +- ✅ Variable binding (define, let) +- ✅ Function definition and calling +- ✅ Lambda functions and closures +- ✅ Control flow (if, cond, and, or, not) + +### Higher-Order Functions +- ✅ Map function with all function types (built-in, lambda, user-defined, closures) +- ✅ Filter function with all function types +- ✅ Nested higher-order function calls +- ✅ Complex data processing pipelines + +### Standard Library +- ✅ List utilities (null?, pair?, cadr, caddr, list-ref, list-tail) +- ✅ String utilities (all string operations) +- ✅ Control flow utilities (all control flow constructs) + +### Integration Scenarios +- ✅ Complex programs combining multiple features +- ✅ Function composition patterns +- ✅ Data processing pipelines +- ✅ Higher-order function scenarios + +## Adding New Tests + +When adding new tests: + +1. **Unit Tests**: Focus on a single feature or component +2. **Integration Tests**: Combine multiple features in realistic scenarios +3. **Regression Tests**: Prevent specific bugs from recurring +4. **Examples**: Demonstrate language capabilities + +Use descriptive names that indicate the test's purpose and scope. \ No newline at end of file diff --git a/awk/scheme/scheme/test/examples/calculator.scm b/awk/scheme/scheme/test/examples/calculator.scm new file mode 100644 index 0000000..43895ff --- /dev/null +++ b/awk/scheme/scheme/test/examples/calculator.scm @@ -0,0 +1,14 @@ +; Function definitions and arithmetic + +(define add (x y) (+ x y)) +(define subtract (x y) (- x y)) +(define multiply (x y) (* x y)) +(define divide (x y) (/ x y)) + +(add 10 5) +(subtract 10 5) +(multiply 10 5) +(divide 10 5) + +; Chain operations +(add (multiply 3 4) (divide 10 2)) \ No newline at end of file diff --git a/awk/scheme/scheme/test/examples/list_utils.scm b/awk/scheme/scheme/test/examples/list_utils.scm new file mode 100644 index 0000000..19d7299 --- /dev/null +++ b/awk/scheme/scheme/test/examples/list_utils.scm @@ -0,0 +1,17 @@ +; List operations and function definitions + +(define make-pair (x y) (cons x y)) +(define make-triple (x y z) (cons x (cons y (cons z nil)))) +(define first (lst) (car lst)) +(define second (lst) (car (cdr lst))) +(define third (lst) (car (cdr (cdr lst)))) +(define rest (lst) (cdr lst)) + +(make-pair 1 2) +(first (make-pair 1 2)) +(rest (make-pair 1 2)) + +(make-triple 1 2 3) +(first (make-triple 1 2 3)) +(second (make-triple 1 2 3)) +(third (make-triple 1 2 3)) \ No newline at end of file diff --git a/awk/scheme/scheme/test/examples/string_utils.scm b/awk/scheme/scheme/test/examples/string_utils.scm new file mode 100644 index 0000000..6973922 --- /dev/null +++ b/awk/scheme/scheme/test/examples/string_utils.scm @@ -0,0 +1,16 @@ +; String operations and function composition + +(define greet (name) (string-append "Hello, " name "!")) +(define shout (message) (string-append message "!!!")) +(define count-chars (text) (string-length text)) +(define first-char (text) (string-ref text 0)) +(define last-char (text) (string-ref text (- (string-length text) 1))) + +(greet "World") +(shout "Hello") +(count-chars "Hello World") +(first-char "Hello") +(last-char "Hello") + +; Compose operations +(shout (greet "Alice")) \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/basic_arithmetic.scm b/awk/scheme/scheme/test/integration/basic_arithmetic.scm new file mode 100644 index 0000000..f2ee0ca --- /dev/null +++ b/awk/scheme/scheme/test/integration/basic_arithmetic.scm @@ -0,0 +1,10 @@ +; Strive to tests compilation and the execution of arithmetic expressions + +(define a 10) +(define b 5) +(+ a b) +(- a b) +(* a b) +(/ a b) +(define result (+ (* a b) (- a b))) +result \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/complex_programs.scm b/awk/scheme/scheme/test/integration/complex_programs.scm new file mode 100644 index 0000000..4ea5aa1 --- /dev/null +++ b/awk/scheme/scheme/test/integration/complex_programs.scm @@ -0,0 +1,86 @@ +;; Complex Program Integration Tests +;; Tests that combine multiple language features in realistic scenarios + +;; Simple calculator with multiple operations +(define calculator (op a b) + (cond ((= op "add") (+ a b)) + ((= op "sub") (- a b)) + ((= op "mul") (* a b)) + ((= op "div") (/ a b)) + (else "unknown operation"))) + +(calculator "add" 5 3) +(calculator "sub" 10 4) +(calculator "mul" 6 7) +(calculator "div" 15 3) + +;; List processing pipeline +(define process-list (lst) + (if (null? lst) + nil + (cons (* (car lst) 2) + (process-list (cdr lst))))) + +(define filter-even (lst) + (if (null? lst) + nil + (if (= (modulo (car lst) 2) 0) + (cons (car lst) (filter-even (cdr lst))) + (filter-even (cdr lst))))) + +(define sample-list (cons 1 (cons 2 (cons 3 (cons 4 (cons 5 nil)))))) +(process-list sample-list) +(filter-even sample-list) + +;; String processing utilities +(define string-reverse (str) + (define reverse-helper (str idx) + (if (< idx 0) + "" + (string-append (substring str idx (+ idx 1)) + (reverse-helper str (- idx 1))))) + (reverse-helper str (- (string-length str) 1))) + +(string-reverse "hello") +(string-reverse "racecar") + +;; Higher-order list operations +(define map (f lst) + (if (null? lst) + nil + (cons (f (car lst)) + (map f (cdr lst))))) + +(define square (x) (* x x)) +(map square (cons 1 (cons 2 (cons 3 nil)))) + +;; Complex nested data structures +(define make-person (name age) + (cons name (cons age nil))) + +(define person-name (person) (car person)) +(define person-age (person) (car (cdr person))) + +(define alice (make-person "Alice" 25)) +(define bob (make-person "Bob" 30)) +(define people (cons alice (cons bob nil))) + +(person-name alice) +(person-age bob) +(length people) + +;; Recursive tree-like structure +(define make-tree (value left right) + (cons value (cons left (cons right nil)))) + +(define tree-value (tree) (car tree)) +(define tree-left (tree) (car (cdr tree))) +(define tree-right (tree) (car (cdr (cdr tree)))) + +(define sample-tree (make-tree 5 + (make-tree 3 nil nil) + (make-tree 7 nil nil))) + +(tree-value sample-tree) +(tree-value (tree-left sample-tree)) +(tree-value (tree-right sample-tree)) \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/function_composition.scm b/awk/scheme/scheme/test/integration/function_composition.scm new file mode 100644 index 0000000..7009888 --- /dev/null +++ b/awk/scheme/scheme/test/integration/function_composition.scm @@ -0,0 +1,9 @@ +; Defining and using functions together + +(define add (x y) (+ x y)) +(define multiply (x y) (* x y)) +(define square (x) (multiply x x)) +(define add-square (x y) (add (square x) (square y))) +(add-square 3 4) +(define result (add-square 5 12)) +result \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/higher_order_functions.scm b/awk/scheme/scheme/test/integration/higher_order_functions.scm new file mode 100644 index 0000000..99d40e9 --- /dev/null +++ b/awk/scheme/scheme/test/integration/higher_order_functions.scm @@ -0,0 +1,55 @@ +;; Integration test for higher-order functions with complex scenarios + +;; Test 1: Data processing pipeline +(define numbers (list 1 2 3 4 5 6 7 8 9 10)) + +;; Filter even numbers, then double them +(define even-doubled (map (lambda (x) (* x 2)) + (filter (lambda (x) (= (modulo x 2) 0)) numbers))) +(assert (= (length even-doubled) 5)) +(assert (= (car even-doubled) 4)) ; 2 * 2 +(assert (= (cadr even-doubled) 8)) ; 4 * 2 +(assert (= (caddr even-doubled) 12)) ; 6 * 2 + +;; Test 2: Complex predicate composition +(define between-5-and-15 (lambda (x) (and (> x 5) (< x 15)))) +(define add-10 (lambda (x) (+ x 10))) +(define filtered-and-transformed (map add-10 + (filter between-5-and-15 numbers))) +(assert (= (length filtered-and-transformed) 4)) +(assert (= (car filtered-and-transformed) 16)) ; 6 + 10 +(assert (= (cadr filtered-and-transformed) 17)) ; 7 + 10 + +;; Test 3: Multiple transformation stages +(define pipeline-result (map (lambda (x) (* x 3)) + (map (lambda (x) (+ x 1)) + (filter (lambda (x) (> x 5)) numbers)))) +(assert (= (length pipeline-result) 5)) +(assert (= (car pipeline-result) 21)) ; (6 + 1) * 3 +(assert (= (cadr pipeline-result) 24)) ; (7 + 1) * 3 + +;; Test 4: Function composition with user-defined functions +(define square (lambda (x) (* x x))) +(define squares-of-odds (map square + (filter (lambda (x) (= (modulo x 2) 1)) numbers))) +(assert (= (length squares-of-odds) 5)) +(assert (= (car squares-of-odds) 1)) ; 1^2 +(assert (= (cadr squares-of-odds) 9)) ; 3^2 +(assert (= (caddr squares-of-odds) 25)) ; 5^2 + +;; Test 5: Nested higher-order functions +(define make-filter (lambda (predicate) + (lambda (list) (filter predicate list)))) + +(define make-mapper (lambda (transform) + (lambda (list) (map transform list)))) + +(define filter-evens (make-filter (lambda (x) (= (modulo x 2) 0)))) +(define double-all (make-mapper (lambda (x) (* x 2)))) + +(define result (double-all (filter-evens numbers))) +(assert (= (length result) 5)) +(assert (= (car result) 4)) ; 2 * 2 +(assert (= (cadr result) 8)) ; 4 * 2 + +(display "Higher-order functions integration test passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/list_operations.scm b/awk/scheme/scheme/test/integration/list_operations.scm new file mode 100644 index 0000000..3a82919 --- /dev/null +++ b/awk/scheme/scheme/test/integration/list_operations.scm @@ -0,0 +1,10 @@ +; List operations and function definitions + +(define make-list (x y z) (cons x (cons y (cons z nil)))) +(define first (lst) (car lst)) +(define second (lst) (car (cdr lst))) +(define third (lst) (car (cdr (cdr lst)))) +(define my-list (make-list 1 2 3)) +(first my-list) +(second my-list) +(third my-list) \ No newline at end of file diff --git a/awk/scheme/scheme/test/integration/string_operations.scm b/awk/scheme/scheme/test/integration/string_operations.scm new file mode 100644 index 0000000..72a19d9 --- /dev/null +++ b/awk/scheme/scheme/test/integration/string_operations.scm @@ -0,0 +1,9 @@ +; String operations and variable bindings + +(define greeting "Hello") +(define name "World") +(string-append greeting ", " name "!") +(define message (string-append greeting ", " name "!")) +(string-length message) +(string-ref message 0) +(substring message 0 5) \ No newline at end of file diff --git a/awk/scheme/scheme/test/regression/edge_cases.scm b/awk/scheme/scheme/test/regression/edge_cases.scm new file mode 100644 index 0000000..7794ede --- /dev/null +++ b/awk/scheme/scheme/test/regression/edge_cases.scm @@ -0,0 +1,21 @@ +; Tests that might catch regressions ¯\_༼ ಠ_ಠ ༽_/¯ + +; Empty string +(string-length "") +(string-append "" "") + +; Single character +(string-length "a") +(string-ref "a" 0) + +; Zero and negative numbers +(+ 0 0) +(- 0) +(* 0 5) +(/ 0 5) + +; Nil +nil +(cons 1 nil) +(car (cons 1 nil)) +(cdr (cons 1 nil)) \ No newline at end of file diff --git a/awk/scheme/scheme/test/regression/nested_expressions.scm b/awk/scheme/scheme/test/regression/nested_expressions.scm new file mode 100644 index 0000000..c564f29 --- /dev/null +++ b/awk/scheme/scheme/test/regression/nested_expressions.scm @@ -0,0 +1,15 @@ +; Nested expressions that have caused issues before + +(+ (+ (+ 1 2) (+ 3 4)) (+ (+ 5 6) (+ 7 8))) + +(define add (x y) (+ x y)) +(define double (x) (* x 2)) +(double (add 3 4)) + +(let ((x 10)) + (let ((y 20)) + (let ((z 30)) + (+ x (+ y z))))) + +(string-append "a" (string-append "b" "c")) +(string-length (string-append "hello" " banana")) \ No newline at end of file diff --git a/awk/scheme/scheme/test/run_tests.sh b/awk/scheme/scheme/test/run_tests.sh new file mode 100755 index 0000000..4e7d181 --- /dev/null +++ b/awk/scheme/scheme/test/run_tests.sh @@ -0,0 +1,179 @@ +#!/bin/bash + +# Awk-Scheme Test Runner +# Runs all tests and reports results + +# set -e # Exit on any error - commented out to allow all tests to run + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Test counters +TOTAL_TESTS=0 +PASSED_TESTS=0 +FAILED_TESTS=0 +FAILED_TEST_NAMES=() + +# Get the project root directory +PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +SCHEME_BIN="$PROJECT_ROOT/scheme" + +# Debug flag +DEBUG=${DEBUG:-0} + +# Function to print colored output +print_status() { + local status=$1 + local message=$2 + case $status in + "PASS") + echo -e "${GREEN}✓ PASS${NC}: $message" + ;; + "FAIL") + echo -e "${RED}✗ FAIL${NC}: $message" + ;; + "INFO") + echo -e "${BLUE}ℹ INFO${NC}: $message" + ;; + "WARN") + echo -e "${YELLOW}⚠ WARN${NC}: $message" + ;; + esac +} + +# Function to run a single test +run_test() { + local test_file=$1 + local test_name=$(basename "$test_file" .scm) + local rel_test_file=${test_file#$PROJECT_ROOT/} + + TOTAL_TESTS=$((TOTAL_TESTS + 1)) + + print_status "INFO" "Running test: $test_name" + + # Check if test file exists + if [ ! -f "$test_file" ]; then + print_status "FAIL" "Test file not found: $test_file" + FAILED_TESTS=$((FAILED_TESTS + 1)) + return 1 + fi + + # Run the test + local output + local exit_code + + if [ "$DEBUG" = "1" ]; then + # Run with debug output + output=$(DEBUG=1 "$SCHEME_BIN" "$test_file" 2>&1) + exit_code=$? + else + # Run normally + output=$("$SCHEME_BIN" "$test_file" 2>&1) + exit_code=$? + fi + + # Check if test passed (exit code 0) + if [ $exit_code -eq 0 ]; then + print_status "PASS" "$test_name" + PASSED_TESTS=$((PASSED_TESTS + 1)) + + if [ "$DEBUG" = "1" ]; then + echo "Output:" + echo "$output" | sed 's/^/ /' + echo + fi + else + print_status "FAIL" "$test_name (exit code: $exit_code)" + FAILED_TESTS=$((FAILED_TESTS + 1)) + FAILED_TEST_NAMES+=("$rel_test_file") + + echo "Error output:" + echo "$output" | sed 's/^/ /' + echo + fi +} + +# Function to run all tests in a directory +run_test_directory() { + local dir=$1 + local dir_name=$(basename "$dir") + + echo + print_status "INFO" "Running tests in: $dir_name" + echo "==================================" + + if [ ! -d "$dir" ]; then + print_status "WARN" "Directory not found: $dir" + return + fi + + # Find all .scm files in the directory + local test_files=($(find "$dir" -name "*.scm" -type f | sort)) + + if [ ${#test_files[@]} -eq 0 ]; then + print_status "WARN" "No test files found in $dir" + return + fi + + for test_file in "${test_files[@]}"; do + run_test "$test_file" + done +} + +# Function to print summary +print_summary() { + echo + echo "==================================" + print_status "INFO" "Test Summary" + echo "==================================" + echo "Total tests: $TOTAL_TESTS" + echo -e "Passed: ${GREEN}$PASSED_TESTS${NC}" + echo -e "Failed: ${RED}$FAILED_TESTS${NC}" + + if [ $FAILED_TESTS -eq 0 ]; then + echo -e "${GREEN}All tests passed!${NC}" + exit 0 + else + echo -e "${RED}Some tests failed!${NC}" + echo + echo "Failed tests:" + for test_path in "${FAILED_TEST_NAMES[@]}"; do + echo -e " ${RED}✗${NC} $test_path" + done + exit 1 + fi +} + +# Main execution +main() { + echo "Awk-Scheme Test Runner" + echo "=====================" + echo "Project root: $PROJECT_ROOT" + echo "Scheme binary: $SCHEME_BIN" + echo "Debug mode: $DEBUG" + echo + + # Check if scheme binary exists + if [ ! -f "$SCHEME_BIN" ]; then + print_status "FAIL" "Scheme binary not found: $SCHEME_BIN" + exit 1 + fi + + # Make sure it's executable + chmod +x "$SCHEME_BIN" + + # Run tests in order of importance + run_test_directory "$PROJECT_ROOT/test/unit" + run_test_directory "$PROJECT_ROOT/test/integration" + run_test_directory "$PROJECT_ROOT/test/examples" + run_test_directory "$PROJECT_ROOT/test/regression" + + print_summary +} + +# Run main function +main "$@" \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/additional_lists.scm b/awk/scheme/scheme/test/unit/additional_lists.scm new file mode 100644 index 0000000..369f26d --- /dev/null +++ b/awk/scheme/scheme/test/unit/additional_lists.scm @@ -0,0 +1,24 @@ +;; Even more list utilities + +(list) +(list 1) +(list 1 2) +(list 1 2 3) +(list "a" "b" "c") + +(reverse nil) +(reverse (list 1)) +(reverse (list 1 2)) +(reverse (list 1 2 3)) +(reverse (list "a" "b" "c")) + +(member 1 nil) +(member 1 (list 1)) +(member 2 (list 1 2 3)) +(member 4 (list 1 2 3)) +(member "a" (list "a" "b" "c")) +(member "d" (list "a" "b" "c")) + +(reverse (list 1 2 3)) +(member 2 (reverse (list 1 2 3))) +(list (car (member 2 (list 1 2 3))) (cadr (member 2 (list 1 2 3)))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/advanced_control_flow.scm b/awk/scheme/scheme/test/unit/advanced_control_flow.scm new file mode 100644 index 0000000..0b37d4c --- /dev/null +++ b/awk/scheme/scheme/test/unit/advanced_control_flow.scm @@ -0,0 +1,46 @@ +;; Advanced Control Flow Tests +;; Tests for complex control flow patterns + +;; Nested if expressions +(if (if #t #f #t) "nested-true" "nested-false") +(if (if #f #t #f) "nested-true" "nested-false") + +;; Complex cond expressions +(cond ((< 5 3) "first") + ((> 10 5) "second") + ((= 2 2) "third") + (else "else")) + +(cond ((= 1 2) "first") + ((= 2 3) "second") + ((= 3 4) "third") + (else "else")) + +;; Multiple conditions in cond +(cond ((and #t #f) "and-false") + ((or #f #t) "or-true") + ((not #f) "not-true") + (else "else")) + +;; Short-circuit evaluation with side effects +(and #f (display "should-not-print")) +(or #t (display "should-not-print")) + +;; Complex boolean expressions +(and (or #f #t) (and #t #t)) +(or (and #f #f) (or #t #f)) +(not (and #t #f)) +(not (or #f #f)) + +;; Nested boolean expressions +(if (and (or #t #f) (and #t #t)) "complex-true" "complex-false") +(if (or (and #f #f) (or #t #f)) "complex-true" "complex-false") + +;; Conditional with function calls +(if (= (+ 2 3) 5) "math-true" "math-false") +(if (< (* 3 4) 10) "math-true" "math-false") + +;; Multiple nested conditions +(cond ((< 1 2) (if (> 3 2) "nested-true" "nested-false")) + ((> 5 3) "second") + (else "else")) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/advanced_functions.scm b/awk/scheme/scheme/test/unit/advanced_functions.scm new file mode 100644 index 0000000..fb707b9 --- /dev/null +++ b/awk/scheme/scheme/test/unit/advanced_functions.scm @@ -0,0 +1,68 @@ +;; Advanced Function Tests +;; Tests for complex function patterns + +;; Recursive functions +(define factorial (n) + (if (= n 0) + 1 + (* n (factorial (- n 1))))) + +(factorial 5) +(factorial 0) + +;; Nested function definitions +(define outer (x) + (define inner (y) + (+ x y)) + (inner 10)) + +(outer 5) + +;; Higher-order functions (functions that return functions) +(define make-adder (n) + (lambda (x) (+ n x))) + +(define add5 (make-adder 5)) +(add5 3) + +;; Function composition +(define compose (f g) + (lambda (x) (f (g x)))) + +(define double (x) (* x 2)) +(define square (x) (* x x)) +(define double-square (compose double square)) +(double-square 3) + +;; Variable argument functions (testing existing string-append) +(string-append "a" "b" "c" "d") +(string-append "hello" " " "world" "!") + +;; Functions with complex bodies +(define complex-func (x y) + (if (> x y) + (+ x (* y 2)) + (if (= x y) + (* x y) + (- y x)))) + +(complex-func 10 5) +(complex-func 5 5) +(complex-func 3 7) + +;; Functions that use multiple standard library functions +(define list-processor (lst) + (if (null? lst) + 0 + (+ (car lst) (list-processor (cdr lst))))) + +(list-processor (cons 1 (cons 2 (cons 3 nil)))) + +;; Functions with side effects +(define side-effect-func (x) + (display "processing: ") + (display x) + (display "\n") + (* x 2)) + +(side-effect-func 5) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/basic_higher_order_functions.scm b/awk/scheme/scheme/test/unit/basic_higher_order_functions.scm new file mode 100644 index 0000000..d16d104 --- /dev/null +++ b/awk/scheme/scheme/test/unit/basic_higher_order_functions.scm @@ -0,0 +1,24 @@ +(define nums (list 1 2 3 4 5)) +(define doubled (map abs nums)) +(assert (= (length doubled) 5)) +(assert (= (car doubled) 1)) +(assert (= (cadr doubled) 2)) + +(define mixed (list -1 2 -3 4 -5)) +(define positive-nums (filter positive? mixed)) +(assert (= (length positive-nums) 2)) +(assert (= (car positive-nums) 2)) +(assert (= (cadr positive-nums) 4)) + +(define absolute-positive (map abs (filter positive? mixed))) +(assert (= (length absolute-positive) 2)) +(assert (= (car absolute-positive) 2)) +(assert (= (cadr absolute-positive) 4)) + +(define filtered-then-mapped (map abs (filter negative? mixed))) +(assert (= (length filtered-then-mapped) 3)) +(assert (= (car filtered-then-mapped) 1)) +(assert (= (cadr filtered-then-mapped) 3)) +(assert (= (caddr filtered-then-mapped) 5)) + +(display "Map and filter comprehensive test passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/basic_map_operations.scm b/awk/scheme/scheme/test/unit/basic_map_operations.scm new file mode 100644 index 0000000..3dc244c --- /dev/null +++ b/awk/scheme/scheme/test/unit/basic_map_operations.scm @@ -0,0 +1,15 @@ +(define nums (list 1 2 3 4 5)) +(define doubled (map abs nums)) +(assert (= (length doubled) 5)) +(assert (= (car doubled) 1)) +(assert (= (cadr doubled) 2)) +(assert (= (caddr doubled) 3)) + +(define negative-nums (list -1 -2 -3)) +(define absolute-nums (map abs negative-nums)) +(assert (= (length absolute-nums) 3)) +(assert (= (car absolute-nums) 1)) +(assert (= (cadr absolute-nums) 2)) +(assert (= (caddr absolute-nums) 3)) + +(display "Map with built-in functions test passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/basic_numeric_operations.scm b/awk/scheme/scheme/test/unit/basic_numeric_operations.scm new file mode 100644 index 0000000..7c58ef1 --- /dev/null +++ b/awk/scheme/scheme/test/unit/basic_numeric_operations.scm @@ -0,0 +1,40 @@ +(assert (= (modulo 17 5) 2)) +(assert (= (modulo -17 5) 3)) +(assert (= (% 17 5) 2)) + +(assert (= (expt 2 3) 8)) +(assert (= (expt 5 0) 1)) + +(assert (= (abs 5) 5)) +(assert (= (abs -5) 5)) +(assert (= (abs 0) 0)) + +(assert (= (min 3 7) 3)) +(assert (= (max 3 7) 7)) + +(assert (number? 42)) +(assert (not (number? "hello"))) +(assert (string? "hello")) +(assert (not (string? 42))) +(assert (boolean? #t)) +(assert (not (boolean? 42))) + +(assert (zero? 0)) +(assert (not (zero? 1))) +(assert (positive? 1)) +(assert (not (positive? -1))) +(assert (negative? -1)) +(assert (not (negative? 1))) + +(define double (lambda (x) (* x 2))) +(define nums (list 1 2 3)) +(define doubled (map double nums)) +(assert (= (length doubled) 3)) +(assert (= (car doubled) 2)) + +(define even? (lambda (x) (= (modulo x 2) 0))) +(define evens (filter even? nums)) +(assert (= (length evens) 1)) +(assert (= (car evens) 2)) + +(display "All new function tests passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/basic_stdlib_test.scm b/awk/scheme/scheme/test/unit/basic_stdlib_test.scm new file mode 100644 index 0000000..da6bbbe --- /dev/null +++ b/awk/scheme/scheme/test/unit/basic_stdlib_test.scm @@ -0,0 +1,4 @@ +(null? nil) +(pair? (cons 1 2)) +(length (cons 1 (cons 2 nil))) +(cadr (cons 1 (cons 2 (cons 3 nil)))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/booleans.scm b/awk/scheme/scheme/test/unit/booleans.scm new file mode 100644 index 0000000..2443201 --- /dev/null +++ b/awk/scheme/scheme/test/unit/booleans.scm @@ -0,0 +1,6 @@ +(= 1 1) +(= 1 2) +(< 1 2) +(< 2 1) +(> 5 3) +(> 3 5) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/caddr_test.scm b/awk/scheme/scheme/test/unit/caddr_test.scm new file mode 100644 index 0000000..66e0988 --- /dev/null +++ b/awk/scheme/scheme/test/unit/caddr_test.scm @@ -0,0 +1 @@ +(caddr (cons 1 (cons 2 (cons 3 nil)))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/call_stack_infrastructure.scm b/awk/scheme/scheme/test/unit/call_stack_infrastructure.scm new file mode 100644 index 0000000..090ad43 --- /dev/null +++ b/awk/scheme/scheme/test/unit/call_stack_infrastructure.scm @@ -0,0 +1,6 @@ +(assert (= (abs 5) 5)) +(assert (= (modulo 17 5) 2)) +(assert (number? 42)) +(assert (positive? 10)) + +(display "Call stack infrastructure test passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/closures.scm b/awk/scheme/scheme/test/unit/closures.scm new file mode 100644 index 0000000..7f3c5ec --- /dev/null +++ b/awk/scheme/scheme/test/unit/closures.scm @@ -0,0 +1,3 @@ +(let ((x 10)) ((lambda (y) (+ x y)) 32)) +(let ((add1 (lambda (x) (+ x 1)))) (add1 41)) +(let ((x 5)) (let ((f (lambda (y) (+ x y)))) (f 10))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/comprehensive_higher_order_functions.scm b/awk/scheme/scheme/test/unit/comprehensive_higher_order_functions.scm new file mode 100644 index 0000000..d204775 --- /dev/null +++ b/awk/scheme/scheme/test/unit/comprehensive_higher_order_functions.scm @@ -0,0 +1,52 @@ +(define nums (list 1 2 3 4 5)) +(define absolute-nums (map abs (list -1 -2 -3))) +(assert (= (length absolute-nums) 3)) +(assert (= (car absolute-nums) 1)) +(assert (= (cadr absolute-nums) 2)) +(assert (= (caddr absolute-nums) 3)) + +(define mixed (list -1 2 -3 4 -5)) +(define positive-nums (filter positive? mixed)) +(assert (= (length positive-nums) 2)) +(assert (= (car positive-nums) 2)) +(assert (= (cadr positive-nums) 4)) + +(define double (lambda (x) (* x 2))) +(define doubled (map double nums)) +(assert (= (length doubled) 5)) +(assert (= (car doubled) 2)) +(assert (= (cadr doubled) 4)) +(assert (= (caddr doubled) 6)) + +(define even? (lambda (x) (= (modulo x 2) 0))) +(define evens (filter even? nums)) +(assert (= (length evens) 2)) +(assert (= (car evens) 2)) +(assert (= (cadr evens) 4)) + +(define add-one (lambda (x) (+ x 1))) +(define incremented (map add-one nums)) +(assert (= (length incremented) 5)) +(assert (= (car incremented) 2)) +(assert (= (cadr incremented) 3)) + +(define greater-than-3 (lambda (x) (> x 3))) +(define large-nums (filter greater-than-3 nums)) +(assert (= (length large-nums) 2)) +(assert (= (car large-nums) 4)) +(assert (= (cadr large-nums) 5)) + +(define complex-result (map double (filter even? nums))) +(assert (= (length complex-result) 2)) +(assert (= (car complex-result) 4)) +(assert (= (cadr complex-result) 8)) + +(define add-one (lambda (x) (+ x 1))) +(define double (lambda (x) (* x 2))) +(define triple (lambda (x) (* x 3))) +(define processed (map triple (map double (map add-one nums)))) +(assert (= (length processed) 5)) +(assert (= (car processed) 12)) +(assert (= (cadr processed) 18)) + +(display "All map and filter tests passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/comprehensive_numeric_and_type_operations.scm b/awk/scheme/scheme/test/unit/comprehensive_numeric_and_type_operations.scm new file mode 100644 index 0000000..2c07a85 --- /dev/null +++ b/awk/scheme/scheme/test/unit/comprehensive_numeric_and_type_operations.scm @@ -0,0 +1,174 @@ +;; Test file for new functions: numeric operations, type predicates, and list operations + +;; Test essential numeric operations +(assert (= (modulo 17 5) 2)) +(assert (= (modulo -17 5) 3)) +(assert (= (modulo 17 -5) 2)) +(assert (= (% 17 5) 2)) ; Test alias + +(assert (= (expt 2 3) 8)) +(assert (= (expt 5 0) 1)) +(assert (= (expt 2 10) 1024)) + +(assert (= (abs 5) 5)) +(assert (= (abs -5) 5)) +(assert (= (abs 0) 0)) + +(assert (= (min 3 7) 3)) +(assert (= (min 7 3) 3)) +(assert (= (min -5 10) -5)) +(assert (= (min 5 5) 5)) + +(assert (= (max 3 7) 7)) +(assert (= (max 7 3) 7)) +(assert (= (max -5 10) 10)) +(assert (= (max 5 5) 5)) + +;; Test type predicates +(assert (number? 42)) +(assert (number? -17)) +(assert (not (number? "hello"))) +(assert (not (number? #t))) +(assert (not (number? nil))) + +(assert (string? "hello")) +(assert (string? "")) +(assert (not (string? 42))) +(assert (not (string? #t))) +(assert (not (string? nil))) + +(assert (boolean? #t)) +(assert (boolean? #f)) +(assert (not (boolean? 42))) +(assert (not (boolean? "hello"))) +(assert (not (boolean? nil))) + +;; Note: symbol? test commented out until quote is implemented +;; (assert (symbol? 'hello)) +;; (assert (not (symbol? 42))) +;; (assert (not (symbol? "hello"))) +;; (assert (not (symbol? #t))) +;; (assert (not (symbol? nil))) + +;; Test numeric predicates +(assert (zero? 0)) +(assert (not (zero? 1))) +(assert (not (zero? -1))) + +(assert (positive? 1)) +(assert (positive? 42)) +(assert (not (positive? 0))) +(assert (not (positive? -1))) + +(assert (negative? -1)) +(assert (negative? -42)) +(assert (not (negative? 0))) +(assert (not (negative? 1))) + +;; Test list operations: map +(define double (lambda (x) (* x 2))) +(define square (lambda (x) (* x x))) +(define even? (lambda (x) (= (modulo x 2) 0))) + +(define nums (list 1 2 3 4 5)) +(define doubled (map double nums)) +(assert (= (length doubled) 5)) +(assert (= (car doubled) 2)) +(assert (= (cadr doubled) 4)) +(assert (= (caddr doubled) 6)) + +(define squared (map square nums)) +(assert (= (car squared) 1)) +(assert (= (cadr squared) 4)) +(assert (= (caddr squared) 9)) + +;; Test list operations: filter +(define evens (filter even? nums)) +(assert (= (length evens) 2)) +(assert (= (car evens) 2)) +(assert (= (cadr evens) 4)) + +(define odds (filter (lambda (x) (not (even? x))) nums)) +(assert (= (length odds) 3)) +(assert (= (car odds) 1)) +(assert (= (cadr odds) 3)) +(assert (= (caddr odds) 5)) + +;; Test empty list handling +(assert (null? (map double nil))) +(assert (null? (filter even? nil))) + +;; Test complex combinations +(define mixed (list 1 "hello" 3 #t 5)) +(define numbers-only (filter number? mixed)) +(assert (= (length numbers-only) 3)) +(assert (= (car numbers-only) 1)) +(assert (= (cadr numbers-only) 3)) +(assert (= (caddr numbers-only) 5)) + +(define strings-only (filter string? mixed)) +(assert (= (length strings-only) 1)) +(assert (string=? (car strings-only) "hello")) + +;; Test nested operations +(define nested (list (list 1 2) (list 3 4) (list 5 6))) +(define lengths (map length nested)) +(assert (= (car lengths) 2)) +(assert (= (cadr lengths) 2)) +(assert (= (caddr lengths) 2)) + +;; Test with higher-order functions +(define add1 (lambda (x) (+ x 1))) +(define range (list 1 2 3 4 5)) +(define incremented (map add1 range)) +(assert (= (car incremented) 2)) +(assert (= (cadr incremented) 3)) +(assert (= (caddr incremented) 4)) + +;; Test edge cases +(assert (= (abs -0) 0)) +(assert (= (min 1 1) 1)) +(assert (= (max 1 1) 1)) +(assert (= (modulo 0 5) 0)) + +;; Test type checking with predicates +(define test-list (list 1 "hello" #t nil)) +(assert (number? (car test-list))) +(assert (string? (cadr test-list))) +(assert (boolean? (caddr test-list))) +(assert (null? (caddr test-list))) + +;; Test complex numeric operations +(assert (= (expt (abs -2) 3) 8)) +(assert (= (modulo (expt 3 2) 4) 1)) +(assert (= (min (abs -5) (abs 3)) 3)) +(assert (= (max (abs -2) (abs 7)) 7)) + +;; Test list operations with complex predicates +(define complex-nums (list -5 0 3 -2 7)) +(define positive-nums (filter positive? complex-nums)) +(assert (= (length positive-nums) 2)) +(assert (= (car positive-nums) 3)) +(assert (= (cadr positive-nums) 7)) + +(define negative-nums (filter negative? complex-nums)) +(assert (= (length negative-nums) 2)) +(assert (= (car negative-nums) -5)) +(assert (= (cadr negative-nums) -2)) + +;; Test map with different return types +(define type-names (map (lambda (x) + (cond + ((number? x) "number") + ((string? x) "string") + ((boolean? x) "boolean") + ((null? x) "nil") + (else "unknown"))) + test-list)) +(assert (string=? (car type-names) "number")) +(assert (string=? (cadr type-names) "string")) +(assert (string=? (caddr type-names) "boolean")) +(assert (string=? (caddr type-names) "nil")) + +;; All tests passed! +(display "All new function tests passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/control_flow.scm b/awk/scheme/scheme/test/unit/control_flow.scm new file mode 100644 index 0000000..0a99b9a --- /dev/null +++ b/awk/scheme/scheme/test/unit/control_flow.scm @@ -0,0 +1,56 @@ +(if #t "true" "false") +(if #f "true" "false") +(if 0 "true" "false") +(if 42 "true" "false") +(if nil "true" "false") +(if (cons 1 2) "true" "false") + +(cond (#t "first") + (#f "second") + (else "else")) + +(cond (#f "first") + (#t "second") + (else "else")) + +(cond (#f "first") + (#f "second") + (else "else")) + +(and) +(and #t) +(and #f) +(and #t #t) +(and #t #f) +(and #f #t) +(and #f #f) +(and #t #t #t) +(and #t #f #t) + +(or) +(or #t) +(or #f) +(or #t #t) +(or #t #f) +(or #f #t) +(or #f #f) +(or #f #f #t) +(or #f #f #f) + +(not #t) +(not #f) +(not 0) +(not 42) +(not nil) +(not (cons 1 2)) + +(if (and #t #f) "and-true" "and-false") +(if (or #f #t) "or-true" "or-false") +(if (not #t) "not-true" "not-false") + +(cond ((and #t #t) "and-true") + ((or #f #f) "or-false") + (else "else")) + +(and #f (error "should not evaluate")) +(or #t (error "should not evaluate")) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/environment_scoping.scm b/awk/scheme/scheme/test/unit/environment_scoping.scm new file mode 100644 index 0000000..3d208cf --- /dev/null +++ b/awk/scheme/scheme/test/unit/environment_scoping.scm @@ -0,0 +1,66 @@ +;; Environment and Scoping Tests +;; Tests for lexical scoping, variable shadowing, and environment management + +;; Variable shadowing +(define x 10) +(let ((x 20)) + x) +x + +;; Nested let expressions +(let ((x 1)) + (let ((y 2)) + (let ((z 3)) + (+ x (+ y z))))) + +;; Shadowing in nested environments +(define global 100) +(let ((global 200)) + (let ((global 300)) + global) + global) +global + +;; Function parameter shadowing +(define shadow-test (x) + (let ((x (+ x 10))) + x)) + +(shadow-test 5) + +;; Complex nested scoping +(let ((a 1)) + (let ((b (+ a 1))) + (let ((c (+ b 1))) + (let ((a (+ c 1))) ;; shadows outer a + (+ a b c))))) + +;; Environment persistence in closures +(define make-counter () + (let ((count 0)) + (lambda () + (let ((old-count count)) + (let ((count (+ count 1))) ;; shadows outer count + old-count))))) + +(define counter (make-counter)) +(counter) +(counter) +(counter) + +;; Multiple closures sharing environment +(define make-pair (x y) + (let ((get-x (lambda () x)) + (get-y (lambda () y)) + (set-x (lambda (new-x) (let ((x new-x)) x)))) + (list get-x get-y set-x))) + +;; Complex environment nesting with functions +(define outer-env (x) + (define inner-env (y) + (define deepest-env (z) + (+ x (+ y z))) + (deepest-env 1)) + (inner-env 2)) + +(outer-env 3) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/error_handling.scm b/awk/scheme/scheme/test/unit/error_handling.scm new file mode 100644 index 0000000..28f105d --- /dev/null +++ b/awk/scheme/scheme/test/unit/error_handling.scm @@ -0,0 +1,42 @@ +;; Error Handling and Edge Cases Tests +;; Tests for proper error handling and edge cases + +;; Division by zero +(/ 10 0) + +;; Stack underflow scenarios +;; These should error when stack is empty +(car nil) +(cdr nil) +(car (cons 1 nil)) +(cdr (cons 1 nil)) + +;; Type errors +(car 42) +(cdr "hello") +(cons 1 2 3) ;; cons should take exactly 2 arguments + +;; Invalid function calls +(+ 1) ;; + needs at least 2 arguments +(* 5) ;; * needs at least 2 arguments +(string-ref "hello") ;; string-ref needs 2 arguments +(string-ref "hello" 10) ;; index out of bounds + +;; List operations on non-lists +(length 42) +(length "hello") +(null? 42) +(pair? "hello") + +;; Standard library edge cases +(list-ref nil 0) +(list-ref (cons 1 nil) 5) +(list-tail nil 1) +(cadr nil) +(caddr (cons 1 (cons 2 nil))) + +;; Invalid string operations +(string-ref "" 0) +(string-ref "a" 1) +(substring "hello" 3 1) ;; start > end +(substring "hello" 10 15) ;; out of bounds \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/essential_numeric_operations.scm b/awk/scheme/scheme/test/unit/essential_numeric_operations.scm new file mode 100644 index 0000000..0f0105b --- /dev/null +++ b/awk/scheme/scheme/test/unit/essential_numeric_operations.scm @@ -0,0 +1,29 @@ +(assert (= (modulo 17 5) 2)) +(assert (= (modulo -17 5) 3)) +(assert (= (% 17 5) 2)) + +(assert (= (expt 2 3) 8)) +(assert (= (expt 5 0) 1)) + +(assert (= (abs 5) 5)) +(assert (= (abs -5) 5)) +(assert (= (abs 0) 0)) + +(assert (= (min 3 7) 3)) +(assert (= (max 3 7) 7)) + +(assert (number? 42)) +(assert (not (number? "hello"))) +(assert (string? "hello")) +(assert (not (string? 42))) +(assert (boolean? #t)) +(assert (not (boolean? 42))) + +(assert (zero? 0)) +(assert (not (zero? 1))) +(assert (positive? 1)) +(assert (not (positive? -1))) +(assert (negative? -1)) +(assert (not (negative? 1))) + +(display "All basic new function tests passed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/functions.scm b/awk/scheme/scheme/test/unit/functions.scm new file mode 100644 index 0000000..43b3a71 --- /dev/null +++ b/awk/scheme/scheme/test/unit/functions.scm @@ -0,0 +1,8 @@ +(define add (x y) (+ x y)) +(add 2 3) +(define double (x) (* x 2)) +(double 5) +(define square (x) (* x x)) +(square 4) +(define add-three (x y z) (+ x (+ y z))) +(add-three 1 2 3) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/inc_alias_test.scm b/awk/scheme/scheme/test/unit/inc_alias_test.scm new file mode 100644 index 0000000..cf7ae94 --- /dev/null +++ b/awk/scheme/scheme/test/unit/inc_alias_test.scm @@ -0,0 +1,13 @@ +(assert (= (inc 41) 42)) +(assert (= (inc 0) 1)) +(assert (= (inc -1) 0)) + +;; Test ++ alias +(assert (= (++ 41) 42)) +(assert (= (++ 0) 1)) +(assert (= (++ -1) 0)) + +;; Test that both inc and ++ are the same function +(assert (= (inc 10) (++ 10))) +(assert (= (inc 999) 1000)) +(assert (= (++ 999) 1000)) diff --git a/awk/scheme/scheme/test/unit/lambdas.scm b/awk/scheme/scheme/test/unit/lambdas.scm new file mode 100644 index 0000000..a287b47 --- /dev/null +++ b/awk/scheme/scheme/test/unit/lambdas.scm @@ -0,0 +1,4 @@ +((lambda (x) (+ x 1)) 41) +((lambda (x y) (+ x y)) 20 22) +(define add1 (lambda (x) (+ x 1))) +(add1 41) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/let.scm b/awk/scheme/scheme/test/unit/let.scm new file mode 100644 index 0000000..f3fa367 --- /dev/null +++ b/awk/scheme/scheme/test/unit/let.scm @@ -0,0 +1,3 @@ +(let ((x 10) (y 20)) (+ x y)) +(let ((x 5)) (* x x)) +(let ((x 10)) (let ((y 20)) (+ x y))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/lists.scm b/awk/scheme/scheme/test/unit/lists.scm new file mode 100644 index 0000000..bc7b7f4 --- /dev/null +++ b/awk/scheme/scheme/test/unit/lists.scm @@ -0,0 +1,7 @@ +(cons 1 2) +(car (cons 1 2)) +(cdr (cons 1 2)) +(cons 1 (cons 2 (cons 3 nil))) +(car (cons 1 (cons 2 (cons 3 nil)))) +(car (cdr (cons 1 (cons 2 (cons 3 nil))))) +(car (cdr (cdr (cons 1 (cons 2 (cons 3 nil)))))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/numbers.scm b/awk/scheme/scheme/test/unit/numbers.scm new file mode 100644 index 0000000..b4f255c --- /dev/null +++ b/awk/scheme/scheme/test/unit/numbers.scm @@ -0,0 +1,9 @@ +(+ 1 2) +(+ 10 20 30) +(- 10 5) +(- 5) +(* 3 4) +(* 2 3 4) +(/ 10 2) +(/ 20 4 2) +(inc 41) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/performance_stress.scm b/awk/scheme/scheme/test/unit/performance_stress.scm new file mode 100644 index 0000000..06b7f65 --- /dev/null +++ b/awk/scheme/scheme/test/unit/performance_stress.scm @@ -0,0 +1,62 @@ +;; Performance and Stress Tests +;; Tests for handling large data structures and deep recursion + +;; Large list construction +(define make-large-list (n) + (if (= n 0) + nil + (cons n (make-large-list (- n 1))))) + +;; Test with moderately large list (not too large to avoid timeout) +(make-large-list 10) + +;; Deep recursion test +(define deep-recursion (n) + (if (= n 0) + 0 + (+ 1 (deep-recursion (- n 1))))) + +(deep-recursion 10) + +;; Complex nested expressions +(+ (+ (+ 1 2) (+ 3 4)) (+ (+ 5 6) (+ 7 8))) + +;; Multiple function calls in sequence +(define test-sequence (n) + (if (= n 0) + 0 + (+ n (test-sequence (- n 1))))) + +(test-sequence 5) + +;; String operations on larger strings +(string-append "very" " " "long" " " "string" " " "with" " " "many" " " "parts") + +;; List operations on nested lists +(define nested-list (cons 1 (cons (cons 2 (cons 3 nil)) (cons 4 nil)))) +(car (car (cdr nested-list))) + +;; Multiple arithmetic operations +(+ (* (+ 1 2) (+ 3 4)) (- (* 5 6) (+ 7 8))) + +;; Complex boolean expressions +(and (or #t #f) (and #t (or #f #t)) (not #f)) + +;; Nested function calls +(define outer (x) + (define middle (y) + (define inner (z) + (+ x (+ y z))) + (inner 1)) + (middle 2)) + +(outer 3) + +;; Memory stress with multiple large operations +(define stress-test (n) + (if (= n 0) + 0 + (+ (length (make-large-list n)) + (stress-test (- n 1))))) + +(stress-test 3) ;; Keep small to avoid timeout \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_additional_lists.scm b/awk/scheme/scheme/test/unit/simple_additional_lists.scm new file mode 100644 index 0000000..e75c2bb --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_additional_lists.scm @@ -0,0 +1,9 @@ +(list) +(list 1) +(list 1 2) + +(reverse nil) +(reverse (list 1)) + +(member 1 nil) +(member 1 (list 1)) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_cond.scm b/awk/scheme/scheme/test/unit/simple_cond.scm new file mode 100644 index 0000000..03dd9cd --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_cond.scm @@ -0,0 +1,4 @@ +;; Simple Cond Test + +;; Test cond with simple condition +(cond (#t "true") (else "false")) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_control_flow.scm b/awk/scheme/scheme/test/unit/simple_control_flow.scm new file mode 100644 index 0000000..41df610 --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_control_flow.scm @@ -0,0 +1,19 @@ +;; Simple Control Flow Tests + +;; Test if expressions +(if #t "true" "false") +(if #f "true" "false") + +;; Test and expressions +(and #t #t) +(and #t #f) +(and #f #t) + +;; Test or expressions +(or #t #t) +(or #t #f) +(or #f #t) + +;; Test not expressions +(not #t) +(not #f) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_member.scm b/awk/scheme/scheme/test/unit/simple_member.scm new file mode 100644 index 0000000..b2aa9d6 --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_member.scm @@ -0,0 +1,5 @@ +;; Simple member test + +;; Test with pre-built list +(define mylist (list 1)) +(member 1 mylist) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_stdlib_test.scm b/awk/scheme/scheme/test/unit/simple_stdlib_test.scm new file mode 100644 index 0000000..0bc7866 --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_stdlib_test.scm @@ -0,0 +1,13 @@ +;; Simple test for standard library functions + +;; Test null? with nil +(null? nil) + +;; Test null? with a pair +(null? (cons 1 2)) + +;; Test pair? with a pair +(pair? (cons 1 2)) + +;; Test pair? with nil +(pair? nil) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/simple_string_append_test.scm b/awk/scheme/scheme/test/unit/simple_string_append_test.scm new file mode 100644 index 0000000..52d9719 --- /dev/null +++ b/awk/scheme/scheme/test/unit/simple_string_append_test.scm @@ -0,0 +1,11 @@ +;; Simple test for string-append with multiple arguments + +(display "Testing string-append with 3 arguments: ") +(display (string-append "a" "b" "c")) +(display "\n") + +(display "Testing string-append with 4 arguments: ") +(display (string-append "Hello" " " "World" "!")) +(display "\n") + +(display "Test completed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/single_list.scm b/awk/scheme/scheme/test/unit/single_list.scm new file mode 100644 index 0000000..325c710 --- /dev/null +++ b/awk/scheme/scheme/test/unit/single_list.scm @@ -0,0 +1,4 @@ +;; Single list test + +;; Test with one element +(list 1) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/stdlib_lists.scm b/awk/scheme/scheme/test/unit/stdlib_lists.scm new file mode 100644 index 0000000..9a51809 --- /dev/null +++ b/awk/scheme/scheme/test/unit/stdlib_lists.scm @@ -0,0 +1,55 @@ +;; Standard Library List Functions Tests +;; Phase 1: Core list utilities + +;; Test null? function +(null? nil) +(null? (cons 1 2)) +(null? (cons 1 nil)) + +;; Test pair? function +(pair? (cons 1 2)) +(pair? nil) +(pair? 42) +(pair? "hello") + +;; Test length function +(length nil) +(length (cons 1 nil)) +(length (cons 1 (cons 2 nil))) +(length (cons 1 (cons 2 (cons 3 nil)))) + +;; Test list construction using cons +(length (cons 1 (cons 2 (cons 3 nil)))) + +;; Test cadr function +(cadr (cons 1 (cons 2 (cons 3 nil)))) +(cadr (cons "a" (cons "b" nil))) + +;; Test caddr function +(caddr (cons 1 (cons 2 (cons 3 nil)))) +(caddr (cons "x" (cons "y" (cons "z" nil)))) + +;; Test list-ref function +(list-ref (cons 1 (cons 2 (cons 3 nil))) 0) +(list-ref (cons 1 (cons 2 (cons 3 nil))) 1) +(list-ref (cons 1 (cons 2 (cons 3 nil))) 2) + +;; Test list-tail function +(list-tail (cons 1 (cons 2 (cons 3 nil))) 0) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 1) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 2) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 3) + +;; Test append function +(append nil (cons 1 (cons 2 nil))) +(append (cons 1 (cons 2 nil)) nil) +(append (cons 1 (cons 2 nil)) (cons 3 (cons 4 nil))) + +;; Test edge cases and error conditions +;; These should produce errors: +;; (cadr nil) +;; (cadr (cons 1 nil)) +;; (caddr (cons 1 (cons 2 nil))) +;; (list-ref nil 0) +;; (list-ref (cons 1 nil) 5) +;; (list-tail nil 1) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/stdlib_results_test.scm b/awk/scheme/scheme/test/unit/stdlib_results_test.scm new file mode 100644 index 0000000..3a36ce5 --- /dev/null +++ b/awk/scheme/scheme/test/unit/stdlib_results_test.scm @@ -0,0 +1,42 @@ +;; Test standard library functions + +;; Test null? function +(null? nil) +(null? (cons 1 2)) +(null? (cons 1 nil)) + +;; Test pair? function +(pair? (cons 1 2)) +(pair? nil) +(pair? 42) +(pair? "hello") + +;; Test length function +(length nil) +(length (cons 1 nil)) +(length (cons 1 (cons 2 nil))) +(length (cons 1 (cons 2 (cons 3 nil)))) + +;; Test cadr function +(cadr (cons 1 (cons 2 (cons 3 nil)))) +(cadr (cons "a" (cons "b" nil))) + +;; Test caddr function +(caddr (cons 1 (cons 2 (cons 3 nil)))) +(caddr (cons "x" (cons "y" (cons "z" nil)))) + +;; Test list-ref function +(list-ref (cons 1 (cons 2 (cons 3 nil))) 0) +(list-ref (cons 1 (cons 2 (cons 3 nil))) 1) +(list-ref (cons 1 (cons 2 (cons 3 nil))) 2) + +;; Test list-tail function +(list-tail (cons 1 (cons 2 (cons 3 nil))) 0) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 1) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 2) +(list-tail (cons 1 (cons 2 (cons 3 nil))) 3) + +;; Test append function +(append nil (cons 1 (cons 2 nil))) +(append (cons 1 (cons 2 nil)) nil) +(append (cons 1 (cons 2 nil)) (cons 3 (cons 4 nil))) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/string_append_variable_args.scm b/awk/scheme/scheme/test/unit/string_append_variable_args.scm new file mode 100644 index 0000000..ad53fb6 --- /dev/null +++ b/awk/scheme/scheme/test/unit/string_append_variable_args.scm @@ -0,0 +1,54 @@ +;; Test string-append with variable arguments +;; This test verifies that string-append can handle 2 or more arguments + +;; Test with 2 arguments (original functionality) +(display "Testing 2 arguments: ") +(display (string-append "hello" " world")) +(display "\n") + +;; Test with 3 arguments +(display "Testing 3 arguments: ") +(display (string-append "a" "b" "c")) +(display "\n") + +;; Test with 4 arguments +(display "Testing 4 arguments: ") +(display (string-append "Hello" " " "World" "!")) +(display "\n") + +;; Test with 5 arguments +(display "Testing 5 arguments: ") +(display (string-append "The" " " "quick" " " "fox")) +(display "\n") + +;; Test with empty strings +(display "Testing empty strings: ") +(display (string-append "" "" "")) +(display "\n") + +;; Test with mixed content +(display "Testing mixed content: ") +(display (string-append "Number: " "42" " is " "even")) +(display "\n") + +;; Test with single character strings +(display "Testing single chars: ") +(display (string-append "a" "b" "c" "d" "e")) +(display "\n") + +;; Test with newlines and spaces +(display "Testing with newlines: ") +(display (string-append "Line 1" "\n" "Line 2" "\n" "Line 3")) +(display "\n") + +;; Test with many arguments +(display "Testing many arguments: ") +(display (string-append "1" "2" "3" "4" "5" "6" "7" "8" "9" "0")) +(display "\n") + +;; Test that it still works with 2 arguments (regression test) +(display "Regression test (2 args): ") +(display (string-append "test" "ing")) +(display "\n") + +(display "All string-append variable argument tests completed!\n") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/strings.scm b/awk/scheme/scheme/test/unit/strings.scm new file mode 100644 index 0000000..a198cd2 --- /dev/null +++ b/awk/scheme/scheme/test/unit/strings.scm @@ -0,0 +1,16 @@ +; Unit tests for string operations +; Test all string built-in functions + +(string-length "hello") +(string-length "") +(string-append "hello" " world") +(string-ref "hello" 0) +(string-ref "hello" 4) +(substring "hello world" 0 4) +(substring "hello world" 6 10) +(string=? "hello" "hello") +(string=? "hello" "world") +(string<? "abc" "def") +(string<? "def" "abc") +(string>? "xyz" "abc") +(string>? "abc" "xyz") \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/variables.scm b/awk/scheme/scheme/test/unit/variables.scm new file mode 100644 index 0000000..c153148 --- /dev/null +++ b/awk/scheme/scheme/test/unit/variables.scm @@ -0,0 +1,8 @@ +; Unit tests for variable operations +; Test define and variable lookup + +(define x 42) +(define y 10) +x +y +(+ x y) \ No newline at end of file diff --git a/awk/scheme/scheme/test/unit/vm_instructions.scm b/awk/scheme/scheme/test/unit/vm_instructions.scm new file mode 100644 index 0000000..c17ff52 --- /dev/null +++ b/awk/scheme/scheme/test/unit/vm_instructions.scm @@ -0,0 +1,77 @@ +;; VM Instructions Coverage Tests +;; Tests for VM instructions that might not be well covered + +;; Test DUP instruction (duplicate top of stack) +(define test-dup (x) + (+ x x)) ;; This should use DUP internally + +(test-dup 5) + +;; Test SWAP instruction (swap top two stack elements) +(define test-swap (x y) + (- y x)) ;; This might use SWAP internally + +(test-swap 10 5) + +;; Test environment operations (STORE, LOOKUP, POP_ENV) +(define test-env (x) + (let ((y (+ x 1))) + (+ x y))) + +(test-env 5) + +;; Test function calls (CALL, CALL_WITH_ARGS, RETURN) +(define helper (x) (+ x 1)) +(define caller (y) (helper y)) + +(caller 10) + +;; Test closures (CAPTURE_ENV, GET_VALUE) +(define make-adder (n) + (lambda (x) (+ n x))) + +(define add5 (make-adder 5)) +(add5 3) + +;; Test control flow instructions (JUMP, JUMP_IF_FALSE, JUMP_IF_TRUE) +(define test-control (x) + (if (> x 5) + (+ x 10) + (- x 10))) + +(test-control 7) +(test-control 3) + +;; Test complex nested control flow +(define complex-control (x y) + (cond ((> x y) (+ x y)) + ((= x y) (* x y)) + (else (- y x)))) + +(complex-control 10 5) +(complex-control 5 5) +(complex-control 3 7) + +;; Test stack manipulation +(define test-stack (a b c) + (+ a (+ b c))) ;; This should exercise stack operations + +(test-stack 1 2 3) + +;; Test multiple function calls in sequence +(define f1 (x) (+ x 1)) +(define f2 (x) (* x 2)) +(define f3 (x) (- x 1)) + +(f3 (f2 (f1 5))) + +;; Test variable argument functions (exercises CALL_WITH_ARGS) +(string-append "a" "b" "c" "d" "e") + +;; Test nested function definitions (exercises environment management) +(define outer (x) + (define inner (y) + (+ x y)) + (inner 10)) + +(outer 5) \ No newline at end of file diff --git a/bash/acmetodo b/bash/acme-stuff/acmetodo index 0c0b72f..0c0b72f 100755 --- a/bash/acmetodo +++ b/bash/acme-stuff/acmetodo diff --git a/bash/acmetodo-add b/bash/acme-stuff/acmetodo-add index b40663d..b40663d 100755 --- a/bash/acmetodo-add +++ b/bash/acme-stuff/acmetodo-add diff --git a/bash/acmetodo-all b/bash/acme-stuff/acmetodo-all index c00bb9b..c00bb9b 100755 --- a/bash/acmetodo-all +++ b/bash/acme-stuff/acmetodo-all diff --git a/bash/acmetodo-done b/bash/acme-stuff/acmetodo-done index 4829331..4829331 100755 --- a/bash/acmetodo-done +++ b/bash/acme-stuff/acmetodo-done diff --git a/bash/acmetodo-filter b/bash/acme-stuff/acmetodo-filter index 6149207..6149207 100644 --- a/bash/acmetodo-filter +++ b/bash/acme-stuff/acmetodo-filter diff --git a/bash/acmetodo-inprogress b/bash/acme-stuff/acmetodo-inprogress index d5ea505..d5ea505 100755 --- a/bash/acmetodo-inprogress +++ b/bash/acme-stuff/acmetodo-inprogress diff --git a/bash/acmetodo-todo b/bash/acme-stuff/acmetodo-todo index 6149207..6149207 100755 --- a/bash/acmetodo-todo +++ b/bash/acme-stuff/acmetodo-todo diff --git a/bash/acmetodo-toggle b/bash/acme-stuff/acmetodo-toggle index bffccec..bffccec 100755 --- a/bash/acmetodo-toggle +++ b/bash/acme-stuff/acmetodo-toggle diff --git a/bash/computer b/bash/computer new file mode 100755 index 0000000..e5aa36d --- /dev/null +++ b/bash/computer @@ -0,0 +1,295 @@ +#!/bin/bash + +# Get the directory where this script is located +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Computer Dispatch System +# This script intelligently routes prompts to the most appropriate thinking mechanism +# or directly to Ollama based on complexity, question type, and user intent. +# +# APPLICATION LOGIC: +# The computer dispatch system implements an intelligent routing mechanism that +# analyzes user prompts and determines the optimal response strategy. The system +# operates through three distinct phases designed to maximize response quality: +# +# PHASE 1 - PROMPT ANALYSIS: +# - Analyzes prompt complexity, length, and question type +# - Identifies user intent and specific keywords +# - Determines if direct Ollama response is appropriate +# - Classifies prompts into response categories +# +# PHASE 2 - MECHANISM SELECTION: +# - Routes to appropriate thinking mechanism based on classification +# - Uses decision tree with keywords for clear cases +# - Considers prompt complexity and user intent +# - Falls back to direct Ollama for simple cases +# +# PHASE 3 - RESPONSE EXECUTION: +# - Executes the selected mechanism or direct Ollama call +# - Maintains transparency about the routing decision +# - Provides consistent output format regardless of mechanism +# - Logs the decision process for analysis +# +# DISPATCH MODELING: +# The system applies intelligent routing principles to AI response generation: +# - Prompt classification helps match complexity to appropriate mechanism +# - Keyword analysis identifies specific user needs and intent +# - Decision tree provides consistent, predictable routing logic +# - Direct Ollama routing handles simple cases efficiently +# - Transparency shows users how their prompt was processed +# - The system may improve response quality by using specialized mechanisms +# +# The dispatch process emphasizes efficiency and appropriateness, +# ensuring users get the best possible response for their specific needs. +# The system balances speed with depth based on prompt characteristics. + +# --- Model Configuration --- +DEFAULT_MODEL="gemma3n:e2b" + +# --- Defaults --- +DEFAULT_ROUNDS=2 + +# --- Argument Validation --- +if [ "$#" -lt 1 ]; then + echo -e "\n\tComputer" + echo -e "\tThis script intelligently routes prompts to the most appropriate thinking mechanism" + echo -e "\tor directly to Ollama based on complexity, question type, and user intent." + echo -e "\n\tUsage: $0 [-f <file_path>] [-d] \"<your prompt>\" [number_of_rounds]" + echo -e "\n\tExample: $0 -f ./input.txt \"Please analyze this text\" 2" + echo -e "\n\tIf number_of_rounds is not provided, the program will default to $DEFAULT_ROUNDS rounds." + echo -e "\n\t-f <file_path> (optional): Append the contents of the file to the prompt." + echo -e "\n\t-d (optional): Force direct Ollama response (bypass thinking mechanisms)." + echo -e "\n" + exit 1 +fi + +# --- Argument Parsing --- +FILE_PATH="" +FORCE_DIRECT=false +while getopts "f:d" opt; do + case $opt in + f) + FILE_PATH="$OPTARG" + ;; + d) + FORCE_DIRECT=true + ;; + *) + echo "Invalid option: -$OPTARG" >&2 + exit 1 + ;; + esac +done +shift $((OPTIND -1)) + +PROMPT="$1" +if [ -z "$2" ]; then + ROUNDS=$DEFAULT_ROUNDS +else + ROUNDS=$2 +fi + +# If file path is provided, append its contents to the prompt +if [ -n "$FILE_PATH" ]; then + if [ ! -f "$FILE_PATH" ]; then + echo "File not found: $FILE_PATH" >&2 + exit 1 + fi + FILE_CONTENTS=$(cat "$FILE_PATH") + PROMPT="$PROMPT\n[FILE CONTENTS]\n$FILE_CONTENTS\n[END FILE]" +fi + +# Source the logging system using absolute path +source "${SCRIPT_DIR}/logging.sh" + +# --- File Initialization --- +# Create a temporary directory if it doesn't exist +mkdir -p ~/tmp +# Create a unique file for this session based on the timestamp +SESSION_FILE=~/tmp/computer_$(date +%Y%m%d_%H%M%S).txt + +# Initialize timing +SESSION_ID=$(generate_session_id) +start_timer "$SESSION_ID" "computer" + +echo "Computer Dispatch Session Log: ${SESSION_FILE}" +echo "---------------------------------" + +# Store the initial user prompt in the session file +echo "USER PROMPT: ${PROMPT}" >> "${SESSION_FILE}" +echo "FORCE DIRECT: ${FORCE_DIRECT}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +# --- Prompt Analysis Function --- +analyze_prompt() { + local prompt="$1" + local analysis="" + + # Check for direct Ollama requests + if [[ "$prompt" =~ (direct|simple|quick|fast|straight) ]]; then + analysis="DIRECT" + return + fi + + # Check prompt length (simple heuristic for complexity) + local word_count=$(echo "$prompt" | wc -w) + + # Very short prompts (likely simple questions) + if [ "$word_count" -le 5 ]; then + analysis="DIRECT" + return + fi + + # Keyword-based classification + if [[ "$prompt" =~ (consensus|agree|disagree|vote|multiple|perspectives|opinions) ]]; then + analysis="CONSENSUS" + elif [[ "$prompt" =~ (synthesize|combine|integrate|unify|merge|consolidate) ]]; then + analysis="SYNTHESIS" + elif [[ "$prompt" =~ (explore|paths|alternatives|options|compare|strategies|approaches) ]]; then + analysis="EXPLORATION" + elif [[ "$prompt" =~ (analyze|examine|explore|investigate|deep|thorough|comprehensive) ]]; then + analysis="SOCRATIC" + elif [[ "$prompt" =~ (improve|refine|edit|revise|better|enhance|polish|fix) ]]; then + analysis="CRITIQUE" + elif [[ "$prompt" =~ (review|feedback|peer|collaborate|suggest|advice) ]]; then + analysis="PEER_REVIEW" + else + # Default to direct for unclear cases + analysis="DIRECT" + fi + + echo "$analysis" +} + +# --- Mechanism Selection --- +echo "Analyzing prompt and selecting mechanism..." +echo "PROMPT ANALYSIS:" >> "${SESSION_FILE}" + +if [ "$FORCE_DIRECT" = true ]; then + MECHANISM="DIRECT" + REASON="User requested direct response with -d flag" +else + MECHANISM=$(analyze_prompt "$PROMPT") + case "$MECHANISM" in + "DIRECT") + REASON="Simple prompt or direct request" + ;; + "CONSENSUS") + REASON="Multiple perspectives or consensus needed" + ;; + "SYNTHESIS") + REASON="Integration of multiple approaches needed" + ;; + "EXPLORATION") + REASON="Systematic exploration of alternatives needed" + ;; + "SOCRATIC") + REASON="Deep analysis or exploration required" + ;; + "CRITIQUE") + REASON="Improvement or refinement requested" + ;; + "PEER_REVIEW") + REASON="Collaborative review or feedback needed" + ;; + *) + REASON="Default fallback" + MECHANISM="DIRECT" + ;; + esac +fi + +echo "Selected mechanism: ${MECHANISM}" >> "${SESSION_FILE}" +echo "Reason: ${REASON}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +echo "Selected mechanism: ${MECHANISM}" +echo "Reason: ${REASON}" +echo "---------------------------------" + +# --- Response Execution --- +echo "Executing selected mechanism..." +echo "RESPONSE EXECUTION:" >> "${SESSION_FILE}" + +case "$MECHANISM" in + "DIRECT") + echo "Using direct Ollama response..." + echo "DIRECT OLLAMA RESPONSE:" >> "${SESSION_FILE}" + + DIRECT_PROMPT="You are an expert assistant. You always flag if you don't know something. Please provide a clear, helpful response to the following prompt: ${PROMPT}" + + RESPONSE=$(ollama run "${DEFAULT_MODEL}" "${DIRECT_PROMPT}") + + echo "${RESPONSE}" >> "${SESSION_FILE}" + echo "" >> "${SESSION_FILE}" + + echo "---------------------------------" + echo "Direct response:" + echo "---------------------------------" + echo "${RESPONSE}" + ;; + + "CONSENSUS") + echo "Delegating to consensus mechanism..." + echo "DELEGATING TO CONSENSUS:" >> "${SESSION_FILE}" + + # Execute consensus script and display output directly + "${SCRIPT_DIR}/consensus" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; + + "SOCRATIC") + echo "Delegating to Socratic mechanism..." + echo "DELEGATING TO SOCRATIC:" >> "${SESSION_FILE}" + + # Execute Socratic script and display output directly + "${SCRIPT_DIR}/socratic" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; + + "CRITIQUE") + echo "Delegating to critique mechanism..." + echo "DELEGATING TO CRITIQUE:" >> "${SESSION_FILE}" + + # Execute critique script and display output directly + "${SCRIPT_DIR}/critique" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; + + "PEER_REVIEW") + echo "Delegating to peer-review mechanism..." + echo "DELEGATING TO PEER_REVIEW:" >> "${SESSION_FILE}" + + # Execute peer-review script and display output directly + "${SCRIPT_DIR}/peer-review" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; + + "SYNTHESIS") + echo "Delegating to synthesis mechanism..." + echo "DELEGATING TO SYNTHESIS:" >> "${SESSION_FILE}" + + # Execute synthesis script and display output directly + "${SCRIPT_DIR}/synthesis" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; + + "EXPLORATION") + echo "Delegating to exploration mechanism..." + echo "DELEGATING TO EXPLORATION:" >> "${SESSION_FILE}" + + # Execute exploration script and display output directly + "${SCRIPT_DIR}/exploration" "${PROMPT}" "${ROUNDS}" 2>&1 | tee -a "${SESSION_FILE}" + ;; +esac + +# --- Final Summary --- +echo "" >> "${SESSION_FILE}" +echo "DISPATCH SUMMARY:" >> "${SESSION_FILE}" +echo "================" >> "${SESSION_FILE}" +echo "Original Prompt: ${PROMPT}" >> "${SESSION_FILE}" +echo "Selected Mechanism: ${MECHANISM}" >> "${SESSION_FILE}" +echo "Reason: ${REASON}" >> "${SESSION_FILE}" +echo "Rounds: ${ROUNDS}" >> "${SESSION_FILE}" + +# End timing +duration=$(end_timer "$SESSION_ID" "computer") + +echo "" +echo "Execution time: ${duration} seconds" +echo "Full dispatch log: ${SESSION_FILE}" \ No newline at end of file diff --git a/bash/consensus b/bash/consensus index 3df1908..f978614 100755 --- a/bash/consensus +++ b/bash/consensus @@ -223,7 +223,8 @@ CLAIMED CONFIDENCE: ${confidence} Based on the quality, completeness, and accuracy of this response, what is your confidence level? Respond with only: low, medium, or high" - judge_confidence=$(ollama run "${JUDGE_MODEL}" "${JUDGE_PROMPT}" | tr '[:upper:]' '[:lower:]' | xargs) + judge_output=$(ollama run "${JUDGE_MODEL}" "${JUDGE_PROMPT}") + judge_confidence=$(echo "${judge_output}" | tr '[:upper:]' '[:lower:]' | grep -o -i "\(low\|medium\|high\)" | head -n1) # Validate judge confidence if [[ ! "$judge_confidence" =~ ^(low|medium|high)$ ]]; then diff --git a/bash/dds b/bash/critique index 1d7308f..35a1db0 100755 --- a/bash/dds +++ b/bash/critique @@ -1,10 +1,10 @@ #!/bin/bash -# Daydreaming System -# This script uses a sequence of LLM calls to refine an initial response. +# Critique System +# This script uses a sequence of LLM calls to refine an initial response through critique and revision. # # APPLICATION LOGIC: -# The daydreaming process implements an iterative refinement system where AI models +# The critique process implements an iterative refinement system where AI models # collaborate to improve response quality through critique and revision. The system # operates through three distinct phases designed to enhance clarity and accuracy: # @@ -48,8 +48,8 @@ DEFAULT_LOOPS=2 # --- Argument Validation --- if [ "$#" -lt 1 ]; then - echo -e "\n\tDaydreaming" - echo -e "\tThis script uses a sequence of LLM calls to refine an initial response." + echo -e "\n\tCritique" + echo -e "\tThis script uses a sequence of LLM calls to refine an initial response through critique and revision." echo -e "\n\tUsage: $0 [-f <file_path>] \"<your prompt>\" [number_of_refinement_loops]" echo -e "\n\tExample: $0 -f ./input.txt \"Please summarize this text file\" 2" echo -e "\n\tIf number_of_refinement_loops is not provided, the program will default to $DEFAULT_LOOPS loops." @@ -94,9 +94,9 @@ fi # Create a temporary directory if it doesn't exist mkdir -p ~/tmp # Create a unique file for this session based on the timestamp -SESSION_FILE=~/tmp/dds_$(date +%Y%m%d_%H%M%S).txt +SESSION_FILE=~/tmp/critique_$(date +%Y%m%d_%H%M%S).txt -echo "DDS Session Log: ${SESSION_FILE}" +echo "Critique Session Log: ${SESSION_FILE}" echo "---------------------------------" # --- Initial Prompt & Response --- @@ -148,7 +148,7 @@ done # --- Final Output --- echo "---------------------------------" -echo "DDS process complete." +echo "Critique process complete." echo "Final refined answer:" echo "---------------------------------" diff --git a/bash/exploration b/bash/exploration new file mode 100755 index 0000000..8dc09b7 --- /dev/null +++ b/bash/exploration @@ -0,0 +1,246 @@ +#!/bin/bash + +# Get the directory where this script is located +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Exploration System +# This script systematically explores multiple solution paths and compares alternatives. +# +# APPLICATION LOGIC: +# The exploration process implements a branching analysis system that systematically +# explores multiple approaches to a problem and compares their merits. The system +# operates through three distinct phases designed to maximize discovery and comparison: +# +# PHASE 1 - PATH GENERATION: +# - Identifies multiple possible approaches to the problem +# - Generates alternative solution paths +# - Ensures comprehensive coverage of the solution space +# - Creates a foundation for systematic comparison +# +# PHASE 2 - PATH EXPLORATION: +# - Explores each identified path in detail +# - Develops the implications and consequences of each approach +# - Identifies strengths, weaknesses, and trade-offs +# - Provides detailed analysis of each alternative +# +# PHASE 3 - COMPARATIVE ANALYSIS: +# - Systematically compares all explored paths +# - Evaluates relative merits and trade-offs +# - Identifies optimal approaches or combinations +# - Provides recommendations based on different criteria +# +# EXPLORATION MODELING: +# The system applies systematic exploration principles to AI response generation: +# - Multiple paths ensure comprehensive problem coverage +# - Systematic comparison reveals optimal approaches +# - Trade-off analysis helps users make informed decisions +# - The process may reveal unexpected insights or approaches +# - Transparency shows how different paths were evaluated +# - The method may identify novel solutions that single-path approaches miss +# +# The exploration process emphasizes systematic analysis and comparison, +# ensuring users understand the full range of possible approaches and their implications. + +# Source the logging system using absolute path +source "${SCRIPT_DIR}/logging.sh" + +# --- Model Configuration --- +EXPLORATION_MODEL="llama3:8b-instruct-q4_K_M" +ANALYSIS_MODEL="phi3:3.8b-mini-4k-instruct-q4_K_M" + +# --- Defaults --- +DEFAULT_PATHS=3 + +# --- Argument Validation --- +if [ "$#" -lt 1 ]; then + echo -e "\n\tExploration" + echo -e "\tThis script systematically explores multiple solution paths and compares alternatives." + echo -e "\n\tUsage: $0 [-f <file_path>] [-p <number_of_paths>] \"<your prompt>\" [number_of_rounds]" + echo -e "\n\tExample: $0 -f ./input.txt -p 4 \"How can we solve this problem?\" 2" + echo -e "\n\tIf number_of_rounds is not provided, the program will default to 2 rounds." + echo -e "\n\t-f <file_path> (optional): Append the contents of the file to the prompt." + echo -e "\n\t-p <paths> (optional): Number of solution paths to explore (default: 3)." + echo -e "\n" + exit 1 +fi + +# --- Argument Parsing --- +FILE_PATH="" +NUM_PATHS=$DEFAULT_PATHS +while getopts "f:p:" opt; do + case $opt in + f) + FILE_PATH="$OPTARG" + ;; + p) + NUM_PATHS="$OPTARG" + ;; + *) + echo "Invalid option: -$OPTARG" >&2 + exit 1 + ;; + esac +done +shift $((OPTIND -1)) + +PROMPT="$1" +if [ -z "$2" ]; then + ROUNDS=2 +else + ROUNDS=$2 +fi + +# If file path is provided, append its contents to the prompt +if [ -n "$FILE_PATH" ]; then + if [ ! -f "$FILE_PATH" ]; then + echo "File not found: $FILE_PATH" >&2 + exit 1 + fi + FILE_CONTENTS=$(cat "$FILE_PATH") + PROMPT="$PROMPT\n[FILE CONTENTS]\n$FILE_CONTENTS\n[END FILE]" +fi + +# --- File Initialization --- +mkdir -p ~/tmp +SESSION_FILE=~/tmp/exploration_$(date +%Y%m%d_%H%M%S).txt + +# Initialize timing +SESSION_ID=$(generate_session_id) +start_timer "$SESSION_ID" "exploration" + +echo "Exploration Session Log: ${SESSION_FILE}" +echo "---------------------------------" + +# Store the initial user prompt in the session file +echo "USER PROMPT: ${PROMPT}" >> "${SESSION_FILE}" +echo "NUMBER OF PATHS: ${NUM_PATHS}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +# --- Phase 1: Path Generation --- +echo "Phase 1: Generating solution paths..." +echo "PHASE 1 - PATH GENERATION:" >> "${SESSION_FILE}" + +PATH_GENERATION_PROMPT="You are a strategic thinker. Your task is to identify ${NUM_PATHS} distinct, viable approaches to the following problem. Each path should represent a different strategy or methodology. + +PROBLEM: ${PROMPT} + +Please identify ${NUM_PATHS} different solution paths. For each path, provide: +1. A clear name/title for the approach +2. A brief description of the strategy +3. The key principles or methodology it follows + +Format your response as: +PATH 1: [Name] +[Description] + +PATH 2: [Name] +[Description] + +etc. + +Ensure the paths are genuinely different approaches, not just variations of the same idea." + +paths_output=$(ollama run "${EXPLORATION_MODEL}" "${PATH_GENERATION_PROMPT}") + +echo "GENERATED PATHS:" >> "${SESSION_FILE}" +echo "${paths_output}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +# --- Phase 2: Path Exploration --- +echo "Phase 2: Exploring each path in detail..." +echo "PHASE 2 - PATH EXPLORATION:" >> "${SESSION_FILE}" + +declare -a path_analyses +declare -a path_names + +# Extract path names and explore each one +for i in $(seq 1 "${NUM_PATHS}"); do + echo " Exploring path ${i}..." + + # Extract the path description (simplified approach) + path_section=$(echo "${paths_output}" | sed -n "/PATH ${i}:/,/PATH $((i+1)):/p" | sed '$d') + if [ -z "$path_section" ]; then + path_section=$(echo "${paths_output}" | sed -n "/PATH ${i}:/,\$p") + fi + + # Generate path name from the section + path_name=$(echo "${path_section}" | head -n1 | sed 's/^PATH [0-9]*: //') + + EXPLORATION_PROMPT="You are a detailed analyst. Explore the following solution path in depth, considering its implications, requirements, and potential outcomes. + +ORIGINAL PROBLEM: ${PROMPT} + +PATH TO EXPLORE: ${path_section} + +Please provide a comprehensive analysis of this path including: +1. Detailed implementation approach +2. Potential benefits and advantages +3. Potential challenges and risks +4. Resource requirements and constraints +5. Expected outcomes and timeline +6. Success factors and critical elements +7. Potential variations or adaptations + +Provide a thorough, well-structured analysis." + + path_analysis=$(ollama run "${EXPLORATION_MODEL}" "${EXPLORATION_PROMPT}") + + path_analyses[$((i-1))]="${path_analysis}" + path_names[$((i-1))]="${path_name}" + + echo "PATH ${i} ANALYSIS (${path_name}):" >> "${SESSION_FILE}" + echo "${path_analysis}" >> "${SESSION_FILE}" + echo "" >> "${SESSION_FILE}" +done + +# --- Phase 3: Comparative Analysis --- +echo "Phase 3: Comparing and evaluating paths..." +echo "PHASE 3 - COMPARATIVE ANALYSIS:" >> "${SESSION_FILE}" + +# Create comparison prompt +COMPARISON_PROMPT="You are a strategic analyst. Compare and evaluate the following solution paths for the given problem. + +ORIGINAL PROBLEM: ${PROMPT} + +PATH ANALYSES:" + +for i in $(seq 0 $((NUM_PATHS-1))); do + COMPARISON_PROMPT="${COMPARISON_PROMPT} + +PATH ${i+1}: ${path_names[$i]} +${path_analyses[$i]}" +done + +COMPARISON_PROMPT="${COMPARISON_PROMPT} + +Please provide a comprehensive comparative analysis including: +1. Direct comparison of approaches across key criteria +2. Relative strengths and weaknesses of each path +3. Trade-offs and opportunity costs +4. Risk assessment for each approach +5. Recommendations based on different scenarios +6. Potential for combining elements from multiple paths +7. Final recommendation with justification + +Provide a clear, structured comparison that helps decision-making." + +comparative_analysis=$(ollama run "${ANALYSIS_MODEL}" "${COMPARISON_PROMPT}") + +echo "COMPARATIVE ANALYSIS:" >> "${SESSION_FILE}" +echo "${comparative_analysis}" >> "${SESSION_FILE}" + +# End timing +duration=$(end_timer "$SESSION_ID" "exploration") + +# --- Final Output --- +echo "---------------------------------" +echo "Exploration process complete." +echo "Comparative analysis:" +echo "---------------------------------" + +echo "${comparative_analysis}" +echo "" +echo "Paths explored: ${NUM_PATHS}" +echo "Execution time: ${duration} seconds" +echo "" +echo "Full exploration log: ${SESSION_FILE}" \ No newline at end of file diff --git a/bash/logging.sh b/bash/logging.sh new file mode 100755 index 0000000..c37aaf4 --- /dev/null +++ b/bash/logging.sh @@ -0,0 +1,146 @@ +#!/bin/bash + +# Unified Logging System +# This script provides consistent logging and performance metrics across all thinking mechanisms. + +# --- Logging Configuration --- +LOG_DIR=~/tmp/ai_thinking +METRICS_FILE="${LOG_DIR}/performance_metrics.json" +SESSION_LOG="${LOG_DIR}/session_$(date +%Y%m%d_%H%M%S).json" + +# Create logging directory +mkdir -p "${LOG_DIR}" + +# --- Timing Functions --- +start_timer() { + local session_id="$1" + local mechanism="$2" + local start_time=$(date +%s.%N) + + # Store start time + echo "$start_time" > "/tmp/${session_id}_start" + + # Log session start + log_session_start "$session_id" "$mechanism" "$start_time" +} + +end_timer() { + local session_id="$1" + local mechanism="$2" + local end_time=$(date +%s.%N) + local start_time=$(cat "/tmp/${session_id}_start" 2>/dev/null || echo "$end_time") + + # Calculate duration + local duration=$(echo "$end_time - $start_time" | bc -l 2>/dev/null || echo "0") + + # Log session end + log_session_end "$session_id" "$mechanism" "$end_time" "$duration" + + # Clean up + rm -f "/tmp/${session_id}_start" + + echo "$duration" +} + +# --- Session Logging --- +log_session_start() { + local session_id="$1" + local mechanism="$2" + local start_time="$3" + + cat > "${SESSION_LOG}" << EOF +{ + "session_id": "${session_id}", + "mechanism": "${mechanism}", + "start_time": "${start_time}", + "prompt": "${PROMPT:-""}", + "status": "started" +} +EOF +} + +log_session_end() { + local session_id="$1" + local mechanism="$2" + local end_time="$3" + local duration="$4" + + # Update session log + cat > "${SESSION_LOG}" << EOF +{ + "session_id": "${session_id}", + "mechanism": "${mechanism}", + "start_time": "$(cat "${SESSION_LOG}" | jq -r '.start_time' 2>/dev/null || echo "")", + "end_time": "${end_time}", + "duration": "${duration}", + "prompt": "${PROMPT:-""}", + "status": "completed" +} +EOF + + # Update metrics file + update_metrics "$mechanism" "$duration" +} + +# --- Metrics Management --- +update_metrics() { + local mechanism="$1" + local duration="$2" + + # Create metrics file if it doesn't exist + if [ ! -f "${METRICS_FILE}" ]; then + cat > "${METRICS_FILE}" << EOF +{ + "total_sessions": 0, + "mechanisms": {}, + "average_durations": {} +} +EOF + fi + + # Update metrics using jq (if available) or simple text processing + if command -v jq >/dev/null 2>&1; then + # Use jq for proper JSON handling + local temp_file=$(mktemp) + jq --arg mechanism "$mechanism" --arg duration "$duration" ' + .total_sessions += 1 | + .mechanisms[$mechanism] = (.mechanisms[$mechanism] // 0) + 1 | + .average_durations[$mechanism] = ( + (.average_durations[$mechanism] // 0) * (.mechanisms[$mechanism] - 1) + ($duration | tonumber) + ) / .mechanisms[$mechanism] + ' "${METRICS_FILE}" > "$temp_file" + mv "$temp_file" "${METRICS_FILE}" + else + # Fallback: simple text-based metrics + echo "$(date +%Y%m%d_%H%M%S),${mechanism},${duration}" >> "${LOG_DIR}/simple_metrics.csv" + fi +} + +# --- Utility Functions --- +generate_session_id() { + echo "session_$(date +%Y%m%d_%H%M%S)_$$" +} + +get_metrics_summary() { + if [ -f "${METRICS_FILE}" ]; then + echo "=== Performance Metrics ===" + if command -v jq >/dev/null 2>&1; then + jq -r '.mechanisms | to_entries[] | "\(.key): \(.value) sessions"' "${METRICS_FILE}" + echo "" + jq -r '.average_durations | to_entries[] | "\(.key): \(.value | tonumber | floor)s average"' "${METRICS_FILE}" + else + echo "Metrics available in: ${METRICS_FILE}" + fi + else + echo "No metrics available yet." + fi +} + +# --- Export Functions for Other Scripts --- +export -f start_timer +export -f end_timer +export -f log_session_start +export -f log_session_end +export -f update_metrics +export -f generate_session_id +export -f get_metrics_summary \ No newline at end of file diff --git a/bash/metrics b/bash/metrics new file mode 100755 index 0000000..ad430b5 --- /dev/null +++ b/bash/metrics @@ -0,0 +1,18 @@ +#!/bin/bash + +# Metrics Viewer +# This script displays performance metrics from the unified logging system. + +# Source the logging system +source ./logging.sh + +echo "AI Thinking System - Performance Metrics" +echo "========================================" +echo "" + +# Display metrics summary +get_metrics_summary + +echo "" +echo "Detailed metrics file: ${METRICS_FILE}" +echo "Session logs directory: ${LOG_DIR}" \ No newline at end of file diff --git a/bash/synthesis b/bash/synthesis new file mode 100755 index 0000000..417279e --- /dev/null +++ b/bash/synthesis @@ -0,0 +1,240 @@ +#!/bin/bash + +# Get the directory where this script is located +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +# Synthesis System +# This script combines outputs from multiple thinking mechanisms into a coherent final response. +# +# APPLICATION LOGIC: +# The synthesis process implements a multi-mechanism integration system that combines +# outputs from different thinking strategies into a unified, coherent response. The system +# operates through three distinct phases designed to maximize comprehensiveness and clarity: +# +# PHASE 1 - MECHANISM EXECUTION: +# - Executes multiple thinking mechanisms on the same prompt +# - Collects diverse perspectives and approaches +# - Ensures comprehensive coverage of the problem space +# - Creates a foundation for synthesis +# +# PHASE 2 - CONFLICT RESOLUTION: +# - Identifies contradictions and conflicts between mechanism outputs +# - Resolves disagreements through logical analysis +# - Prioritizes information based on confidence and relevance +# - Maintains intellectual honesty about uncertainties +# +# PHASE 3 - SYNTHESIS GENERATION: +# - Combines the best elements from each mechanism +# - Creates a coherent narrative that addresses all aspects +# - Provides a unified response that leverages multiple perspectives +# - Ensures the final output is greater than the sum of its parts +# +# SYNTHESIS MODELING: +# The system applies integrative thinking principles to AI response generation: +# - Multiple mechanisms provide diverse perspectives on the same problem +# - Conflict resolution ensures logical consistency in the final output +# - Synthesis leverages the strengths of each individual mechanism +# - The process may reveal insights that individual mechanisms miss +# - Transparency shows how different perspectives were integrated +# - The method may provide more comprehensive and balanced responses +# +# The synthesis process emphasizes comprehensiveness and coherence, +# ensuring users get the benefits of multiple thinking approaches in a unified response. + +# Source the logging system using absolute path +source "${SCRIPT_DIR}/logging.sh" + +# --- Model Configuration --- +SYNTHESIS_MODEL="llama3:8b-instruct-q4_K_M" + +# --- Defaults --- +DEFAULT_MECHANISMS=("consensus" "critique" "socratic") + +# --- Argument Validation --- +if [ "$#" -lt 1 ]; then + echo -e "\n\tSynthesis" + echo -e "\tThis script combines outputs from multiple thinking mechanisms into a coherent final response." + echo -e "\n\tUsage: $0 [-f <file_path>] [-m <mechanism1,mechanism2,...>] \"<your prompt>\" [number_of_rounds]" + echo -e "\n\tExample: $0 -f ./input.txt -m consensus,critique \"Please analyze this text\" 2" + echo -e "\n\tIf number_of_rounds is not provided, the program will default to 2 rounds." + echo -e "\n\t-f <file_path> (optional): Append the contents of the file to the prompt." + echo -e "\n\t-m <mechanisms> (optional): Comma-separated list of mechanisms to use (default: consensus,critique,socratic)." + echo -e "\n" + exit 1 +fi + +# --- Argument Parsing --- +FILE_PATH="" +MECHANISMS_STR="" +while getopts "f:m:" opt; do + case $opt in + f) + FILE_PATH="$OPTARG" + ;; + m) + MECHANISMS_STR="$OPTARG" + ;; + *) + echo "Invalid option: -$OPTARG" >&2 + exit 1 + ;; + esac +done +shift $((OPTIND -1)) + +PROMPT="$1" +if [ -z "$2" ]; then + ROUNDS=2 +else + ROUNDS=$2 +fi + +# Parse mechanisms +if [ -n "$MECHANISMS_STR" ]; then + IFS=',' read -ra MECHANISMS <<< "$MECHANISMS_STR" +else + MECHANISMS=("${DEFAULT_MECHANISMS[@]}") +fi + +# If file path is provided, append its contents to the prompt +if [ -n "$FILE_PATH" ]; then + if [ ! -f "$FILE_PATH" ]; then + echo "File not found: $FILE_PATH" >&2 + exit 1 + fi + FILE_CONTENTS=$(cat "$FILE_PATH") + PROMPT="$PROMPT\n[FILE CONTENTS]\n$FILE_CONTENTS\n[END FILE]" +fi + +# --- File Initialization --- +mkdir -p ~/tmp +SESSION_FILE=~/tmp/synthesis_$(date +%Y%m%d_%H%M%S).txt + +# Initialize timing +SESSION_ID=$(generate_session_id) +start_timer "$SESSION_ID" "synthesis" + +echo "Synthesis Session Log: ${SESSION_FILE}" +echo "---------------------------------" + +# Store the initial user prompt in the session file +echo "USER PROMPT: ${PROMPT}" >> "${SESSION_FILE}" +echo "MECHANISMS: ${MECHANISMS[*]}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +# --- Phase 1: Mechanism Execution --- +echo "Phase 1: Executing thinking mechanisms..." +echo "PHASE 1 - MECHANISM EXECUTION:" >> "${SESSION_FILE}" + +declare -a mechanism_outputs +declare -a mechanism_names + +for i in "${!MECHANISMS[@]}"; do + mechanism="${MECHANISMS[$i]}" + echo " Executing ${mechanism} mechanism..." + + # Execute the mechanism using absolute path + if [ -f "${SCRIPT_DIR}/${mechanism}" ]; then + output=$("${SCRIPT_DIR}/${mechanism}" "${PROMPT}" "${ROUNDS}" 2>&1) + mechanism_outputs[$i]="${output}" + mechanism_names[$i]="${mechanism}" + + echo "MECHANISM ${i+1} (${mechanism}):" >> "${SESSION_FILE}" + echo "${output}" >> "${SESSION_FILE}" + echo "" >> "${SESSION_FILE}" + else + echo " WARNING: Mechanism ${mechanism} not found, skipping..." >&2 + fi +done + +# --- Phase 2: Conflict Resolution --- +echo "Phase 2: Analyzing and resolving conflicts..." +echo "PHASE 2 - CONFLICT RESOLUTION:" >> "${SESSION_FILE}" + +# Create conflict analysis prompt +CONFLICT_PROMPT="You are a conflict resolution specialist. Analyze the following outputs from different thinking mechanisms and identify any contradictions, conflicts, or areas of disagreement. + +ORIGINAL PROMPT: ${PROMPT} + +MECHANISM OUTPUTS:" + +for i in "${!MECHANISMS[@]}"; do + if [ -n "${mechanism_outputs[$i]}" ]; then + CONFLICT_PROMPT="${CONFLICT_PROMPT} + +${mechanism_names[$i]} OUTPUT: +${mechanism_outputs[$i]}" + fi +done + +CONFLICT_PROMPT="${CONFLICT_PROMPT} + +Please identify: +1. Any direct contradictions between the outputs +2. Areas where the mechanisms disagree +3. Information that appears to be conflicting +4. How these conflicts might be resolved + +Provide a clear analysis of conflicts and potential resolutions." + +conflict_analysis=$(ollama run "${SYNTHESIS_MODEL}" "${CONFLICT_PROMPT}") + +echo "CONFLICT ANALYSIS:" >> "${SESSION_FILE}" +echo "${conflict_analysis}" >> "${SESSION_FILE}" +echo "" >> "${SESSION_FILE}" + +# --- Phase 3: Synthesis Generation --- +echo "Phase 3: Generating unified synthesis..." +echo "PHASE 3 - SYNTHESIS GENERATION:" >> "${SESSION_FILE}" + +# Create synthesis prompt +SYNTHESIS_PROMPT="You are a synthesis specialist. Your task is to combine the outputs from multiple thinking mechanisms into a coherent, unified response that leverages the strengths of each approach. + +ORIGINAL PROMPT: ${PROMPT} + +MECHANISM OUTPUTS:" + +for i in "${!MECHANISMS[@]}"; do + if [ -n "${mechanism_outputs[$i]}" ]; then + SYNTHESIS_PROMPT="${SYNTHESIS_PROMPT} + +${mechanism_names[$i]} OUTPUT: +${mechanism_outputs[$i]}" + fi +done + +SYNTHESIS_PROMPT="${SYNTHESIS_PROMPT} + +CONFLICT ANALYSIS: +${conflict_analysis} + +Please create a unified synthesis that: +1. Combines the best insights from each mechanism +2. Resolves any identified conflicts logically +3. Provides a comprehensive response that addresses all aspects +4. Maintains intellectual honesty about uncertainties +5. Creates a coherent narrative that flows naturally +6. Leverages the unique strengths of each thinking approach + +Your synthesis should be greater than the sum of its parts - it should provide insights that individual mechanisms might miss." + +final_synthesis=$(ollama run "${SYNTHESIS_MODEL}" "${SYNTHESIS_PROMPT}") + +echo "FINAL SYNTHESIS:" >> "${SESSION_FILE}" +echo "${final_synthesis}" >> "${SESSION_FILE}" + +# End timing +duration=$(end_timer "$SESSION_ID" "synthesis") + +# --- Final Output --- +echo "---------------------------------" +echo "Synthesis process complete." +echo "Final unified response:" +echo "---------------------------------" + +echo "${final_synthesis}" +echo "" +echo "Mechanisms used: ${MECHANISMS[*]}" +echo "Execution time: ${duration} seconds" +echo "" +echo "Full synthesis log: ${SESSION_FILE}" \ No newline at end of file diff --git a/bash/c-2-f b/bash/unit-conversion/c-2-f index 597cb48..597cb48 100755 --- a/bash/c-2-f +++ b/bash/unit-conversion/c-2-f diff --git a/bash/f-2-c b/bash/unit-conversion/f-2-c index b50df93..b50df93 100755 --- a/bash/f-2-c +++ b/bash/unit-conversion/f-2-c |