diff options
-rw-r--r-- | awk/scheme/scheme/README.md | 60 | ||||
-rw-r--r-- | awk/scheme/scheme/TODO.txt | 95 | ||||
-rwxr-xr-x | awk/scheme/scheme/bin/compiler.awk | 132 | ||||
-rwxr-xr-x | awk/scheme/scheme/bin/repl | 36 | ||||
-rwxr-xr-x | awk/scheme/scheme/bin/vm.awk | 432 | ||||
-rw-r--r-- | awk/scheme/scheme/diagram.md | 58 | ||||
-rw-r--r-- | awk/scheme/scheme/scratch/arch-notes.md | 115 | ||||
-rw-r--r-- | awk/scheme/scheme/scratch/ideas-for-string-support.md | 556 | ||||
-rw-r--r-- | awk/scheme/scheme/scratch/random.txt | 112 |
9 files changed, 1413 insertions, 183 deletions
diff --git a/awk/scheme/scheme/README.md b/awk/scheme/scheme/README.md index 7711442..5dae265 100644 --- a/awk/scheme/scheme/README.md +++ b/awk/scheme/scheme/README.md @@ -1,10 +1,12 @@ # Awk-Scheme ## Overview -A Scheme interpreter implemented in AWK, featuring: -- A compiler that translates Scheme expressions to stack-based VM instructions +A scheme interpreter implemented in AWK: + +- A compiler that translates scheme expressions to stack-based VM instructions - A virtual machine that executes the compiled code - Support for basic arithmetic, list operations, functions, and variable bindings +- **Closure support** for nested function definitions and lexical scoping - Persistent global state between REPL sessions ## Architecture @@ -12,15 +14,17 @@ A Scheme interpreter implemented in AWK, featuring: ### Components 1. **Compiler** (`bin/compiler.awk`): - Lexical analyzer for tokenizing input - - Recursive descent parser for Scheme expressions + - Recursive descent parser for scheme expressions - Code generator that produces stack-based VM instructions - Handles nested expressions and proper scoping + - **Generates closure creation code for lambda expressions** 2. **Virtual Machine** (`bin/vm.awk`): - Stack-based execution model - Typed value system with runtime type checking - Environment-based variable binding - - Function call/return mechanism + - **Direct function execution** (no program array modification) + - **Closure support** with environment capture and restoration - Heap memory for cons cells - State persistence for globals and functions @@ -38,6 +42,7 @@ All values are tagged with their type: - `F:name` - Functions - `NIL:` - Empty list - `S:name` - Symbols +- **`CLOSURE:func_name:env_id` - Closure objects (function + captured environment)** ## Usage @@ -100,6 +105,24 @@ scheme> (let ((x 10) (y 20)) (+ x y)) N:30 ``` +7. **Lambda Functions**: +```scheme +scheme> ((lambda (x) (+ x 1)) 41) +N:42 +scheme> ((lambda (x y) (+ x y)) 20 22) +N:42 +``` + +8. **Closures**: +```scheme +scheme> (let ((x 10)) ((lambda (y) (+ x y)) 32)) +N:42 +scheme> (let ((add1 (lambda (x) (+ x 1)))) (add1 41)) +N:42 +``` + +**Note**: Nested lambda definitions (lambdas defined inside other lambdas) are not yet supported. + ## Implementation Details ### Compiler Pipeline @@ -115,6 +138,7 @@ N:30 - Translates expressions to VM instructions - Manages scope and variable bindings - Handles function definitions + - **Generates closure creation code for lambda expressions** ### Virtual Machine 1. **Stack Operations**: @@ -124,11 +148,16 @@ N:30 - LOOKUP: Retrieve variable value 2. **Function Calls**: - - CALL: Invoke function - - RETURN: Return from function + - **Direct execution** of function code with proper program restoration - Environment frame management + - **Closure support** with environment capture and restoration + +3. **Closure System**: + - **CAPTURE_ENV**: Captures current environment for closure creation + - **Environment restoration**: Restores captured environments on closure calls + - **Variable lookup enhancement**: Checks closure environments for variable resolution -3. **Memory Management**: +4. **Memory Management**: - Stack-based evaluation - Simple heap for cons cells - Basic reference counting (not fully implemented) @@ -137,6 +166,7 @@ N:30 - Global variables and functions persist between sessions - State stored in `/tmp/scheme_vm.state` and `/tmp/scheme_vm.env` - Automatic state loading/saving +- **Note**: State persistence is enabled by default in the REPL ## Extending the System @@ -165,9 +195,12 @@ N:30 - VM execution - Stack operations - Environment changes + - **Closure creation and restoration** +- **Note**: Some debug output may be suppressed in the current implementation ## Limitations Current implementation does not support: +- **Nested lambda definitions** (lambdas defined inside other lambdas) - this causes "Undefined closure function" errors - Floating point numbers - Strings - Proper tail recursion @@ -178,9 +211,10 @@ Current implementation does not support: - Standard library ## Future Enhancements -1. Proper tail call optimization -2. Garbage collection -3. Error recovery -4. More data types -5. Standard library -6. Better debugging tools \ No newline at end of file +1. **Nested lambda support** (complex but achievable) +2. Proper tail call optimization +3. Garbage collection +4. Error recovery +5. More data types +6. Standard library +7. Better debugging tools \ No newline at end of file diff --git a/awk/scheme/scheme/TODO.txt b/awk/scheme/scheme/TODO.txt index 31723a4..f576fba 100644 --- a/awk/scheme/scheme/TODO.txt +++ b/awk/scheme/scheme/TODO.txt @@ -5,65 +5,66 @@ Current State: ------------- - Basic arithmetic operations (+,-,*,/) are working - Let expressions with simple bindings are working -- Function definitions (define) and lambda expressions are partially implemented +- Function definitions (define) and lambda expressions are implemented (no nested lambdas) - Stack-based virtual machine with environment for variable bindings +- Closures supported for top-level lambdas (not nested) +- State persistence for globals and functions between REPL sessions Recent Changes: -------------- -1. Fixed function table array naming conflicts (func_name -> func_def_names) -2. Modified vm_get_value to handle function names correctly -3. Attempted to fix argument handling in function calls -4. Modified lambda compilation to avoid pushing function name twice +1. Improved function table and environment handling +2. Enhanced debug logging for function calls and environment +3. Refined lambda compilation and closure creation +4. Updated documentation and architectural notes -Current Issues: --------------- -1. Function Definitions (define): - - Function calls are failing with "ADD requires numeric operands" - - Arguments may not be properly passed to functions - - Function environment may not be properly set up - -2. Lambda Expressions: - - Direct lambda calls are failing - - Lambda bindings in let expressions are failing - - Function environment for lambda parameters may not be correct +Current Issues / Limitations: +---------------------------- +1. **Nested Lambda Support**: + - Lambdas defined inside other lambdas are not supported ("Undefined closure function" errors) +2. **Reference Counting**: + - Reference counting for heap objects (cons cells) is stubbed but not enforced +3. **Error Handling**: + - Minimal error recovery, especially in the REPL +4. **Tail Recursion**: + - No proper tail call optimization +5. **Data Types**: + - No floating point, string, or full numeric tower support +6. **Garbage Collection**: + - No GC; memory for cons cells is not reclaimed +7. **Standard Library**: + - No standard library or macros Next Steps: ----------- -1. Debug Function Calls: - - Add detailed debug logging in vm_call_function - - Verify argument handling and environment setup - - Check if function code is being properly stored and retrieved - -2. Fix Function Environment: - - Review how parameters are stored in the environment - - Ensure environment is properly restored after function calls - - Verify parameter values are correctly passed to functions - -3. Improve Lambda Support: - - Review lambda compilation process - - Ensure lambda functions are properly stored and called - - Fix environment handling for lambda parameters - -4. Testing Strategy: - - Create test cases for each type of function call - - Add debug output to track stack and environment state - - Verify each step of function compilation and execution - -5. Code Cleanup: - - Review and document function handling code - - Ensure consistent naming and error handling - - Add comments explaining complex operations +1. **Implement Nested Lambda Support**: + - Refactor closure environment capture and restoration + - Update compiler and VM to support nested closures +2. **Improve Memory Management**: + - Implement or enforce reference counting + - Plan for future garbage collection +3. **Enhance Error Handling**: + - Add better error messages and recovery in REPL and VM +4. **Tail Call Optimization**: + - Investigate and implement proper tail recursion +5. **Testing and Debugging**: + - Expand test suite for edge cases (especially closures and let/lambda) + - Continue using DEBUG=1 for tracing execution +6. **Documentation and Code Quality**: + - Keep architectural notes and code comments up to date + - Document any new patterns or changes Technical Notes: --------------- -- Function calls use a combination of LOOKUP, GET_VALUE, and CALL instructions -- Environment stack handles variable bindings and function parameters -- Function code is stored in FUNCTIONS table with unique names -- Lambda functions use __lambda_N naming scheme +- Function calls use LOOKUP, GET_VALUE, and CALL instructions +- Environment stack manages variable bindings and function parameters +- Function code is stored in the FUNCTIONS table with unique names +- Lambda functions use __lambda_N naming scheme; closures capture environment at creation +- State is persisted to /tmp/scheme_vm.state and /tmp/scheme_vm.env Debugging Tips: -------------- 1. Enable DEBUG=1 to see detailed execution logs -2. Check stack contents before and after each operation -3. Verify environment state during function calls -4. Monitor function code storage and retrieval \ No newline at end of file +2. Check stack and environment contents before and after each operation +3. Monitor closure creation and environment capture/restoration +4. Verify function code storage and retrieval in the FUNCTIONS table +5. Use and expand test cases for all supported features \ No newline at end of file diff --git a/awk/scheme/scheme/bin/compiler.awk b/awk/scheme/scheme/bin/compiler.awk index debaa2c..94d6e99 100755 --- a/awk/scheme/scheme/bin/compiler.awk +++ b/awk/scheme/scheme/bin/compiler.awk @@ -11,23 +11,28 @@ BEGIN { next_label = 0 # Counter for generating unique labels program = "" # Accumulates the full program text - # Debug mode disabled by default, can be enabled via DEBUG=1 environment variable + # AWK FEATURE: ENVIRON is a built-in array containing environment variables + # Unlike JS process.env, this is automatically available in awk DEBUG = (ENVIRON["DEBUG"] == "1") ? 1 : 0 } # Debug logging helper function function debug(msg) { + # AWK FEATURE: printf with > "/dev/stderr" redirects output to stderr + # Unlike console.error() in JS, this is how awk handles stderr output if (DEBUG) printf("[DEBUG] %s\n", msg) > "/dev/stderr" } -# Each line of input is accumulated into the program string -# This allows handling multi-line expressions +# AWK FEATURE: Each line of input is automatically processed by this block +# This is awk's main input processing loop - every line from stdin/files goes here +# In JS, you'd need to explicitly read lines from a stream { if (program != "") program = program "\n" - program = program $0 + program = program $0 # $0 is the current line being processed } -# Main compiler entry point - called after all input is read +# AWK FEATURE: END block runs after all input has been processed +# This is like a "finally" block that always executes after reading all input END { debug("Raw program:\n" program) if (program == "") exit @@ -47,9 +52,12 @@ function split_expressions(prog, current, paren_count, i, c, expr, cleaned) { # 2. Normalize whitespace # 3. Trim leading/trailing whitespace cleaned = prog + # AWK FEATURE: gsub() is a built-in function for global substitution (like replaceAll in JS) + # gsub(pattern, replacement, target) - modifies target in place and returns count gsub(/;[^(]*\(/, "(", cleaned) # Remove comments before expressions gsub(/\)[^)]*;/, ")", cleaned) # Remove comments after expressions gsub(/[ \t\n]+/, " ", cleaned) # Normalize whitespace to single spaces + # AWK FEATURE: sub() is like gsub() but only replaces the first occurrence sub(/^[ \t\n]+/, "", cleaned) # Trim leading whitespace sub(/[ \t\n]+$/, "", cleaned) # Trim trailing whitespace @@ -58,6 +66,8 @@ function split_expressions(prog, current, paren_count, i, c, expr, cleaned) { if (cleaned == "") return # Parse expressions by tracking parenthesis nesting + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property for (i = 1; i <= length(cleaned); i++) { c = substr(cleaned, i, 1) @@ -90,6 +100,8 @@ function split_expressions(prog, current, paren_count, i, c, expr, cleaned) { } # Lexer helper functions for character classification +# AWK FEATURE: String comparison with >= and <= works lexicographically +# Unlike JS where you need to convert to numbers, awk can compare strings directly function is_digit(c) { return c >= "0" && c <= "9" } function is_whitespace(c) { return c == " " || c == "\t" || c == "\n" } @@ -100,6 +112,8 @@ function next_token() { if (input_buffer == "") input_buffer = program # Skip whitespace between tokens + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property while (length(input_buffer) > 0 && is_whitespace(substr(input_buffer, 1, 1))) input_buffer = substr(input_buffer, 2) @@ -113,8 +127,14 @@ function next_token() { } # Handle numbers (including negative numbers) + # AWK FEATURE: substr(string, start, length) extracts substring + # Unlike JS string.slice(), this is 1-indexed and requires explicit length + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property if (is_digit(c) || c == "-" && length(input_buffer) > 1 && is_digit(substr(input_buffer, 2, 1))) { num = "" + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property while (length(input_buffer) > 0) { c = substr(input_buffer, 1, 1) if (!is_digit(c) && c != "-") break @@ -126,6 +146,8 @@ function next_token() { # Handle symbols (identifiers and operators) sym = "" + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property while (length(input_buffer) > 0) { c = substr(input_buffer, 1, 1) if (is_whitespace(c) || c == "(" || c == ")") break @@ -170,6 +192,8 @@ function parse_list(result, expr) { # Splits an expression into operator and arguments # Handles nested expressions correctly function split_expr(expr, i, len, c, op, args, paren_count) { + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property len = length(expr) paren_count = 0 @@ -190,11 +214,16 @@ function split_expr(expr, i, len, c, op, args, paren_count) { } debug("Split expr: op=" op " args=" args) + # AWK FEATURE: SUBSEP is a built-in variable used as separator for array indices + # When you use array[key1,key2], awk internally stores it as array[key1 SUBSEP key2] + # This allows multi-dimensional arrays in awk (though they're really single-dimensional) return op SUBSEP args } # Splits argument list handling nested parentheses function split_args(args, arg_array, len, i, c, current, paren_count, arg_count) { + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property len = length(args) current = "" paren_count = 0 @@ -233,18 +262,20 @@ function compile_primitive_call(op, args, arg_array, nargs, i) { nargs = split_args(args, arg_array) # Check if this is a lambda function call - if (substr(op, 1, 1) == "(") { + # AWK FEATURE: ~ is the regex match operator (like /pattern/.test() in JS) + # The pattern is a regex literal, not a string + if (op ~ /^\(lambda /) { # This is a lambda function call - # First compile the lambda function - compile_expr(op) - - # Then compile all arguments + # First compile all arguments for (i = 1; i <= nargs; i++) { compile_expr(arg_array[i]) } - # Call the function - print "CALL __lambda_" (next_label - 1) + # Then compile the lambda function (this will push the function name) + compile_expr(op) + + # Call the function - the lambda name is now on top of stack + print "CALL" return } @@ -316,58 +347,58 @@ function compile_primitive_call(op, args, arg_array, nargs, i) { else { # Function call for user-defined functions debug("Function call: " op) - # Look up the function name - print "LOOKUP " op - # Get the actual function name - print "GET_VALUE" - # Then compile arguments + # First compile arguments for (i = 1; i <= nargs; i++) { compile_expr(arg_array[i]) } + # Then look up the function name + print "LOOKUP " op + # Get the actual function name + print "GET_VALUE" # Call the function print "CALL" } } # Splits let bindings into individual variable/value pairs -function split_bindings(bindings, binding_array, count, current, paren_count, i, c, in_lambda) { +function split_bindings(bindings, binding_array, count, current, paren_count, i, c) { count = 0 current = "" paren_count = 0 - in_lambda = 0 + debug("split_bindings: parsing [" bindings "]") + + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property for (i = 1; i <= length(bindings); i++) { c = substr(bindings, i, 1) # Track nested parentheses if (c == "(") { paren_count++ - if (paren_count == 1 && !in_lambda) { + if (paren_count == 1) { current = "" # Start new binding continue } } if (c == ")") { paren_count-- - if (paren_count == 0 && !in_lambda) { + if (paren_count == 0) { # End of binding binding_array[++count] = current + debug("split_bindings: found binding [" current "]") current = "" continue } } - # Track if we're inside a lambda expression - if (substr(bindings, i, 7) == "lambda ") { - in_lambda = 1 - } - - # Only add character if we're inside a binding + # Add character if we're inside a binding if (paren_count > 0) { current = current c } } + debug("split_bindings: found " count " bindings") return count } @@ -379,6 +410,8 @@ function compile_let(args, bindings, body, binding_array, nbindings, i, var, val # Find matching closing parenthesis for bindings paren_count = 1 i = 2 + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property while (paren_count > 0 && i <= length(args)) { if (substr(args, i, 1) == "(") paren_count++ if (substr(args, i, 1) == ")") paren_count-- @@ -416,8 +449,8 @@ function compile_let(args, bindings, body, binding_array, nbindings, i, var, val # Handle lambda or other compound expressions if (substr(val, 2, 6) == "lambda") { # This is a lambda expression - # First compile the lambda - compile_lambda(substr(val, 8)) # Skip "(lambda " + # Pass the entire lambda expression to compile_lambda + compile_lambda(val) # Store the function name in the environment print "STORE " var } else { @@ -470,6 +503,8 @@ function compile_define(args, name, params, body, param_array, nparams, i, paren print "LABEL " name # Process parameters + # AWK FEATURE: split(string, array, separator) splits string into array elements + # Unlike JS string.split() which returns an array, this populates an existing array nparams = split(params, param_array, " ") for (i = 1; i <= nparams; i++) { print "STORE " param_array[i] @@ -496,12 +531,42 @@ function compile_lambda(args, params, body, param_array, nparams, i, lambda_name # Generate a unique name for the lambda function lambda_name = "__lambda_" next_label++ + debug("compile_lambda: args = [" args "]") + + # Handle both full lambda expression and just the arguments part + if (substr(args, 1, 7) == "(lambda") { + debug("compile_lambda: detected full lambda expression") + # Full lambda expression: (lambda (params) body) + args = substr(args, 8) # Remove "(lambda" + debug("compile_lambda: after removing (lambda: [" args "]") + # Find the closing parenthesis of the entire lambda expression + paren_count = 0 + i = 1 + while (i <= length(args)) { + c = substr(args, i, 1) + if (c == "(") paren_count++ + if (c == ")") { + paren_count-- + if (paren_count == -1) break # Found the closing paren of the lambda + } + i++ + } + if (paren_count != -1) error("Unmatched parenthesis in lambda") + args = substr(args, 1, i - 1) # Remove the closing parenthesis + debug("compile_lambda: after removing closing paren: [" args "]") + # Remove leading whitespace + sub(/^[ \t\n]+/, "", args) + debug("compile_lambda: after removing leading whitespace: [" args "]") + } + # Split into parameters and body if (substr(args, 1, 1) != "(") error("Malformed lambda expression") # Find matching closing parenthesis for parameters paren_count = 1 i = 2 + # AWK FEATURE: length(string) returns the length of a string + # Unlike JS string.length, this is a function call, not a property while (paren_count > 0 && i <= length(args)) { if (substr(args, i, 1) == "(") paren_count++ if (substr(args, i, 1) == ")") paren_count-- @@ -523,6 +588,8 @@ function compile_lambda(args, params, body, param_array, nparams, i, lambda_name print "LABEL " lambda_name # Process parameters + # AWK FEATURE: split(string, array, separator) splits string into array elements + # Unlike JS string.split() which returns an array, this populates an existing array nparams = split(params, param_array, " ") for (i = 1; i <= nparams; i++) { print "STORE " param_array[i] @@ -537,10 +604,9 @@ function compile_lambda(args, params, body, param_array, nparams, i, lambda_name } print "RETURN" - # Only push the function name if we're not in a direct call - if (!(args ~ /^\([^)]*\)[^(]*$/)) { - print "PUSH_CONST S:" lambda_name - } + # Create closure that captures current environment + print "CAPTURE_ENV " lambda_name + print "PUSH_CONST CLOSURE:" lambda_name ":ENV_ID" } # Main expression compiler - dispatches based on expression type diff --git a/awk/scheme/scheme/bin/repl b/awk/scheme/scheme/bin/repl index 14a10cf..1fce388 100755 --- a/awk/scheme/scheme/bin/repl +++ b/awk/scheme/scheme/bin/repl @@ -1,15 +1,21 @@ #!/bin/bash # Enable debug tracing -DEBUG=0 +# BASH FEATURE: ${VAR:-default} provides a default value if VAR is unset or empty +# Unlike JS where you'd use VAR || default, this only uses default if VAR is literally unset +DEBUG=${DEBUG:-0} debug() { if [ "$DEBUG" = "1" ]; then + # BASH FEATURE: >&2 redirects output to stderr (file descriptor 2) + # Unlike JS console.error(), this explicitly redirects to stderr echo "[DEBUG] $*" >&2 fi } # Find the directory containing this script and the components +# BASH FEATURE: ${BASH_SOURCE[0]} is the path to the current script +# Unlike JS __filename, this works even when the script is sourced DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" COMPILER="$DIR/compiler.awk" VM="$DIR/vm.awk" @@ -35,10 +41,11 @@ STATE_FILE="/tmp/scheme_vm.state" cleanup() { debug "Cleaning up temp dir: $TMPDIR" rm -rf "$TMPDIR" - if [ "$1" != "keep_state" ]; then - rm -f "$STATE_FILE" - rm -f "/tmp/scheme_vm.env" - fi + # Temporarily disable state file deletion for debugging + # if [ "$1" != "keep_state" ]; then + # rm -f "$STATE_FILE" + # rm -f "/tmp/scheme_vm.env" + # fi } trap "cleanup" EXIT @@ -48,10 +55,11 @@ ASM_FILE="$TMPDIR/output.asm" DEBUG_FILE="$TMPDIR/debug.out" # Initialize/clear state files at REPL start -if [ "$#" -eq 0 ]; then # Only for interactive mode - : > "/tmp/scheme_vm.state" - : > "/tmp/scheme_vm.env" -fi +# Temporarily disable state file clearing for debugging +# if [ "$#" -eq 0 ]; then # Only for interactive mode +# : > "/tmp/scheme_vm.state" +# : > "/tmp/scheme_vm.env" +# fi # Function to handle evaluation evaluate_expression() { @@ -77,8 +85,10 @@ evaluate_expression() { cat "$ASM_FILE" >&2 debug "Running VM..." - # Use persistent VM state - result=$(awk -v PERSIST=1 -f "$VM" "$ASM_FILE" 2>&1) + # Use persistent VM state and pass debug flag + # BASH FEATURE: -v var=value passes variables to awk + # Unlike JS where you'd use process.env, this sets awk variables directly + result=$(awk -v PERSIST=1 -v DEBUG="$DEBUG" -f "$VM" "$ASM_FILE" 2>&1) debug "VM output: $result" if [ -n "$result" ]; then echo "$result" @@ -130,8 +140,12 @@ while true; do fi # Count parentheses + # BASH FEATURE: $(command) is command substitution - runs command and captures output + # Unlike JS where you'd use require('child_process').execSync(), this is built-in open_parens=$(echo "$line" | tr -cd '(' | wc -c) close_parens=$(echo "$line" | tr -cd ')' | wc -c) + # BASH FEATURE: $((expression)) is arithmetic expansion + # Unlike JS where you'd use eval() or a math library, this evaluates arithmetic expressions paren_count=$((paren_count + open_parens - close_parens)) if [ -n "$current_input" ]; then diff --git a/awk/scheme/scheme/bin/vm.awk b/awk/scheme/scheme/bin/vm.awk index 4e7d2c7..ea15884 100755 --- a/awk/scheme/scheme/bin/vm.awk +++ b/awk/scheme/scheme/bin/vm.awk @@ -15,18 +15,31 @@ BEGIN { T_PAIR = "P" # Cons cells (pairs) T_FUNCTION = "F" # Function references T_NIL = "NIL" # Empty list marker + T_CLOSURE = "CLOSURE" # Closure objects (function + captured environment) # Virtual machine registers stack_ptr = 0 # Points to top of evaluation stack heap_ptr = 0 # Points to next free heap location pc = 0 # Program counter for instruction fetch - # Debug mode disabled by default, can be enabled via DEBUG=1 environment variable + # Original program storage for nested function definitions + # AWK FEATURE: delete array removes all elements from an array + # Unlike JS where you'd set array = [], this clears the array in place + delete original_program # Stores the original program before function calls + + # AWK FEATURE: ENVIRON is a built-in array containing environment variables DEBUG = (ENVIRON["DEBUG"] == "1") ? 1 : 0 # Environment for variable bindings env_size = 0 # Current size of environment stack + # Closure support - captured environments for nested lambdas + delete closure_envs # Maps env_id to captured environment arrays + delete closure_env_names # Variable names in captured environments + delete closure_env_vals # Variable values in captured environments + delete closure_env_sizes # Size of each captured environment + next_env_id = 1 # Counter for generating unique environment IDs + # Function table for storing defined functions delete func_def_names # Function names delete func_def_pc # Entry points @@ -36,29 +49,57 @@ BEGIN { # Call stack for function returns call_stack_ptr = 0 + # Global function registry - clear it first + delete FUNCTIONS # Maps function names to implementations + # State persistence configuration STATE_FILE = "/tmp/scheme_vm.state" + debug("STATE_FILE_PATH: " STATE_FILE) + debug("PERSIST_FLAG: " PERSIST) if (PERSIST) { debug("Loading state from: " STATE_FILE) + debug("LOADING_STATE: Attempting to read " STATE_FILE) + debug("LOADING_STATE: FUNCTIONS table size before loading: " length(FUNCTIONS)) + # AWK FEATURE: getline is awk's file reading function + # getline var < file reads one line from file into var, returns 1 on success, 0 on EOF, -1 on error + # Unlike JS where you'd use fs.readFileSync(), this reads line by line if ((getline line < STATE_FILE) >= 0) { # Check if file exists and is readable + debug("LOADING_STATE: File opened successfully, first line: " line) + # AWK FEATURE: do-while loop syntax - the condition is checked at the end do { + debug("LOADING_STATE: Processing line: " line) if (line ~ /^FUNC /) { # Parse and load function definition + # AWK FEATURE: sub() modifies the string in place and returns count of replacements sub(/^FUNC /, "", line) name = line sub(/ .*$/, "", name) code = line sub(/^[^ ]+ /, "", code) + # Read the rest of the function code (until next FUNC or end of file) + debug("LOADING_STATE: Reading function code for " name) + while ((getline line < STATE_FILE) > 0) { + debug("LOADING_STATE: Reading line: " line) + if (line ~ /^FUNC /) { + # Found next function, put the line back + debug("LOADING_STATE: Found next function, stopping") + break + } + code = code "\n" line + } + debug("Loaded function: " name) debug("Code: " code) - # Store function in function table - func_def_names[func_def_size] = name - func_def_code[func_def_size] = code - func_def_size++ + # Store function in FUNCTIONS table + FUNCTIONS[name] = code + debug("LOADED_FUNCTION: " name " with code length " length(code)) + debug("LOADED_FUNCTION: FUNCTIONS table size after loading " name ": " length(FUNCTIONS)) + debug("LOADED_FUNCTION: Checking if " name " is in table: " (name in FUNCTIONS)) } } while ((getline line < STATE_FILE) > 0) + # AWK FEATURE: close() closes a file handle close(STATE_FILE) } } @@ -68,8 +109,17 @@ BEGIN { delete func_env_vals # Variable values in function scope delete func_env_sizes # Size of each function's environment - # Global function registry - delete FUNCTIONS # Maps function names to implementations + # Register built-in functions first + debug("REGISTERING_BUILTINS: " length(FUNCTIONS) " functions before") + FUNCTIONS["+"] = "add" + FUNCTIONS["-"] = "subtract" + FUNCTIONS["*"] = "multiply" + FUNCTIONS["/"] = "divide" + FUNCTIONS["="] = "equals" + FUNCTIONS["<"] = "less_than" + FUNCTIONS[">"] = "greater_than" + FUNCTIONS["add1"] = "add_one" + debug("REGISTERING_BUILTINS: " length(FUNCTIONS) " functions after") # Environment persistence configuration ENV_STATE_FILE = "/tmp/scheme_vm.env" @@ -97,16 +147,6 @@ BEGIN { } } - # Register built-in functions - FUNCTIONS["+"] = "add" - FUNCTIONS["-"] = "subtract" - FUNCTIONS["*"] = "multiply" - FUNCTIONS["/"] = "divide" - FUNCTIONS["="] = "equals" - FUNCTIONS["<"] = "less_than" - FUNCTIONS[">"] = "greater_than" - FUNCTIONS["add1"] = "add_one" - # Track if VM halted normally (vs error) normal_exit = 0 } @@ -123,12 +163,17 @@ function makeValue(type, val) { } function getType(val) { + # AWK FEATURE: substr(string, start, length) extracts substring + # Unlike JS string.slice(), this is 1-indexed and requires explicit length + # AWK FEATURE: index(string, substring) returns position of substring (1-indexed) + # Unlike JS string.indexOf(), this returns 0 if not found and is 1-indexed type = substr(val, 1, index(val, ":") - 1) debug("Get type: " type " from " val) return type } function getValue(val) { + # AWK FEATURE: index() returns 1-indexed position, so we add 1 to get after the colon value = substr(val, index(val, ":") + 1) debug("Get value: " value " from " val) return value @@ -141,6 +186,77 @@ function isSymbol(val) { return getType(val) == T_SYMBOL } function isPair(val) { return getType(val) == T_PAIR } function isFunction(val) { return getType(val) == T_FUNCTION } function isNil(val) { return getType(val) == T_NIL } +function isClosure(val) { return getType(val) == T_CLOSURE } + +# Closure helper functions +function makeClosure(func_name, env_id) { + return T_CLOSURE ":" func_name ":" env_id +} + +function getClosureFunction(closure_val) { + # Extract function name from CLOSURE:func_name:env_id + return substr(closure_val, index(closure_val, ":") + 1, + index(substr(closure_val, index(closure_val, ":") + 1), ":") - 1) +} + +function getClosureEnvId(closure_val) { + # Extract env_id from CLOSURE:func_name:env_id + first_colon = index(closure_val, ":") + second_colon = index(substr(closure_val, first_colon + 1), ":") + first_colon + return substr(closure_val, second_colon + 1) +} + +# Environment capture for closures +function captureEnvironment(env_id, i) { + debug("Capturing environment with ID: " env_id) + closure_env_sizes[env_id] = env_size + + # Copy current environment to closure environment + for (i = 0; i < env_size; i++) { + # Only capture non-global variables + if (env_name[i] !~ /^__global_/) { + closure_env_names[env_id, i] = env_name[i] + closure_env_vals[env_id, i] = env_val[i] + debug("Captured: " env_name[i] " = " env_val[i]) + } + } + + debug("Captured environment size: " closure_env_sizes[env_id]) +} + +# VM instruction to capture environment +function vm_capture_env(func_name) { + debug("Capturing environment for function: " func_name) + env_id = next_env_id++ + captureEnvironment(env_id) + + # Replace the placeholder ENV_ID in the closure value + # Find the closure value on the stack and update it + if (stack_ptr > 0) { + closure_val = stack[stack_ptr] + if (closure_val ~ /^CLOSURE:/) { + # Replace ENV_ID with actual env_id + new_closure_val = "CLOSURE:" func_name ":" env_id + stack[stack_ptr] = new_closure_val + debug("Updated closure value: " new_closure_val) + } + } +} + +# Environment restoration for closures +function pushClosureEnvironment(env_id, i) { + debug("Pushing closure environment: " env_id) + if (env_id in closure_env_sizes) { + for (i = 0; i < closure_env_sizes[env_id]; i++) { + if ((env_id, i) in closure_env_names) { + env_name[env_size] = closure_env_names[env_id, i] + env_val[env_size] = closure_env_vals[env_id, i] + env_size++ + debug("Restored: " closure_env_names[env_id, i] " = " closure_env_vals[env_id, i]) + } + } + } +} # Stack operations function push(val) { @@ -279,6 +395,8 @@ function vm_less_than() { # Main instruction execution loop function execute(instr) { + # AWK FEATURE: split(string, array, separator) splits string into array elements + # Unlike JS string.split() which returns an array, this populates an existing array split(instr, parts, " ") op = parts[1] debug("Execute: " instr) @@ -298,6 +416,7 @@ function execute(instr) { if (stack_ptr < 2) error("SWAP requires two operands") val2 = pop() val1 = pop() + val1 = pop() push(val2) push(val1) } @@ -333,14 +452,20 @@ function execute(instr) { print peek() } else if (op == "HALT") { + debug("EXECUTING_HALT") normal_exit = 1 + debug("HALT: Stack pointer = " stack_ptr) if (stack_ptr > 0) { result = peek() + debug("HALT: Result = " result) + } else { + debug("HALT: Stack is empty") } if (PERSIST) { save_state() } if (result) { + debug("PRINTING_RESULT: " result) print result } exit(0) @@ -358,27 +483,40 @@ function execute(instr) { vm_define_function(parts[2], pc) } else if (op == "CALL") { - vm_call_function(parts[2]) + vm_call_function() } else if (op == "RETURN") { - vm_return() + debug("EXECUTING_RETURN") + # The call_stack_ptr is no longer used for return, so this instruction is effectively removed. + # The function execution itself handles the return. } else if (op == "GET_VALUE") { vm_get_value() } + else if (op == "CAPTURE_ENV") { + vm_capture_env(parts[2]) + } else { error("Unknown instruction: " op) } } # Load program instructions +# AWK FEATURE: Each line of input is automatically processed by this block +# NR is a built-in variable that contains the current record (line) number +# Unlike JS where you'd need to track line numbers manually { - program[NR-1] = $0 + program[NR-1] = $0 # $0 is the current line being processed + original_program[NR-1] = $0 # Keep a copy of the original program } -# Main execution loop +# AWK FEATURE: END block runs after all input has been processed +# This is like a "finally" block that always executes after reading all input END { + # AWK FEATURE: length(array) returns the number of elements in an array + # Unlike JS array.length, this is a function call, not a property while (pc < length(program)) { + # debug("EXECUTING_PC_" pc ": " program[pc]) execute(program[pc++]) } @@ -421,6 +559,8 @@ function vm_store(name) { val = peek() if (isSymbol(val)) { func_name = getValue(val) + # AWK FEATURE: ~ is the regex match operator (like /pattern/.test() in JS) + # The pattern is a regex literal, not a string if (func_name ~ /^__lambda_/) { # Store the function code under the new name FUNCTIONS[name] = FUNCTIONS[func_name] @@ -460,61 +600,161 @@ function vm_pop_env() { function vm_lookup(name, i, global_name, val) { debug("Looking up " name " in environment of size: " env_size) dump_env() - - # Check if it's a function (built-in or user-defined) - if (name in FUNCTIONS) { - debug("Found function: " name) - push(makeValue(T_SYMBOL, name)) - return - } - + # Try global name first, then local global_name = "__global_" name + debug("LOOKUP_ENV: Looking for " name " in environment of size " env_size) for (i = env_size - 1; i >= 0; i--) { + debug("LOOKUP_ENV: Checking index " i ": " env_name[i] " = " env_val[i]) if (env_name[i] == global_name || env_name[i] == name) { debug("Found " name " = " env_val[i] " at position " i) push(env_val[i]) return } } + + # Check if it's a function (built-in or user-defined) + debug("LOOKUP_CHECKING: " name " in FUNCTIONS table") + debug("FUNCTIONS_TABLE_SIZE: " length(FUNCTIONS)) + debug("FUNCTIONS_IN_TABLE:") + # AWK FEATURE: for (var in array) iterates over array keys + # Unlike JS for...in which includes inherited properties, awk arrays don't have inheritance + for (f in FUNCTIONS) { + debug(" " f) + } + # AWK FEATURE: 'in' operator checks if key exists in array + # Unlike JS where you'd use array.hasOwnProperty(key) or 'key' in array + if (name in FUNCTIONS) { + debug("Found function: " name) + push(makeValue(T_SYMBOL, name)) + return + } + debug("FUNCTION_LOOKUP_FAILED: " name) error("Undefined variable: " name) } # Function definition implementation function vm_define_function(name, start_pc) { debug("Defining function: " name " at " start_pc) + debug("call_stack_ptr: " call_stack_ptr) + + # Clear the from_define flag + for (i = env_size - 1; i >= 0; i--) { + if (env_name[i] == "from_define") { + debug("Clearing from_define flag") + env_size-- + break + } + } # Build function code code = "" i = start_pc - while (i < length(program) && program[i] != "RETURN") { - if (code != "") code = code "\n" - code = code program[i] - i++ + + # If we're inside a function call, we need to find the corresponding position + # in the original program. For now, use a simple heuristic. + if (call_stack_ptr > 0) { + debug("Nested function definition - using current instruction") + # Just read from the current program position + # AWK FEATURE: length(array) returns the number of elements in an array + # Unlike JS array.length, this is a function call, not a property + while (i < length(program) && program[i] != "RETURN") { + if (code != "") code = code "\n" + code = code program[i] + i++ + } + } else { + debug("Top-level function definition - using original program") + # Use original_program for top-level function definitions + # AWK FEATURE: length(array) returns the number of elements in an array + # Unlike JS array.length, this is a function call, not a property + while (i < length(original_program) && original_program[i] != "RETURN") { + if (code != "") code = code "\n" + code = code original_program[i] + i++ + } } code = code "\nRETURN" # Store function debug("Storing function: " name " = " code) FUNCTIONS[name] = code + debug("Function stored. Total functions: " length(FUNCTIONS)) + debug("FUNCTION_STORED: " name) pc = i + 1 } # Function call implementation -function vm_call_function(func_name, code_lines, j, saved_pc, saved_env_size, arg, param_name) { +function vm_call_function(code_lines, j, saved_pc, saved_env_size, arg, param_name) { + # Get function name from stack + func_name = pop() debug("Calling function: " func_name) + debug("CALLING_FUNCTION: " func_name) # If name is a symbol, get its value if (isSymbol(func_name)) { func_name = getValue(func_name) } - # Handle anonymous functions + # Check if it's a built-in function first + if (func_name in FUNCTIONS) { + built_in_name = FUNCTIONS[func_name] + if (built_in_name == "add") { + debug("Calling built-in function: add") + add() + return + } else if (built_in_name == "subtract") { + debug("Calling built-in function: subtract") + subtract() + return + } else if (built_in_name == "multiply") { + debug("Calling built-in function: multiply") + multiply() + return + } else if (built_in_name == "divide") { + debug("Calling built-in function: divide") + divide() + return + } else if (built_in_name == "equals") { + debug("Calling built-in function: equals") + equals() + return + } else if (built_in_name == "less_than") { + debug("Calling built-in function: less_than") + less_than() + return + } else if (built_in_name == "greater_than") { + debug("Calling built-in function: greater_than") + greater_than() + return + } else if (built_in_name == "add_one") { + debug("Calling built-in function: add_one") + add_one() + return + } + } + + # Handle anonymous functions and closures if (func_name ~ /^__lambda_/) { if (!(func_name in FUNCTIONS)) { error("Undefined lambda function: " func_name) } + } else if (isClosure(func_name)) { + # This is a closure call + debug("Calling closure: " func_name) + closure_func = getClosureFunction(func_name) + closure_env_id = getClosureEnvId(func_name) + + if (!(closure_func in FUNCTIONS)) { + error("Undefined closure function: " closure_func) + } + + # Restore the captured environment + pushClosureEnvironment(closure_env_id) + + # Use the closure's function name for the rest of the call + func_name = closure_func } else if (!(func_name in FUNCTIONS)) { error("Undefined function: " func_name) } @@ -522,69 +762,100 @@ function vm_call_function(func_name, code_lines, j, saved_pc, saved_env_size, ar saved_pc = pc saved_env_size = env_size - # Split function code into lines + # AWK FEATURE: split(string, array, separator) splits string into array elements + # Unlike JS string.split() which returns an array, this populates an existing array split(FUNCTIONS[func_name], code_lines, "\n") - # Add function code to program at current position - for (j in code_lines) { - program[pc + j - 1] = code_lines[j] - } - # Check if this is a parameterized function if (code_lines[1] ~ /^STORE /) { # This is a parameterized function (lambda) # Get parameter name from STORE instruction param_name = substr(code_lines[1], 7) debug("Found parameter name: " param_name) + debug("FUNCTION_PARAM: " param_name) # Get argument from stack arg = pop() debug("Function argument: " arg) + debug("FUNCTION_ARG: " arg) # Create new environment frame debug("Creating new environment frame at size: " env_size) env_name[env_size] = param_name env_val[env_size] = arg env_size++ + debug("FUNCTION_ENV_STORE: " param_name " = " arg " at index " (env_size-1)) + + # Execute function code directly, skipping STORE and POP_ENV instructions + for (j = 2; j <= length(code_lines); j++) { + if (code_lines[j] != "" && code_lines[j] != "POP_ENV") { + debug("Executing function instruction: " code_lines[j]) + execute(code_lines[j]) + } + } + + # Clean up parameter + vm_pop_env() + + # Return to caller + debug("Function completed, returning to PC: " saved_pc) + pc = saved_pc + return } else { # This is a built-in function or non-parameterized function debug("Calling non-parameterized function: " func_name) - } - - # Save return info and jump to function - call_stack[++call_stack_ptr] = saved_pc - env_stack[call_stack_ptr] = saved_env_size - - debug("Function found, jumping to PC: " pc " with env_size: " saved_env_size) - dump_env() -} - -# Function return implementation -function vm_return() { - if (call_stack_ptr > 0) { - # Save return value - ret_val = pop() - # Restore environment - while (env_size > env_stack[call_stack_ptr]) { - debug("Popping environment at size: " env_size) - vm_pop_env() + # Execute all function code directly + for (j in code_lines) { + if (code_lines[j] != "") { + debug("Executing function instruction: " code_lines[j]) + execute(code_lines[j]) + } } - # Restore program counter - pc = call_stack[call_stack_ptr--] - - # Push return value - push(ret_val) - - debug("Returned with value: " ret_val " and env_size: " env_size) + # Return to caller + debug("Function completed, returning to PC: " saved_pc) + pc = saved_pc + return } } +# Function return implementation - no longer needed with direct execution +# function vm_return() { +# debug("VM_RETURN: call_stack_ptr = " call_stack_ptr) +# if (call_stack_ptr > 0) { +# # Save return value +# ret_val = pop() +# debug("VM_RETURN: return value = " ret_val) +# +# # Restore environment +# while (env_size > env_stack[call_stack_ptr]) { +# debug("Popping environment at size: " env_size) +# vm_pop_env() +# } +# +# # Restore program counter +# pc = call_stack[call_stack_ptr--] +# debug("VM_RETURN: restored PC = " pc) +# +# # Restore the original program at the call position +# program[pc] = original_program_at_call[call_stack_ptr + 1] +# debug("Restored original program: " original_program_at_call[call_stack_ptr + 1]) +# +# # Push return value +# push(ret_val) +# debug("VM_RETURN: pushed return value " ret_val) +# +# debug("Returned with value: " ret_val " and env_size: " env_size) +# } +# } + # Debug helper to dump environment contents function dump_env( i) { debug("Environment dump:") for (i = 0; i < env_size; i++) { + # AWK FEATURE: sprintf() formats a string like printf but returns it instead of printing + # Unlike JS where you'd use template literals or String.format(), this is the awk way debug(sprintf(" %d: %s = %s", i, env_name[i], env_val[i])) } } @@ -601,11 +872,22 @@ function lookup_no_error(name, i) { # State persistence implementation function save_state() { + debug("SAVE_STATE_CALLED") debug("Saving state to: " STATE_FILE) - for (i = 0; i < func_def_size; i++) { - debug("Saving function: " func_def_names[i]) - print "FUNC " func_def_names[i] " " func_def_code[i] > STATE_FILE + # Save functions from FUNCTIONS table + debug("SAVE_STATE: Saving " length(FUNCTIONS) " functions") + for (func_name in FUNCTIONS) { + if (func_name !~ /^__lambda_/) { # Don't save lambda functions + debug("Saving function: " func_name) + debug("SAVE_STATE: About to write function " func_name) + debug("SAVE_STATE: Function code length: " length(FUNCTIONS[func_name])) + # AWK FEATURE: printf with > file redirects output to a file + # Unlike JS where you'd use fs.writeFileSync(), this redirects from stdout to file + printf "FUNC %s %s\n", func_name, FUNCTIONS[func_name] > STATE_FILE + debug("SAVE_STATE: Saved function " func_name " to " STATE_FILE) + } } + # AWK FEATURE: close() closes a file handle close(STATE_FILE) # Save environment state @@ -613,6 +895,8 @@ function save_state() { for (i = 0; i < env_size; i++) { if (env_name[i] ~ /^__global_/) { # Only save globals debug("Saving env var: " env_name[i] " = " env_val[i]) + # AWK FEATURE: print with > file redirects output to a file + # Unlike JS console.log() which always goes to stdout print "ENV " env_name[i] " " env_val[i] > ENV_STATE_FILE } } @@ -694,7 +978,7 @@ function add_one() { # Get value from top of stack function vm_get_value() { - val = peek() + val = pop() if (isSymbol(val)) { name = getValue(val) # If it's a function name, just push the name directly @@ -703,5 +987,11 @@ function vm_get_value() { } else { push(makeValue(T_SYMBOL, name)) } + } else if (isClosure(val)) { + # For closures, just push the closure value back + push(val) + } else { + # For other types, just push the value back + push(val) } } \ No newline at end of file diff --git a/awk/scheme/scheme/diagram.md b/awk/scheme/scheme/diagram.md index 4a719b4..689a44a 100644 --- a/awk/scheme/scheme/diagram.md +++ b/awk/scheme/scheme/diagram.md @@ -5,11 +5,12 @@ +----------------+ Scheme Code +----------------+ Assembly +----------------+ | | -----------------> | | -------------> | | | REPL | "(+ 1 2)" | Compiler | "PUSH_CONST | VM | -| (bin/repl) | | compiler.awk | N:1 | vm.awk | +| (bin/repl) | "(lambda (x) ...)" | compiler.awk | N:1 | vm.awk | | | | | PUSH_CONST | | | - Multi-line | | - Lexer | N:2 | - Stack-based | | - Debug mode | | - Parser | ADD | - Type system | | - File input | | - Code gen | HALT" | - Environment | +| | | - Closure gen | CAPTURE_ENV | - Direct exec | | | <----------------- | | <------------- | | | | Output: "N:3" | | Result | | +----------------+ +----------------+ +----------------+ @@ -27,7 +28,7 @@ Debug Flow (when DEBUG=1): | REPL | -----------------> | Compiler | -------------> | VM | | | [Input received] | | [Tokens found] | | | [Debug output] | | [Parsing tree] | [Stack ops] | [Stack state] | -| stderr | <----------------- | [Gen code] | <------------- | [Environment] | +| stderr | <----------------- | [Gen code] | [Closure info] | [Environment] | +----------------+ +----------------+ +----------------+ Execution Flow Example: @@ -40,10 +41,51 @@ Execution Flow Example: │ │ │ │ │ ADD │ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ +Closure Flow Example: +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Input: │ │ Lexer/ │ │ Code Gen │ │ VM │ +│(let ((x 10))│ ------->│ Parser: │ ------> │ STORE x │ ------> │ Env: [x=10] │ +│ ((lambda │ │ (let │ │ LABEL λ │ │ │ +│ (y) (+ x │ │ ((x 10)) │ │ CAPTURE_ENV │ │ Closure: │ +│ y)) 32))) │ │ (lambda │ │ PUSH_CONST │ │ λ+env_id │ +│ │ │ (y) ...) │ │ CLOSURE:... │ │ │ +│ │ │ │ │ CALL │ │ Result: 42 │ +└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ + State Management: -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Global │ │ Environment │ │ Function │ -│ Variables │ ------> │ Stack │ ------> │ Calls │ -│ (persist) │ │ (frames) │ │ (stack) │ -└─────────────┘ └─────────────┘ └─────────────┘ -``` \ No newline at end of file +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Global │ │ Environment │ │ Function │ │ Closure │ +│ Variables │ ------> │ Stack │ ------> │ Calls │ ------> │ Environment │ +│ (persist) │ │ (frames) │ │ (direct) │ │ (captured) │ +└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ + +Data Type System: +┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ +│ Numbers │ │ Booleans │ │ Symbols │ │ Closures │ +│ N:value │ │ B:value │ │ S:name │ │CLOSURE:fn:env_id│ +└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ + │ │ │ │ + └───────────────────┼───────────────────┼───────────────────┘ + │ │ + ┌─────────────┐ ┌─────────────┐ + │ Pairs │ │ Nil │ + │ P:index │ │ NIL: │ + └─────────────┘ └─────────────┘ + +VM Instruction Set: +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Stack Ops │ │ Environment │ │ Functions │ +│ PUSH_CONST val │ │ STORE name │ │ LABEL name │ +│ POP │ │ LOOKUP name │ │ CALL │ +│ ADD/SUB/MUL/DIV │ │ POP_ENV │ │ RETURN │ +│ GET_VALUE │ │ │ │ │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ + │ │ │ + └───────────────────────┼───────────────────────┘ + │ + ┌─────────────────┐ + │ Closures │ + │ CAPTURE_ENV fn │ + │ PUSH_CONST │ + │ CLOSURE:fn:env │ + └─────────────────┘ \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/arch-notes.md b/awk/scheme/scheme/scratch/arch-notes.md new file mode 100644 index 0000000..060ca72 --- /dev/null +++ b/awk/scheme/scheme/scratch/arch-notes.md @@ -0,0 +1,115 @@ +# Awk-Scheme: Architectural Notes + +## Overview +Awk-Scheme is a minimal Scheme interpreter implemented in AWK, composed of a compiler and a stack-based virtual machine (VM). The architecture is modular, with clear separation of concerns between parsing/compilation and execution. The design leverages several classic architectural patterns to achieve extensibility, maintainability, and clarity. + +--- + +## 1. Program Flow: High-Level Pipeline + +1. **Input**: User provides Scheme code (via REPL or file). +2. **Compilation**: The compiler (`bin/compiler.awk`) parses and compiles Scheme code into VM instructions. +3. **Execution**: The VM (`bin/vm.awk`) executes the instructions, managing state, environment, and memory. + +--- + +## 2. Compiler (`bin/compiler.awk`) + +### 2.1. Lexical Analysis (Lexer) +- **Pattern**: *Lexer/Parser Separation* (classic compiler front-end) +- **Why**: Decouples tokenization from parsing, making the code easier to reason about and extend. +- **How**: The compiler tokenizes input into numbers, symbols, and parentheses, handling whitespace and comments. + +### 2.2. Parsing (Recursive Descent) +- **Pattern**: *Recursive Descent Parser* +- **Why**: Simple, direct mapping from grammar to code; easy to debug and extend for a small language. +- **How**: The parser builds an expression tree from tokens, handling nested expressions and validating syntax. + +### 2.3. Code Generation +- **Pattern**: *Visitor/Dispatcher* (for expression types) +- **Why**: Each expression type (number, variable, list, special form) is handled by a dedicated function, improving maintainability. +- **How**: The compiler emits stack-based VM instructions for each expression, handling special forms (define, let, lambda) and function calls. + +### 2.4. Closure Support +- **Pattern**: *Closure Conversion* (from functional programming) +- **Why**: Enables lexical scoping and first-class functions by capturing the environment at lambda creation. +- **How**: The compiler emits instructions to capture the current environment and create closure objects. + +--- + +## 3. Virtual Machine (`bin/vm.awk`) + +### 3.1. Stack-Based Execution +- **Pattern**: *Stack Machine* (classic VM design) +- **Why**: Simplicity and direct mapping from compiled instructions to execution; easy to implement in AWK. +- **How**: The VM maintains an evaluation stack for operands and results, executing instructions sequentially. + +### 3.2. Typed Value System +- **Pattern**: *Tagged Union* (Algebraic Data Type) +- **Why**: Enables runtime type checking and safe operations on heterogeneous values. +- **How**: All values are tagged (e.g., `N:`, `B:`, `P:`, `F:`, `CLOSURE:`) and checked at runtime. + +### 3.3. Environment Model +- **Pattern**: *Environment Chain* (Lexical Scoping) +- **Why**: Supports variable bindings, lexical scope, and closures. +- **How**: The VM maintains an environment stack for variable bindings, with special handling for global and closure environments. + +### 3.4. Function and Closure Handling +- **Pattern**: *Direct Function Execution* (no program array modification) +- **Why**: Simplifies call/return logic and avoids mutation of the instruction stream. +- **How**: Functions are stored as code blocks; calls execute code directly, with environment management for parameters and closures. + +### 3.5. Heap and Memory Management +- **Pattern**: *Manual Heap with Reference Counting (partial)* +- **Why**: Enables cons cell allocation and basic memory management. +- **How**: The VM allocates cons cells on a heap array, with a placeholder for reference counting (not fully implemented). + +### 3.6. State Persistence +- **Pattern**: *State Serialization* +- **Why**: Allows global state and functions to persist across REPL sessions. +- **How**: The VM serializes global variables and function definitions to files, loading them on startup. + +--- + +## 4. Extensibility and Maintainability +- **Pattern**: *Separation of Concerns* +- **Why**: Compiler and VM are independent, making it easy to extend the language or change the execution model. +- **Pattern**: *Table-Driven Dispatch* (for built-in functions) +- **Why**: Adding new primitives or special forms is straightforward. + +--- + +## 5. Notable Limitations +- No support for nested lambdas (yet), proper tail recursion, or garbage collection. +- Reference counting is stubbed but not enforced. +- Error handling is minimal. + +--- + +## 6. Summary Table: Patterns Used + +| Area | Pattern(s) Used | Why? | +|---------------------|----------------------------------|--------------------------------------| +| Lexing/Parsing | Lexer/Parser, Recursive Descent | Simplicity, extensibility | +| Code Generation | Visitor/Dispatcher | Maintainability, clarity | +| VM Execution | Stack Machine, Tagged Union | Simplicity, type safety | +| Environment | Environment Chain, Closure Conv. | Lexical scoping, closures | +| Function Dispatch | Table-Driven Dispatch | Extensibility | +| State Persistence | State Serialization | REPL continuity | +| Memory Management | Manual Heap, Ref Counting (stub) | List support, future GC | + +--- + +## 7. Architectural Choices: Rationale +- **AWK as Implementation Language**: Chosen for portability and as a challenge; influences the use of arrays and string-based data structures. +- **Stack Machine**: Maps well to AWK's capabilities and keeps the VM simple. +- **Separation of Compiler/VM**: Enables clear boundaries and easier debugging. +- **Explicit Typing**: Reduces runtime errors and clarifies value handling. + +--- + +## 8. Flow Diagram (Textual) + +``` +User Input (Scheme) → [Compiler] → VM Instructions → [VM] → Result/State +``` \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/ideas-for-string-support.md b/awk/scheme/scheme/scratch/ideas-for-string-support.md new file mode 100644 index 0000000..0d81e5f --- /dev/null +++ b/awk/scheme/scheme/scratch/ideas-for-string-support.md @@ -0,0 +1,556 @@ +# String Support Implementation Plan for Awk-Scheme + +## Overview +This document outlines a comprehensive plan for adding string support to the Awk-Scheme interpreter. The implementation will follow the existing architecture patterns and maintain consistency with the current type system and execution model. + +## Current Architecture Analysis + +### Type System +- Current types: `N:`, `B:`, `S:`, `P:`, `F:`, `NIL:`, `CLOSURE:` +- All values use `type:value` format for runtime type checking +- Type checking via `getType()` and `getValue()` functions +- Type predicates: `isNumber()`, `isBoolean()`, `isSymbol()`, etc. + +### Compiler Pipeline +1. **Lexer** (`next_token()`): Handles numbers, symbols, parentheses +2. **Parser** (`parse_expr()`, `parse_list()`): Builds expression trees +3. **Code Generator**: Emits VM instructions for different expression types +4. **Expression Compiler** (`compile_expr()`): Dispatches based on expression type + +### VM Architecture +- Stack-based execution with typed values +- Environment-based variable binding +- Built-in function registry (`FUNCTIONS` table) +- Direct function execution (no program array modification) +- State persistence for globals and functions + +## String Implementation Plan + +### Phase 1: Core String Type System + +#### 1.1 VM Type System Extensions +**File: `bin/vm.awk`** + +```awk +# Add to BEGIN block +T_STRING = "STR" # String type tag + +# Add type predicate +function isString(val) { return getType(val) == T_STRING } + +# String value constructor +function makeString(val) { + return T_STRING ":" val +} +``` + +**Changes Required:** +- Add `T_STRING` constant in BEGIN block +- Add `isString()` predicate function +- Add `makeString()` constructor function +- Update type checking in existing operations where needed + +#### 1.2 String Storage Strategy +**Options:** +1. **Inline storage**: Store strings directly in the type:value format + - Pros: Simple, fast access + - Cons: Limited by awk string length, no escaping issues + +2. **Heap storage**: Store strings in heap like cons cells + - Pros: Unlimited length, consistent with existing patterns + - Cons: More complex, requires string management + +**Recommendation**: Start with inline storage for simplicity, can migrate to heap later. + +### Phase 2: Compiler Extensions + +#### 2.1 Lexer String Support +**File: `bin/compiler.awk`** + +**Current lexer handles:** +- Numbers: `-?[0-9]+` +- Symbols: `[a-zA-Z_][a-zA-Z0-9_]*` +- Parentheses: `(`, `)` + +**New string tokenization:** +```awk +# Add to next_token() function +# Handle string literals (double quotes) +if (c == "\"") { + str = "" + input_buffer = substr(input_buffer, 2) # Skip opening quote + + while (length(input_buffer) > 0) { + c = substr(input_buffer, 1, 1) + if (c == "\"") { + input_buffer = substr(input_buffer, 2) # Skip closing quote + break + } + if (c == "\\") { + # Handle escape sequences + input_buffer = substr(input_buffer, 2) + if (length(input_buffer) > 0) { + c = substr(input_buffer, 1, 1) + if (c == "n") str = str "\n" + else if (c == "t") str = str "\t" + else if (c == "\\") str = str "\\" + else if (c == "\"") str = str "\"" + else error("Invalid escape sequence: \\" c) + input_buffer = substr(input_buffer, 2) + } + } else { + str = str c + input_buffer = substr(input_buffer, 2) + } + } + return "\"" str "\"" # Return with quotes for identification +} +``` + +**Escape Sequences to Support:** +- `\"` - Literal double quote +- `\\` - Literal backslash +- `\n` - Newline +- `\t` - Tab +- `\r` - Carriage return + +#### 2.2 Parser String Support +**File: `bin/compiler.awk`** + +**Update `parse_expr()`:** +```awk +function parse_expr(token, result) { + token = next_token() + if (token == "EOF") return "" + + if (token == "(") { + result = parse_list() + debug("Parsed list: " result) + return result + } + + # Handle string literals + if (substr(token, 1, 1) == "\"") { + debug("Parsed string: " token) + return token + } + + debug("Parsed token: " token) + return token +} +``` + +#### 2.3 Code Generation for Strings +**File: `bin/compiler.awk`** + +**Add to `compile_expr()`:** +```awk +function compile_expr(expr, split_result, op, args) { + debug("Compiling expression: " expr) + + # Handle string literals + if (substr(expr, 1, 1) == "\"") { + compile_string(expr) + return + } + + # ... existing code for numbers, nil, variables, etc. +} + +function compile_string(str) { + debug("Compiling string: " str) + # Remove outer quotes and emit string constant + content = substr(str, 2, length(str) - 2) + print "PUSH_CONST STR:" content +} +``` + +### Phase 3: VM String Operations + +#### 3.1 String Built-in Functions +**File: `bin/vm.awk`** + +**Add to BEGIN block (built-in registration):** +```awk +# String operations +FUNCTIONS["string-length"] = "string_length" +FUNCTIONS["string-append"] = "string_append" +FUNCTIONS["string-ref"] = "string_ref" +FUNCTIONS["substring"] = "substring" +FUNCTIONS["string=?"] = "string_equal" +FUNCTIONS["string<?"] = "string_less_than" +FUNCTIONS["string>?"] = "string_greater_than" +FUNCTIONS["string-ci=?"] = "string_ci_equal" +FUNCTIONS["string-ci<?"] = "string_ci_less_than" +FUNCTIONS["string-ci>?"] = "string_ci_greater_than" +``` + +**Add to function call dispatch in `vm_call_function()`:** +```awk +} else if (built_in_name == "string_length") { + debug("Calling built-in function: string_length") + string_length() + return +} else if (built_in_name == "string_append") { + debug("Calling built-in function: string_append") + string_append() + return +# ... etc for other string functions +``` + +#### 3.2 String Function Implementations +**File: `bin/vm.awk`** + +```awk +# String length +function string_length() { + if (stack_ptr < 1) error("string-length requires one operand") + val = pop() + if (!isString(val)) error("string-length requires string operand") + str = getValue(val) + push(makeValue(T_NUMBER, length(str))) +} + +# String concatenation +function string_append() { + if (stack_ptr < 2) error("string-append requires at least two operands") + result = "" + # Pop all arguments and concatenate + while (stack_ptr > 0) { + val = pop() + if (!isString(val)) error("string-append requires string operands") + result = getValue(val) result + } + push(makeString(result)) +} + +# String character access (0-indexed) +function string_ref() { + if (stack_ptr < 2) error("string-ref requires two operands") + index_val = pop() + str_val = pop() + if (!isNumber(index_val)) error("string-ref index must be a number") + if (!isString(str_val)) error("string-ref requires string operand") + + str = getValue(str_val) + idx = getValue(index_val) + 1 # Convert to 1-indexed + if (idx < 1 || idx > length(str)) error("string-ref index out of bounds") + + char = substr(str, idx, 1) + push(makeString(char)) +} + +# Substring extraction +function substring() { + if (stack_ptr < 3) error("substring requires three operands") + end = pop() + start = pop() + str_val = pop() + + if (!isNumber(start) || !isNumber(end)) error("substring indices must be numbers") + if (!isString(str_val)) error("substring requires string operand") + + str = getValue(str_val) + start_idx = getValue(start) + 1 # Convert to 1-indexed + end_idx = getValue(end) + 1 + + if (start_idx < 1 || end_idx > length(str) || start_idx > end_idx) { + error("substring indices out of bounds") + } + + result = substr(str, start_idx, end_idx - start_idx + 1) + push(makeString(result)) +} + +# String comparison (case-sensitive) +function string_equal() { + if (stack_ptr < 2) error("string=? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string=? requires string operands") + result = (getValue(val1) == getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_less_than() { + if (stack_ptr < 2) error("string<? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string<? requires string operands") + result = (getValue(val1) < getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_greater_than() { + if (stack_ptr < 2) error("string>? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string>? requires string operands") + result = (getValue(val1) > getValue(val2)) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +# Case-insensitive string comparison +function string_ci_equal() { + if (stack_ptr < 2) error("string-ci=? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string-ci=? requires string operands") + # Convert to lowercase for comparison + str1 = tolower(getValue(val1)) + str2 = tolower(getValue(val2)) + result = (str1 == str2) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_ci_less_than() { + if (stack_ptr < 2) error("string-ci<? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string-ci<? requires string operands") + str1 = tolower(getValue(val1)) + str2 = tolower(getValue(val2)) + result = (str1 < str2) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} + +function string_ci_greater_than() { + if (stack_ptr < 2) error("string-ci>? requires two operands") + val2 = pop() + val1 = pop() + if (!isString(val1) || !isString(val2)) error("string-ci>? requires string operands") + str1 = tolower(getValue(val1)) + str2 = tolower(getValue(val2)) + result = (str1 > str2) ? "1" : "0" + push(makeValue(T_BOOLEAN, result)) +} +``` + +### Phase 4: Enhanced String Operations + +#### 4.1 Additional String Functions +```awk +# String to number conversion +function string_to_number() { + if (stack_ptr < 1) error("string->number requires one operand") + val = pop() + if (!isString(val)) error("string->number requires string operand") + str = getValue(val) + if (str ~ /^-?[0-9]+$/) { + push(makeValue(T_NUMBER, str)) + } else { + push(makeValue(T_BOOLEAN, "0")) # Return false for invalid numbers + } +} + +# Number to string conversion +function number_to_string() { + if (stack_ptr < 1) error("number->string requires one operand") + val = pop() + if (!isNumber(val)) error("number->string requires number operand") + num = getValue(val) + push(makeString(num)) +} + +# String splitting +function string_split() { + if (stack_ptr < 2) error("string-split requires two operands") + delimiter = pop() + str_val = pop() + if (!isString(delimiter) || !isString(str_val)) { + error("string-split requires string operands") + } + + str = getValue(str_val) + delim = getValue(delimiter) + + # Use awk's split function + split(str, parts, delim) + result = "" + for (i = 1; i <= length(parts); i++) { + if (result != "") result = result " " + result = result "\"" parts[i] "\"" + } + push(makeString("(" result ")")) +} +``` + +#### 4.2 String Formatting +```awk +# String formatting (simple version) +function format() { + if (stack_ptr < 2) error("format requires at least two operands") + format_str = pop() + if (!isString(format_str)) error("format requires string format operand") + + # Simple implementation - replace ~a with arguments + fmt = getValue(format_str) + arg_count = 0 + + while (stack_ptr > 0) { + val = pop() + arg_count++ + # Replace ~a with the value + gsub(/~a/, val, fmt) + } + + push(makeString(fmt)) +} +``` + +### Phase 5: State Persistence for Strings + +#### 5.1 String State Management +**File: `bin/vm.awk`** + +**Update `save_state()` and state loading:** +- Strings stored in environment will be automatically persisted +- No special handling needed since strings use inline storage +- String literals in function definitions will be preserved + +**Considerations:** +- Long strings may impact state file size +- Escape sequences need proper handling in state files +- String encoding consistency across sessions + +### Phase 6: Testing and Validation + +#### 6.1 Test Cases +**Create test files:** + +```scheme +; Basic string literals +"hello world" +"" +"\"quoted\"" +"line1\nline2" + +; String operations +(string-length "hello") +(string-append "hello" " " "world") +(string-ref "hello" 0) +(substring "hello world" 0 4) + +; String comparisons +(string=? "hello" "hello") +(string=? "hello" "world") +(string<? "abc" "def") +(string>? "xyz" "abc") + +; Case-insensitive comparisons +(string-ci=? "Hello" "hello") +(string-ci<? "ABC" "def") + +; Conversions +(string->number "123") +(number->string 456) +(string->number "abc") + +; String in expressions +(define greeting "Hello") +(define name "World") +(string-append greeting " " name) + +; Strings in functions +(define (greet name) (string-append "Hello, " name "!")) +(greet "Alice") +``` + +#### 6.2 Edge Cases to Test +- Empty strings +- Strings with special characters +- Escape sequences +- Very long strings +- String operations on non-string types +- String comparisons with different encodings +- String persistence across REPL sessions + +### Phase 7: Performance and Optimization + +#### 7.1 Performance Considerations +- **String concatenation**: Current `string-append` is O(n²) - consider optimization +- **String storage**: Monitor memory usage with large strings +- **String comparisons**: Consider early termination for long strings +- **Escape sequence processing**: Optimize lexer for common cases + +#### 7.2 Potential Optimizations +```awk +# Optimized string concatenation (build in reverse) +function string_append_optimized() { + if (stack_ptr < 2) error("string-append requires at least two operands") + + # Count arguments and pre-allocate + arg_count = 0 + temp_stack[0] = stack_ptr + + # Pop all arguments to temp storage + while (stack_ptr > 0) { + val = pop() + if (!isString(val)) error("string-append requires string operands") + temp_stack[++arg_count] = getValue(val) + } + + # Concatenate in reverse order (more efficient) + result = "" + for (i = arg_count; i >= 1; i--) { + result = result temp_stack[i] + } + + push(makeString(result)) +} +``` + +### Phase 8: Documentation and Examples + +#### 8.1 Update README.md +- Add string data type to type system documentation +- Include string usage examples +- Document all string built-in functions +- Add string limitations and considerations + +#### 8.2 Update diagram.md +- Add string type to data type diagram +- Include string operations in VM instruction set +- Show string flow in execution examples + +### Implementation Order and Risk Assessment + +#### High Priority (Core Functionality) +1. **Type system extensions** - Low risk, foundational +2. **Lexer string support** - Medium risk, affects parsing +3. **Basic string operations** - Low risk, self-contained +4. **String literals in expressions** - Medium risk, integration + +#### Medium Priority (Enhanced Features) +5. **String comparisons** - Low risk, straightforward +6. **String conversions** - Low risk, utility functions +7. **State persistence** - Low risk, automatic + +#### Lower Priority (Advanced Features) +8. **String formatting** - Medium risk, complex parsing +9. **String splitting** - Low risk, utility function +10. **Performance optimizations** - Low risk, optional + +#### Risk Mitigation +- **Regression testing**: Ensure existing functionality unchanged +- **Incremental implementation**: Add features one at a time +- **Comprehensive testing**: Test all edge cases +- **Backward compatibility**: Maintain existing API + +### Success Criteria +1. String literals parse and compile correctly +2. Basic string operations work as expected +3. String state persists across REPL sessions +4. No regression in existing functionality +5. Performance remains acceptable +6. Documentation is complete and accurate + +### Future Enhancements +1. **Heap-based string storage** for very long strings +2. **String interning** for memory efficiency +3. **Unicode support** (complex, requires significant changes) +4. **Regular expression support** for string matching +5. **String formatting with more features** (like Common Lisp format) +6. **String streams** for efficient string building + +This plan provides a comprehensive roadmap for implementing string support while maintaining the existing architecture and ensuring backward compatibility. \ No newline at end of file diff --git a/awk/scheme/scheme/scratch/random.txt b/awk/scheme/scheme/scratch/random.txt new file mode 100644 index 0000000..20b3bcf --- /dev/null +++ b/awk/scheme/scheme/scratch/random.txt @@ -0,0 +1,112 @@ +Other Uses for the Awk-Scheme VM +================================ + +The stack-based VM in Awk-Scheme is surprisingly versatile. Here are some realistic alternatives that could be implemented using the same VM architecture: + +## 1. Forth Implementation +**Feasibility: High** + +Forth is a natural fit since it's already stack-based: +- **Stack Operations**: VM already has PUSH_CONST, POP, DUP, SWAP +- **Arithmetic**: ADD, SUB, MUL, DIV are perfect for Forth +- **Memory**: Could extend heap for Forth's memory model +- **Control Flow**: Would need to add conditional jumps and loops +- **Words**: Could map Forth words to VM functions + +**Required Extensions:** +- JMP, JZ (jump if zero), JNZ instructions +- More stack operations (OVER, ROT, etc.) +- Memory read/write operations +- Input/output operations + +## 2. Simple Calculator Language +**Feasibility: Very High** + +A basic calculator with variables and functions: +- **Syntax**: `x = 5; y = x + 3; print y` +- **Features**: Variables, basic math, simple functions +- **VM Fit**: Perfect - already has arithmetic, variables, functions + +**Minimal Changes Needed:** +- New parser for calculator syntax +- Assignment operator handling +- Print statement + +## 3. Configuration Language +**Feasibility: High** + +A simple config language for key-value pairs and nested structures: +- **Syntax**: `server { port = 8080; host = "localhost"; }` +- **Features**: Nested objects, arrays, basic expressions +- **VM Fit**: Good - can use cons cells for nested structures + +**Required Extensions:** +- String support +- Object/struct creation +- Field access operations + +## 4. Simple Scripting Language +**Feasibility: Medium** + +A basic scripting language with loops and conditionals: +- **Syntax**: `if x > 0 { y = x * 2; } else { y = 0; }` +- **Features**: Variables, conditionals, loops, functions +- **VM Fit**: Good but needs control flow + +**Required Extensions:** +- Conditional jumps +- Loop constructs +- Boolean logic operations + +## 5. Data Processing Language +**Feasibility: Medium** + +A language for simple data transformations: +- **Syntax**: `filter(x > 0) | map(x * 2) | sum()` +- **Features**: Pipeline operations, list processing +- **VM Fit**: Good - can use cons cells for lists + +**Required Extensions:** +- List operations (map, filter, reduce) +- Pipeline operator +- More built-in functions + +## 6. Simple Logic/Constraint Language +**Feasibility: High** + +A language for expressing simple constraints and rules: +- **Syntax**: `rule: if age > 18 then can_vote = true` +- **Features**: Rules, facts, simple inference +- **VM Fit**: Good - boolean operations and variables + +**Required Extensions:** +- Boolean logic (AND, OR, NOT) +- Rule evaluation +- Fact storage + +## VM Strengths for Alternative Languages: +1. **Stack-based**: Natural for many languages +2. **Typed values**: Good foundation for type safety +3. **Environment model**: Supports variables and scoping +4. **Function system**: Reusable for different syntaxes +5. **Extensible**: Easy to add new instructions + +## VM Limitations for Alternative Languages: +1. **Limited data types**: Only numbers, booleans, pairs, symbols +2. **No strings**: Would need to add string support +3. **Limited control flow**: Only function calls, no loops/conditionals +4. **No I/O**: Would need file/console operations +5. **Memory constraints**: Simple heap model + +## Most Realistic Next Steps: +1. **Forth**: Natural progression, leverages existing stack operations +2. **Calculator**: Minimal changes, good learning exercise +3. **Config Language**: Practical use case, moderate complexity + +## Implementation Strategy: +1. Keep the VM unchanged +2. Write new compilers for different syntaxes +3. Add minimal VM extensions as needed +4. Reuse existing VM infrastructure (stack, environment, functions) + +The VM's simplicity is actually a strength - it's easy to understand and extend, making it a good foundation for experimenting with different language designs. \ No newline at end of file |