# rawk v2.0.0 - Current State Documentation

## 🎯 Project Overview

**rawk** is a functional programming language that compiles to standard AWK. It provides a cleaner, more structured syntax for AWK development while maintaining full compatibility with existing AWK code.

## 🏗️ Architecture

### Multi-Pass Compiler

The current implementation uses a robust multi-pass approach:

1. **Pass 1**: Collect all source lines into memory
2. **Pass 2**: Detect and validate RAWK blocks
3. **Pass 3**: Extract function definitions from RAWK blocks
4. **Pass 4**: Generate output (standard library + user functions + main script)

### Key Benefits

- **No variable scoping issues**: Eliminates AWK's variable scoping problems
- **Predictable parsing**: Each pass has a single responsibility
- **Easy to extend**: New features can be added as new passes
- **Robust error handling**: Clear, actionable error messages

## 📝 Language Specification

### Block-Based Structure

```rawk
BEGIN {
    print "Initialization"
}

RAWK {
    $add = (x, y) -> {
        return x + y;
    };

    $multiply = (a, b) -> {
        return a * b;
    };
}

{
    result = add(5, 3);
    print result;
}
```

### Function Definitions

- **Location**: Only inside `RAWK { ... }` blocks
- **Syntax**: `$name = (args) -> { ... }` (braces required)
- **Arguments**: Comma-separated list in parentheses
- **Body**: Multi-line block with explicit `return` statements

### Function Calls

- **Location**: Anywhere in regular AWK code
- **Syntax**: `function_name(arg1, arg2, ...)`
- **Scope**: Functions are globally available after definition

### Standard Library

Currently includes basic testing functions:

- `assert(condition, message)`
- `expect_equal(actual, expected, message)`
- `expect_true(condition, message)`
- `expect_false(condition, message)`

## 🔧 Implementation Details

### File Structure

```
rawk/
├── rawk_block_based.awk    # Main compiler (multi-pass)
├── rawk.awk                # Original implementation (reference)
├── scratch/                # Archived experimental versions
├── tests/                  # Test suite
├── simple_test.rawk        # Basic test case
└── example.rawk            # Example usage
```

### Compilation Process

```bash
# Two-stage compilation (recommended)
awk -f rawk_block_based.awk input.rawk > output.awk
awk -f output.awk input_data.txt

# One-stage compilation and execution
awk -f rawk_block_based.awk input.rawk | awk -f - input_data.txt
```

### Error Handling

- **Missing RAWK block**: "Error: No RAWK block found"
- **Nested RAWK blocks**: "Error: Nested or multiple RAWK blocks are not supported"
- **Unclosed RAWK block**: "Error: RAWK block opened at line X but never closed"
- **Invalid function syntax**: Detailed error messages with suggestions

## ✅ What's Working

### Core Features

- ✅ Block-based function definitions
- ✅ Multi-line function bodies
- ✅ Function extraction and generation
- ✅ RAWK block validation
- ✅ Basic error handling
- ✅ Standard library generation
- ✅ Clean output generation

### Test Cases

- ✅ Simple function definition and call
- ✅ BEGIN block integration
- ✅ Main block execution
- ✅ Function return values

## 🚧 What's Missing

### Smart Standard Library

- **Current**: Always includes all standard library functions
- **Goal**: Only include functions actually referenced in the code
- **Implementation**: Need to track function calls and analyze dependencies

### Enhanced Error Handling

- **Current**: Basic error messages
- **Goal**: Comprehensive validation with line numbers and suggestions
- **Missing**: Function call validation, argument count checking

### Function Call Rewriting

- **Current**: Function calls are passed through unchanged
- **Goal**: Rewrite function calls to use internal names (like original rawk.awk)
- **Benefit**: Better error handling and potential optimization

### Extended Standard Library

- **Current**: Basic testing functions only
- **Goal**: Full standard library from original rawk.awk
- **Includes**: Array utilities, functional programming, predicates, etc.

### Documentation and Examples

- **Current**: Basic examples
- **Goal**: Comprehensive documentation and test suite
- **Missing**: Migration guide, best practices, real-world examples

## 🎯 Next Steps Plan

### Phase 1: Core Improvements (Immediate)

1. **Function call analysis**: Track which functions are actually used
2. **Smart standard library**: Only include referenced functions
3. **Function call rewriting**: Use internal names for better error handling
4. **Enhanced validation**: Check function calls exist, argument counts match

### Phase 2: Standard Library (Short-term)

1. **Port full standard library**: Array utilities, functional programming, predicates
2. **Smart inclusion**: Only include functions that are actually used
3. **Documentation**: Document all available standard library functions

### Phase 3: Developer Experience (Medium-term)

1. **Better error messages**: Line numbers, context, suggestions
2. **Warning system**: Non-fatal issues that should be addressed
3. **Debug mode**: Verbose output for troubleshooting
4. **Test suite**: Comprehensive tests for all features

### Phase 4: Advanced Features (Long-term)

1. **Import system**: Include other rawk files
2. **Type checking**: Basic type validation
3. **Optimization**: Code optimization passes
4. **IDE support**: Language server, syntax highlighting

## 🔍 Technical Decisions

### Why Multi-Pass?

- **Problem**: AWK variable scoping issues made single-pass parsing unreliable
- **Solution**: Multi-pass eliminates state management complexity
- **Benefit**: More robust, easier to debug and extend

### Why Block-Based?

- **Problem**: Original syntax was ambiguous and hard to parse
- **Solution**: Explicit blocks make parsing deterministic
- **Benefit**: Clearer code structure, better error messages

### Why Braces Required?

- **Problem**: Optional braces made parsing complex
- **Solution**: Always require braces for function definitions
- **Benefit**: Simpler parsing, clearer code, fewer edge cases

## 📊 Success Metrics

### Current Status

- ✅ **Compilation**: Works correctly for basic cases
- ✅ **Function extraction**: Properly extracts and generates functions
- ✅ **Error handling**: Basic validation working
- ✅ **Output quality**: Clean, readable AWK code

### Target Metrics

- **Test coverage**: 90%+ of language features tested
- **Error messages**: 100% actionable with line numbers
- **Performance**: Compilation time < 100ms for typical files
- **Compatibility**: 100% compatible with existing AWK code

## 🎉 Conclusion

The multi-pass block-based approach has successfully solved the core technical challenges. The implementation is now robust, maintainable, and ready for enhancement. The foundation is solid for building out the full feature set.

**Next immediate step**: Implement function call analysis and smart standard library inclusion.
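
As a rough illustration of that next step, the sketch below shows one way function call analysis might begin: scan the source for calls to the four standard library functions listed earlier and report only the ones that are used. The helper name `find_used_stdlib` and the sample input are hypothetical and are not part of `rawk_block_based.awk`; the real compiler may track calls quite differently.

```bash
#!/bin/sh
# find_used_stdlib: hypothetical helper, not part of rawk_block_based.awk.
# Reads rawk source on stdin and prints each standard library function
# that is called at least once, one name per line.
find_used_stdlib() {
    awk '
    BEGIN {
        # Names taken from the Standard Library section of this document.
        n = split("assert expect_equal expect_true expect_false", stdlib, " ")
    }
    {
        for (i = 1; i <= n; i++) {
            # Match the name as a whole identifier followed by "(".
            pattern = "(^|[^A-Za-z0-9_])" stdlib[i] "[ \t]*\\("
            if ($0 ~ pattern) used[stdlib[i]] = 1
        }
    }
    END {
        # Walk the list in declaration order so the output is deterministic.
        for (i = 1; i <= n; i++)
            if (stdlib[i] in used) print stdlib[i]
    }'
}

# Hypothetical sample program that uses two of the four functions.
sample='RAWK {
    $add = (x, y) -> { return x + y; };
}
{
    expect_equal(add(5, 3), 8, "add works");
    expect_true(add(1, 1) == 2, "sanity");
}'

printf '%s\n' "$sample" | find_used_stdlib
```

For the sample above this prints `expect_equal` and `expect_true`, so a generation pass could emit just those two definitions. The sketch matches text naively: it does not skip string literals or comments, and it ignores dependencies between standard library functions, both of which a real implementation would need to handle.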