# Rawk Compiler Rewrite Plan

## 1. Current State
- The parser is fragile, with overlapping regexes and ad-hoc filters.
- Function definitions are leaking into the output.
- Debug output and legacy logic clutter the codebase.
- Validation is inconsistent and sometimes too strict or too loose.
- Recent attempts at a clean rewrite have revealed issues with global variable shadowing (e.g., `function_count`), which can cause state to be lost between parsing and code generation.

## 2. What We Know
- **Goal:** Only valid AWK code and generated functions should appear in the output, never rawk function definitions.
- **Best Practice:** Parsing should be stateful: when inside a function definition, skip all lines until the function body ends.
- **Simplicity:** Enforce `{}` for all function bodies. Only parse/collect code outside of function definitions.
- **AWK Global State:** All counters and arrays used for function tracking must be global and never shadowed by local variables or loop indices.

## 3. Goals
- **Robust, simple parsing:** Only collect code outside of function definitions.
- **Clear validation:** Fail fast and clearly if a function definition is malformed.
- **No rawk function definitions in output:** Only AWK code and generated functions.
- **Maintainable codebase:** No debug output, no ad-hoc filters, no legacy logic. Consider supporting this goal by introducing some dev tooling to help debug.

## 4. Plan

### A. Clean Up
- Remove all debug output, catch-alls, and legacy single-line function support from `rawk.awk`.
- Refactor the main block to use a clear state machine:
  - If inside a function definition, skip all lines until the function body ends.
  - Only collect lines outside of function definitions.
- Audit all global variables (especially counters like `function_count`) to ensure they are never shadowed or re-initialized in any function or loop.

### B. Document
- Keep this plan up to date as we proceed.
- Document the new parsing and validation approach in the code and README.
- Add a section for common pitfalls (see below).

### C. Implement
1. **Rewrite the main parsing logic:**
   - Use a stateful, brace-counting parser.
   - Only collect code outside of function definitions.
2. **Update validation:**
   - Only allow function definitions of the form `$name = (args) -> { ... }`.
   - Fail fast and clearly on any other form.
3. **Test and validate:**
   - Create minimal test files to validate the new parser.
   - Ensure no function definitions leak into the output.
4. **Update all tests and examples:**
   - Convert all function definitions to the new enforced style.
   - Remove any legacy syntax from tests and documentation.

---

## 5. Common Pitfalls
- **Global Variable Shadowing:** Never use global counters (e.g., `function_count`) as local variables or loop indices. Always use unique local names for loops.
- **AWK Arrays:** Arrays are global by default. Always clear or re-initialize as needed.
- **Brace Counting:** Ensure the parser correctly tracks nested braces and only exits function mode when all braces are closed.
- **Whitespace Handling:** Regexes for function headers must be robust to whitespace and formatting variations.

---

## 6. How to Resume
- Start by reviewing this plan and the current state of `rawk_new.awk`.
- Begin with a minimal test file (e.g., `test_clean.rawk`) and ensure the parser correctly collects and generates functions.
- If functions are not being generated, check for global variable shadowing or state loss.
- Once the parser is robust, proceed to update and validate all tests and documentation.

---

## 7. Next Steps
1. Clean up `rawk.awk` (remove debug output, catch-alls, and legacy logic).
2. Clean up the repo, removing superfluous test and one-off files.
3. Audit and fix all global variable usage in the new parser.
4. Implement the new stateful parser.
5. Validate with minimal tests.
6. Update all tests and documentation.
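
The stateful, brace-counting approach from Section 4 and the shadowing pitfall from Section 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of `rawk.awk` or `rawk_new.awk`: the wrapper `strip_rawk_defs`, the helper `brace_delta`, the header regex, and the error message are all hypothetical. It only drops definitions and counts them (no code generation), and it does not account for braces that appear inside strings or regex literals.

```sh
#!/bin/sh
# Hypothetical helper: read rawk source on stdin, print only the lines that
# sit outside function definitions, and fail fast on a malformed header.
strip_rawk_defs() {
  awk '
    # Net change in brace depth for one line: number of "{" minus number of "}".
    # Names after the extra spaces are AWK locals. They are deliberately
    # distinct from every global (in_function, depth, function_count, failed)
    # so that no global state is shadowed.
    function brace_delta(line,    tmp, opens, closes) {
      tmp = line; opens  = gsub(/[{]/, "", tmp)
      tmp = line; closes = gsub(/[}]/, "", tmp)
      return opens - closes
    }

    BEGIN { in_function = 0; depth = 0; function_count = 0; failed = 0 }

    # State 1: inside a definition. Skip the line and track nesting;
    # leave function mode only when every brace has been closed.
    in_function {
      depth += brace_delta($0)
      if (depth <= 0) { in_function = 0; depth = 0 }
      next
    }

    # Header in the enforced form: $name = (args) -> {
    /^[ \t]*\$[A-Za-z_][A-Za-z0-9_]*[ \t]*=[ \t]*\([^)]*\)[ \t]*->[ \t]*[{]/ {
      function_count++
      depth = brace_delta($0)
      if (depth > 0) in_function = 1   # body continues on later lines
      next
    }

    # Looks like a definition but has no opening brace: fail fast.
    /^[ \t]*\$[A-Za-z_][A-Za-z0-9_]*[ \t]*=[ \t]*\([^)]*\)[ \t]*->/ {
      printf "rawk: line %d: function body must be wrapped in {}\n", NR | "cat 1>&2"
      failed = 1
      exit 1
    }

    # State 0: outside any definition. Collect the line unchanged.
    { print }

    END {
      if (!failed)
        printf "rawk: skipped %d function definition(s)\n", function_count | "cat 1>&2"
    }
  '
}
```

With this sketch loaded, `strip_rawk_defs < test_clean.rawk` would print only the plain AWK lines and report the number of skipped definitions on stderr. Keeping the header check and the fail-fast check as two separate patterns makes the accepted form explicit: anything that starts like a definition but does not open a `{` body is rejected instead of silently passed through.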