about summary refs log tree commit diff stats
path: root/awk/rawk/scratch/REWRITE_PLAN.md
diff options
context:
space:
mode:
Diffstat (limited to 'awk/rawk/scratch/REWRITE_PLAN.md')
-rw-r--r--awk/rawk/scratch/REWRITE_PLAN.md74
1 files changed, 74 insertions, 0 deletions
diff --git a/awk/rawk/scratch/REWRITE_PLAN.md b/awk/rawk/scratch/REWRITE_PLAN.md
new file mode 100644
index 0000000..6ef6d38
--- /dev/null
+++ b/awk/rawk/scratch/REWRITE_PLAN.md
@@ -0,0 +1,74 @@
+# Rawk Compiler Rewrite Plan
+
+## 1. Current State
+- The parser is fragile, with overlapping regexes and ad-hoc filters.
+- Function definitions are leaking into the output.
+- Debug output and legacy logic clutter the codebase.
+- Validation is inconsistent and sometimes too strict or too loose.
+- Recent attempts at a clean rewrite have revealed issues with global variable shadowing (e.g., `function_count`), which can cause state to be lost between parsing and code generation.
+
+## 2. What We Know
+- **Goal:** Only valid AWK code and generated functions should appear in the output—never rawk function definitions.
+- **Best Practice:** Parsing should be stateful: when inside a function definition, skip all lines until the function body ends.
+- **Simplicity:** Enforce `{}` for all function bodies. Only parse/collect code outside of function definitions.
+- **AWK Global State:** All counters and arrays used for function tracking must be global and never shadowed by local variables or loop indices.
+
+## 3. Goals
+- **Robust, simple parsing:** Only collect code outside of function definitions.
+- **Clear validation:** Fail fast and clearly if a function definition is malformed.
+- **No rawk function definitions in output:** Only AWK code and generated functions.
+- **Maintainable codebase:** No debug output, no ad-hoc filters, no legacy logic. Consider supporting this goal by introducing some dev tooling to help debug.
+
+## 4. Plan
+
+### A. Clean Up
+- Remove all debug output, catch-alls, and legacy single-line function support from `rawk.awk`.
+- Refactor the main block to use a clear state machine:
+  - If inside a function definition, skip all lines until the function body ends.
+  - Only collect lines outside of function definitions.
+- Audit all global variables (especially counters like `function_count`) to ensure they are never shadowed or re-initialized in any function or loop.
+
+### B. Document
+- Keep this plan up to date as we proceed.
+- Document the new parsing and validation approach in the code and README.
+- Add a section for common pitfalls (see below).
+
+### C. Implement
+1. **Rewrite the main parsing logic:**
+   - Use a stateful, brace-counting parser.
+   - Only collect code outside of function definitions.
+2. **Update validation:**
+   - Only allow function definitions of the form `$name = (args) -> { ... }`.
+   - Fail fast and clearly on any other form.
+3. **Test and validate:**
+   - Create minimal test files to validate the new parser.
+   - Ensure no function definitions leak into the output.
+4. **Update all tests and examples:**
+   - Convert all function definitions to the new enforced style.
+   - Remove any legacy syntax from tests and documentation.
+
+---
+
+## 5. Common Pitfalls
+- **Global Variable Shadowing:** Never use global counters (e.g., `function_count`) as local variables or loop indices. Always use unique local names for loops.
+- **AWK Arrays:** Arrays are global by default. Always clear or re-initialize as needed.
+- **Brace Counting:** Ensure the parser correctly tracks nested braces and only exits function mode when all braces are closed.
+- **Whitespace Handling:** Regexes for function headers must be robust to whitespace and formatting variations.
+
+---
+
+## 6. How to Resume
+- Start by reviewing this plan and the current state of `rawk_new.awk`.
+- Begin with a minimal test file (e.g., `test_clean.rawk`) and ensure the parser correctly collects and generates functions.
+- If functions are not being generated, check for global variable shadowing or state loss.
+- Once the parser is robust, proceed to update and validate all tests and documentation.
+
+---
+
+## 7. Next Steps
+1. Clean up `rawk.awk` (remove debug, catch-alls, legacy logic).
+2. Clean up repo, removing superfluous test and 1off files.
+3. Audit and fix all global variable usage in the new parser.
+4. Implement the new stateful parser.
+5. Validate with minimal tests.
+6. Update all tests and documentation.
\ No newline at end of file