Diffstat (limited to 'awk/rawk/README.md')
-rw-r--r-- awk/rawk/README.md | 579
1 files changed, 89 insertions, 490 deletions
diff --git a/awk/rawk/README.md b/awk/rawk/README.md
index 514ad50..d68217a 100644
--- a/awk/rawk/README.md
+++ b/awk/rawk/README.md
@@ -1,551 +1,150 @@
-# rawk - A Functional Programming Language for awk
+# rawk
+## Make awk rawk.
 
-**rawk** is a modern, functional-style language dialect that compiles to highly portable, standard `awk`. It provides a more expressive syntax for writing awk programs while maintaining full compatibility with existing awk code.
+Rawk helps to bring some modern developer comforts to awk while maintaining awk's portability and inbuilt goodness.
 
-## Features
+## Create a rawk file (`example.rawk`):
+```rawk
+BEGIN {
+    print "Hello from rawk!"
+}
 
-- **Functional Programming**: Define functions with a clean, modern syntax
-- **Portable**: Compiles to standard awk that runs on any implementation
-- **Mixed Code**: Seamlessly mix rawk functions with regular awk code
-- **Standard Library**: Built-in functional programming utilities
-- **Error Handling**: Comprehensive error messages and validation
+RAWK {
+    $greet = (name) -> {
+        return "Hello, " name "!";
+    };
+    
+    $add = (x, y) -> {
+        return x + y;
+    };
+}
 
-## Quick Start
+{
+    print greet("World");
+    print "2 + 3 =", add(2, 3);
+    exit 0;
+}
+```
 
-### Installation
+A `.awk` file should generally be a valid `.rawk` file, just as valid JavaScript is valid TypeScript.
 
-No installation required! Just download `rawk.awk` and you're ready to go.
+Rawk introduces a new semantic block to awk, so that you can write special forms within the `RAWK {...}` block. 
 
-### Basic Usage
+## Compile and run:
+```bash
+# Compile to awk
+awk -f rawk.awk example.rawk > example.awk
 
-1. **Create a rawk program** (`hello.rawk`):
-```rawk
-$greet = (name) -> "Hello, " name "!";
-$add = (x, y) -> x + y;
+# Run the compiled program
+echo "test" | awk -f example.awk
 
-BEGIN {
-    print greet("World")
-    print "2 + 3 =", add(2, 3)
-}
+# Or compile and run in one line (bash process substitution keeps stdin free for the input data)
+echo "test" | awk -f <(awk -f rawk.awk example.rawk)
 ```
 
-2. **Compile and run**:
+## How to run the example:
 ```bash
-awk -f rawk.awk hello.rawk | awk -f -
-```
+# Compile the example file
+awk -f rawk.awk example.rawk > example_output.awk
 
-3. **Or compile to a file**:
-```bash
-awk -f rawk.awk hello.rawk > hello.awk
-awk -f hello.awk
+# Run with sample log data
+awk -f example_output.awk sample.log
+
+# Or run with just a few lines
+head -10 sample.log | awk -f example_output.awk
+
+# Or compile and run without outputting an awk file to disk
+awk -f rawk.awk example.rawk | awk -f - sample.log
 ```
 
-## Language Syntax
+## Syntax
 
 ### Function Definitions
+All functions go inside a `RAWK { ... }` block.
 
-**Single-line functions**:
 ```rawk
-$add = (x, y) -> x + y;
-$greet = (name) -> "Hello, " name;
-$square = (x) -> x * x;
-```
-
-**Multi-line functions**:
-```rawk
-$calculate_area = (width, height) -> {
-    area = width * height
-    return area
-};
-
-$factorial = (n) -> {
-    if (n <= 1) {
-        return 1
-    } else {
-        return n * factorial(n - 1)
-    }
-};
+RAWK {
+    $function_name = (param1, param2) -> {
+        return param1 + param2;
+    };
+}
 ```
 
 ### Function Calls
+Call rawk functions from anywhere in the code:
 
-Functions can be called directly, nested, and recursively:
 ```rawk
-$double = (x) -> x * 2;
-$square = (x) -> x * x;
-$factorial = (n) -> {
-    if (n <= 1) return 1
-    else return n * factorial(n - 1)
-};
-
-BEGIN {
-    result = double(square(5))  # Returns 50
-    print result
-    print factorial(5)          # Returns 120
+{
+    result = add(5, 3);
+    print result;
 }
 ```
 
-### Mixed awk/rawk Code
+### Mixed Code
+Mix and match awk and rawk code:
 
-Regular awk code works seamlessly with rawk functions:
 ```rawk
-BEGIN { print "Starting processing..." }
+BEGIN { FS = "," }
 
-$process_line = (line) -> "Processed: " line;
+RAWK {
+    $process = (field) -> {
+        return "Processed: " field;
+    };
+}
 
 {
-    if (length($0) > 10) {
-        print process_line($0) " (long line)"
-    } else {
-        print process_line($0) " (short line)"
+    if ($1 != "") {
+        print process($1);
     }
 }
-
-END { print "Processing complete." }
 ```
 
 ## Standard Library
+Rawk boasts a rather large standard library.
 
-The following functions are automatically available:
-
-### Testing Functions
-- `assert(condition, message)`: Asserts a condition is true
-- `expect_equal(actual, expected, message)`: Asserts actual equals expected
-- `expect_true(condition, message)`: Asserts condition is true
-- `expect_false(condition, message)`: Asserts condition is false
-
-### Array Utilities
-- `keys(array)`: Returns count of keys in array
-- `values(array)`: Returns count of values in array
-- `get_keys(array, result)`: Populates result array with keys
-- `get_values(array, result)`: Populates result array with values
-
-### Functional Programming Functions
-- `map(func_name, array, result)`: Apply function to each element of array
-- `reduce(func_name, array, initial)`: Reduce array using function (left fold)
-- `pipe(value, func_name)`: Pipe value through a single function
-- `pipe_multi(value, func_names)`: Pipe value through multiple functions
-- `dispatch_call(func_name, arg1, arg2, ...)`: Dynamic function dispatch
-
-### Enhanced Array Utilities
-- `filter(predicate_func, array, result)`: Filter array elements based on predicate function
-- `find(predicate_func, array)`: Find first element that matches predicate
-- `findIndex(predicate_func, array)`: Find index of first element that matches predicate
-
-### Predicate Functions
-**Type Checking:**
-- `is_number(value)`: Check if value is a number
-- `is_string(value)`: Check if value is a string
-- `is_array(value)`: Check if value is an array (limited detection)
-- `is_empty(value)`: Check if value is empty
-
-**Numeric Predicates:**
-- `is_positive(value)`: Check if number is positive
-- `is_negative(value)`: Check if number is negative
-- `is_zero(value)`: Check if number is zero
-- `is_integer(value)`: Check if number is integer
-- `is_float(value)`: Check if number is float
-- `is_even(value)`: Check if number is even
-- `is_odd(value)`: Check if number is odd
-- `is_prime(value)`: Check if number is prime
-- `is_in_range(value, min, max)`: Check if number is in range
-
-**Boolean Predicates:**
-- `is_boolean(value)`: Check if value is boolean (0 or 1)
-- `is_truthy(value)`: Check if value is truthy
-- `is_falsy(value)`: Check if value is falsy
-
-**String Predicates:**
-- `is_alpha(value)`: Check if string is alphabetic
-- `is_numeric(value)`: Check if string is numeric
-- `is_alphanumeric(value)`: Check if string is alphanumeric
-- `is_whitespace(value)`: Check if string is whitespace
-- `is_uppercase(value)`: Check if string is uppercase
-- `is_lowercase(value)`: Check if string is lowercase
-- `is_palindrome(value)`: Enhanced palindrome detection with better whitespace and punctuation handling
-- `is_length(value, target_length)`: Check if string/array has specific length
-- `is_hex(value)`: Enhanced hex validation with optional 0x and # prefixes
-- `is_csv(value)`: Check if string appears to be CSV format (robust detection with quote handling)
-- `is_tsv(value)`: Check if string appears to be TSV format (robust detection with field splitting)
-
-**Validation Predicates:**
-- `is_email(value)`: Enhanced email validation with proper format checking
-- `is_url(value)`: Enhanced URL validation supporting multiple protocols (http, https, ftp, ftps, mailto, tel)
-- `is_ipv4(value)`: Basic IPv4 validation
-- `is_ipv6(value)`: Enhanced IPv6 validation with interface identifiers and proper :: handling
-- `is_uuid(value)`: UUID validation (comprehensive format support: hyphenated, no-hyphens, URN format)
-
-## Examples
-
-### System Monitoring
+### Testing
 ```rawk
-# Process df output to monitor disk usage
-$analyze_disk = (filesystem, size, used, avail, percent, mount) -> {
-    if (percent > 90) {
-        return "CRITICAL: " filesystem " (" mount ") is " percent "% full!"
-    } else if (percent > 80) {
-        return "WARNING: " filesystem " (" mount ") is " percent "% full"
-    } else {
-        return "OK: " filesystem " (" mount ") has " avail " blocks free"
-    }
-};
-
-/^\/dev\// {
-    result = analyze_disk($1, $2, $3, $4, $5, $6)
-    print "DISK: " result
-}
+expect_equal(add(2, 3), 5, "Addition should work");
+expect_true(is_positive(5), "5 should be positive");
 ```
 
-### Log Parsing
+### Type Checking Predicates
 ```rawk
-# Process Apache log entries
-$parse_apache_log = (ip, method, url, status, bytes) -> {
-    if (status >= 400) {
-        return "ERROR: " status " - " method " " url " from " ip
-    } else {
-        return "SUCCESS: " status " - " method " " url " (" bytes " bytes)"
-    }
-};
-
-/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ {
-    result = parse_apache_log($1, $6, $7, $9, $10)
-    print "APACHE: " result
-}
+if (is_number(value)) { ... }
+if (is_string(value)) { ... }
 ```
 
-### CSV Processing
+### Various Validation Predicates
 ```rawk
-# Process employee data with validation
-$is_valid_email = (email) -> {
-    at_pos = index(email, "@")
-    if (at_pos == 0) return 0
-    dot_pos = index(substr(email, at_pos + 1), ".")
-    return dot_pos > 0
-};
-
-$format_employee = (name, email, age, salary, department) -> {
-    email_status = is_valid_email(email) ? "VALID" : "INVALID"
-    return name " (" department ") - " email_status " email, $" salary
-};
-
-BEGIN { FS = "," }
-NR > 1 {
-    result = format_employee($1, $2, $3, $4, $5)
-    print "EMPLOYEE: " result
-}
+if (is_email(email)) { ... }
+if (is_url(url)) { ... }
 ```
 
-### Data Processing
+### Functional Programming Patterns
 ```rawk
-$filter_positive = (arr, result, i, count) -> {
-    count = 0
-    for (i in arr) {
-        if (arr[i] > 0) {
-            result[++count] = arr[i]
-        }
-    }
-    return result
-};
+# Transform array elements
+count = map("double", numbers, doubled);
 
-$sum_array = (arr, sum, i) -> {
-    sum = 0
-    for (i in arr) {
-        sum += arr[i]
-    }
-    return sum
-};
+# Filter array elements  
+count = filter("is_positive", numbers, positive);
 
-BEGIN {
-    data[1] = 10
-    data[2] = -5
-    data[3] = 20
-    data[4] = -3
-    data[5] = 15
-    
-    positive = filter_positive(data)
-    total = sum_array(positive)
-    print "Sum of positive numbers:", total
-}
+# Reduce array to single value
+sum = reduce("add", numbers);
 ```
 
-### Data Format Detection
-```rawk
-$process_data_line = (line) -> {
-    if (is_hex(line)) {
-        return "Hexadecimal: " line
-    } else if (is_csv(line)) {
-        return "CSV data: " line
-    } else if (is_tsv(line)) {
-        return "TSV data: " line
-    } else {
-        return "Unknown format: " line
-    }
-};
-
-$validate_uuid = (uuid) -> {
-    if (is_uuid(uuid)) {
-        return "Valid UUID: " uuid
-    } else {
-        return "Invalid UUID: " uuid
-    }
-};
+## Testing
 
-BEGIN {
-    test_data[1] = "FF00AA"
-    test_data[2] = "name,age,city"
-    test_data[3] = "id\tname\tvalue"
-    test_data[4] = "plain_text"
-    test_data[5] = "123e4567-e89b-12d3-a456-426614174000"
-    test_data[6] = "123e4567e89b12d3a456426614174000"
-    test_data[7] = "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
-    
-    for (i in test_data) {
-        result = process_data_line(test_data[i])
-        print result
-    }
-    
-    print ""
-    print "UUID Validation Examples:"
-    print validate_uuid("123e4567-e89b-12d3-a456-426614174000")
-    print validate_uuid("123e4567e89b12d3a456426614174000")
-    print validate_uuid("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
-    print validate_uuid("invalid-uuid")
-    
-    print ""
-    print "Enhanced Predicate Examples:"
-    print "Email validation:"
-    print "  user@domain.com -> " (is_email("user@domain.com") ? "VALID" : "INVALID")
-    print "  user name@domain.com -> " (is_email("user name@domain.com") ? "VALID" : "INVALID")
-    print "  user@@domain.com -> " (is_email("user@@domain.com") ? "VALID" : "INVALID")
-    
-    print "URL validation:"
-    print "  https://example.com -> " (is_url("https://example.com") ? "VALID" : "INVALID")
-    print "  ftp://example.com -> " (is_url("ftp://example.com") ? "VALID" : "INVALID")
-    print "  mailto:user@example.com -> " (is_url("mailto:user@example.com") ? "VALID" : "INVALID")
-    
-    print "Hex validation:"
-    print "  0xDEADBEEF -> " (is_hex("0xDEADBEEF") ? "VALID" : "INVALID")
-    print "  #ff0000 -> " (is_hex("#ff0000") ? "VALID" : "INVALID")
-    print "  deadbeef -> " (is_hex("deadbeef") ? "VALID" : "INVALID")
-}
-
-### Functional Programming
-```rawk
-$double = (x) -> x * 2;
-$add = (x, y) -> x + y;
-$square = (x) -> x * x;
-$add_one = (x) -> x + 1;
+Run the test suite:
 
-BEGIN {
-    # Create test data
-    numbers[1] = 1
-    numbers[2] = 2
-    numbers[3] = 3
-    numbers[4] = 4
-    numbers[5] = 5
-    
-    # Map: Apply function to each element
-    doubled_count = map("double", numbers, doubled)
-    print "Doubled numbers:"
-    for (i = 1; i <= doubled_count; i++) {
-        print "  " doubled[i]
-    }
-    
-    # Reduce: Sum all numbers
-    sum = reduce("add", numbers)
-    print "Sum of numbers:", sum
-    
-    # Pipe: Single function pipeline
-    result = pipe(5, "square")
-    print "5 squared:", result
-    
-    # Pipe_multi: Multiple function pipeline
-    func_names[1] = "double"
-    func_names[2] = "add_one"
-    result = pipe_multi(5, func_names)
-    print "5 doubled then +1:", result
-    
-    # Complex composition: Sum of squares
-    squared_count = map("square", numbers, squared)
-    sum_of_squares = reduce("add", squared)
-    print "Sum of squares:", sum_of_squares
-}
-
-### Enhanced Array Utilities
-```rawk
-$is_positive = (x) -> x > 0;
-$is_even = (x) -> x % 2 == 0;
-$is_valid_email = (email) -> is_email(email);
-
-BEGIN {
-    # Test data
-    numbers[1] = -1
-    numbers[2] = 0
-    numbers[3] = 1
-    numbers[4] = -5
-    numbers[5] = 10
-    numbers[6] = -3
-    numbers[7] = 7
-    
-    emails[1] = "user@example.com"
-    emails[2] = "invalid-email"
-    emails[3] = "another@domain.org"
-    emails[4] = "not-an-email"
-    
-    # Filter positive numbers
-    positive_count = filter("is_positive", numbers, positive_numbers)
-    print "Positive numbers (count:", positive_count, "):"
-    for (i = 1; i <= positive_count; i++) {
-        print "  " positive_numbers[i]
-    }
-    
-    # Find first even number
-    first_even = find("is_even", numbers)
-    print "First even number:", first_even
-    
-    # Find index of first negative number
-    first_negative_index = findIndex("is_negative", numbers)
-    print "First negative at index:", first_negative_index
-    
-    # Filter valid emails
-    valid_emails_count = filter("is_valid_email", emails, valid_emails)
-    print "Valid emails (count:", valid_emails_count, "):"
-    for (i = 1; i <= valid_emails_count; i++) {
-        print "  " valid_emails[i]
-    }
-    
-    # Integration: Filter then map
-    filtered_count = filter("is_positive", numbers, filtered)
-    doubled_count = map("double", filtered, doubled_filtered)
-    print "Doubled positive numbers (count:", doubled_count, "):"
-    for (i = 1; i <= doubled_count; i++) {
-        print "  " doubled_filtered[i]
-    }
-}
-```
-```
-```
-
-## Test Files
-
-The project includes a comprehensive test suite organized in the `tests/` directory:
-
-### Directory Structure
-```
-tests/
-├── core/           # Core language features
-├── real_world/     # Practical examples
-├── stdlib/         # Standard library tests
-├── data/           # Test data files
-└── README.md       # Test documentation
-```
-
-### Core Language Tests (`tests/core/`)
-- `test_suite.rawk`: Comprehensive test suite with 15+ test cases
-- `test_basic.rawk`: Basic function definitions and calls
-- `test_multiline.rawk`: Multi-line function definitions
-- `test_edge_cases.rawk`: Edge cases and error conditions
-- `test_recursive.rawk`: Recursive function support
-- `test_array_fix.rawk`: Array handling and utilities
-- `test_failure.rawk`: Demonstrates failing assertions
-
-### Real-World Examples (`tests/real_world/`)
-- `test_system_monitor.rawk`: System monitoring (df, ps, ls output)
-- `test_log_parser.rawk`: Log parsing (Apache, syslog format)
-- `test_csv_processor.rawk`: CSV data processing with validation
-- `test_data_processing.rawk`: General data processing scenarios
-- `test_mixed.rawk`: Mixed awk and rawk code
-
-### Standard Library Tests (`tests/stdlib/`)
-- `test_stdlib_simple.rawk`: Tests for built-in functions
-- `test_functional.rawk`: Tests for functional programming features
-- `test_enhanced_utilities_simple.rawk`: Tests for enhanced array utilities
-
-### Test Data (`tests/data/`)
-- `test_data.txt`: Simulated system command outputs
-- `test_logs.txt`: Sample Apache and syslog entries
-- `test_employees.csv`: Sample employee data
-- `test_input.txt`: Simple input data for mixed tests
-
-Run tests with:
 ```bash
-# Run the comprehensive test suite
-awk -f rawk.awk tests/core/test_suite.rawk | awk -f -
-
-# Run real-world examples
-awk -f rawk.awk tests/real_world/test_system_monitor.rawk | awk -f - tests/data/test_data.txt
-awk -f rawk.awk tests/real_world/test_log_parser.rawk | awk -f - tests/data/test_logs.txt
-awk -f rawk.awk tests/real_world/test_csv_processor.rawk | awk -f - tests/data/test_employees.csv
-
-# Run individual core tests
-awk -f rawk.awk tests/core/test_basic.rawk | awk -f -
+cd tests && ./test_runner.sh
 ```
 
-### Writing Tests
-
-rawk includes a built-in testing framework with assertion functions:
+## Requirements
 
-```rawk
-$add = (x, y) -> x + y;
-
-BEGIN {
-    # Test basic functionality
-    result = add(2, 3)
-    expect_equal(result, 5, "add(2, 3) should return 5")
-    
-    # Test edge cases
-    result = add(0, 0)
-    expect_equal(result, 0, "add(0, 0) should return 0")
-    
-    # Test boolean conditions
-    expect_true(add(2, 2) == 4, "2 + 2 should equal 4")
-    expect_false(add(2, 2) == 5, "2 + 2 should not equal 5")
-    
-    print "All tests passed!"
-}
-```
-
-## Compilation Process
-
-1. **Parse**: rawk function definitions are parsed using `split` on the `->` symbol
-2. **Generate**: Internal awk functions are generated with unique names (`__lambda_0`, `__lambda_1`, etc.)
-3. **Dispatch**: A dispatch table maps public function names to internal names
-4. **Replace**: Function calls are replaced with internal names during compilation
-5. **Output**: Standard library functions are prepended to the final awk script
-
-## Limitations
-
-- **Function Names**: Must be valid awk identifiers
-- **Array Returns**: Functions cannot return arrays (use pass-by-reference instead)
-- **Array Order**: AWK doesn't guarantee array iteration order
-- **Dynamic Dispatch**: Limited to functions defined at compile time
-
-## Error Handling
-
-The compiler provides helpful error messages for:
-- Invalid function definition syntax
-- Missing `->` symbols
-- Malformed argument lists
-- Unexpected function definitions in multi-line bodies
-
-## Portability
-
-- **Target**: Standard awk (nawk, BSD awk)
-- **Avoids**: gawk-specific features
-- **Uses**: Only standard awk constructs and functions
-- **Compatibility**: Works on any POSIX-compliant system
-
-## Contributing
-
-1. Add test cases for new features
-2. Ensure compatibility with standard awk
-3. Update documentation for new functionality
-4. Test on multiple awk implementations
+- Any awk implementation (gawk, mawk, nawk, etc.)
+- No additional dependencies; rawk strives to work with any POSIX awk
 
 ## License
 
-This project is open source. Feel free to use, modify, and distribute as needed.
-
-## Acknowledgments
-
-Inspired by the need for a more expressive syntax for awk programming while maintaining the portability and simplicity that makes awk so powerful. 
\ No newline at end of file
+Public Domain
\ No newline at end of file
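
The functional helpers in the added README (`map`, `filter`, `reduce`) take function names as strings. How rawk resolves those names is not shown in this diff. As a minimal sketch, assuming a simple name-based dispatcher (the `apply` helper and the `fp_demo` wrapper are hypothetical and not part of rawk), the same pattern might look like this in plain POSIX awk:

```shell
# Sketch only: plain POSIX awk, no rawk required.
# apply() is a hypothetical stand-in for however rawk resolves a name passed as a string.
fp_demo() {
    awk '
    function double(x)      { return x * 2 }
    function add(x, y)      { return x + y }
    function is_positive(x) { return x > 0 }

    # Hypothetical dispatcher: POSIX awk has no indirect calls, so branch on the name.
    function apply(name, a, b) {
        if (name == "double")      return double(a)
        if (name == "add")         return add(a, b)
        if (name == "is_positive") return is_positive(a)
        return ""
    }

    BEGIN {
        n = split("3 -1 4", numbers, " ")
        count = 0
        sum = 0
        for (i = 1; i <= n; i++) {
            doubled[i] = apply("double", numbers[i])       # map
            if (apply("is_positive", numbers[i])) count++  # filter (count only)
            sum = apply("add", sum, numbers[i])            # reduce (left fold)
        }
        # Doubled values, then the positive count, then the sum.
        print doubled[1], doubled[2], doubled[3], count, sum
    }'
}

fp_demo
```

Branching on the name keeps the sketch portable to awk implementations without indirect function calls, at the cost of listing every callable function in the dispatcher.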