Diffstat (limited to 'awk/rawk/README.md')
-rw-r--r-- | awk/rawk/README.md | 579 |
1 files changed, 89 insertions, 490 deletions
diff --git a/awk/rawk/README.md b/awk/rawk/README.md index 514ad50..d68217a 100644 --- a/awk/rawk/README.md +++ b/awk/rawk/README.md @@ -1,551 +1,150 @@ -# rawk - A Functional Programming Language for awk +# rawk +## Make awk rawk. -**rawk** is a modern, functional-style language dialect that compiles to highly portable, standard `awk`. It provides a more expressive syntax for writing awk programs while maintaining full compatibility with existing awk code. +Rawk helps to bring some modern developer comforts to awk while maintaining awk's portability and inbuilt goodness. -## Features +## Create a rawk file (`example.rawk`): +```rawk +BEGIN { + print "Hello from rawk!" +} -- **Functional Programming**: Define functions with a clean, modern syntax -- **Portable**: Compiles to standard awk that runs on any implementation -- **Mixed Code**: Seamlessly mix rawk functions with regular awk code -- **Standard Library**: Built-in functional programming utilities -- **Error Handling**: Comprehensive error messages and validation +RAWK { + $greet = (name) -> { + return "Hello, " name "!"; + }; + + $add = (x, y) -> { + return x + y; + }; +} -## Quick Start +{ + print greet("World"); + print "2 + 3 =", add(2, 3); + exit 0; +} +``` -### Installation +A `.awk` file should, generally, be a totally valid `.rawk` file. Just like any valid JavaScript is valid TypeScript, likewise with awk and rawk. -No installation required! Just download `rawk.awk` and you're ready to go. +Rawk introduces a new semantic block to awk, so that you can write special forms within the `RAWK {...}` block. -### Basic Usage +## Compile and run: +```bash +# Compile to awk +awk -f rawk.awk example.rawk > example.awk -1. 
**Create a rawk program** (`hello.rawk`): -```rawk -$greet = (name) -> "Hello, " name "!"; -$add = (x, y) -> x + y; +# Run the compiled program +echo "test" | awk -f example.awk -BEGIN { - print greet("World") - print "2 + 3 =", add(2, 3) -} +# Or compile and run in one line +echo "test" | awk -f rawk.awk example.rawk | awk -f - ``` -2. **Compile and run**: +## How to run the example: ```bash -awk -f rawk.awk hello.rawk | awk -f - -``` +# Compile the example file +awk -f rawk.awk example.rawk > example_output.awk -3. **Or compile to a file**: -```bash -awk -f rawk.awk hello.rawk > hello.awk -awk -f hello.awk +# Run with sample log data +awk -f example_output.awk sample.log + +# Or run with just a few lines +head -10 sample.log | awk -f example_output.awk + +# Or compile and run without outputting an awk file to disk +awk -f rawk.awk example.rawk | awk -f - sample.log ``` -## Language Syntax +## Syntax ### Function Definitions +All functions go inside an `RAWK { ... }` block. -**Single-line functions**: ```rawk -$add = (x, y) -> x + y; -$greet = (name) -> "Hello, " name; -$square = (x) -> x * x; -``` - -**Multi-line functions**: -```rawk -$calculate_area = (width, height) -> { - area = width * height - return area -}; - -$factorial = (n) -> { - if (n <= 1) { - return 1 - } else { - return n * factorial(n - 1) - } -}; +RAWK { + $function_name = (param1, param2) -> { + return param1 + param2; + }; +} ``` ### Function Calls +Call rawk functions from anywhere in the code, -Functions can be called directly, nested, and recursively: ```rawk -$double = (x) -> x * 2; -$square = (x) -> x * x; -$factorial = (n) -> { - if (n <= 1) return 1 - else return n * factorial(n - 1) -}; - -BEGIN { - result = double(square(5)) # Returns 50 - print result - print factorial(5) # Returns 120 +{ + result = add(5, 3); + print result; } ``` -### Mixed awk/rawk Code +### Mixed Code +Mix and match awk and rawk code, -Regular awk code works seamlessly with rawk functions: ```rawk -BEGIN { print 
"Starting processing..." } +BEGIN { FS = "," } -$process_line = (line) -> "Processed: " line; +RAWK { + $process = (field) -> { + return "Processed: " field; + }; +} { - if (length($0) > 10) { - print process_line($0) " (long line)" - } else { - print process_line($0) " (short line)" + if ($1 != "") { + print process($1); } } - -END { print "Processing complete." } ``` ## Standard Library +Rawk boasts a rather large standard library. -The following functions are automatically available: - -### Testing Functions -- `assert(condition, message)`: Asserts a condition is true -- `expect_equal(actual, expected, message)`: Asserts actual equals expected -- `expect_true(condition, message)`: Asserts condition is true -- `expect_false(condition, message)`: Asserts condition is false - -### Array Utilities -- `keys(array)`: Returns count of keys in array -- `values(array)`: Returns count of values in array -- `get_keys(array, result)`: Populates result array with keys -- `get_values(array, result)`: Populates result array with values - -### Functional Programming Functions -- `map(func_name, array, result)`: Apply function to each element of array -- `reduce(func_name, array, initial)`: Reduce array using function (left fold) -- `pipe(value, func_name)`: Pipe value through a single function -- `pipe_multi(value, func_names)`: Pipe value through multiple functions -- `dispatch_call(func_name, arg1, arg2, ...)`: Dynamic function dispatch - -### Enhanced Array Utilities -- `filter(predicate_func, array, result)`: Filter array elements based on predicate function -- `find(predicate_func, array)`: Find first element that matches predicate -- `findIndex(predicate_func, array)`: Find index of first element that matches predicate - -### Predicate Functions -**Type Checking:** -- `is_number(value)`: Check if value is a number -- `is_string(value)`: Check if value is a string -- `is_array(value)`: Check if value is an array (limited detection) -- `is_empty(value)`: Check if value is 
empty - -**Numeric Predicates:** -- `is_positive(value)`: Check if number is positive -- `is_negative(value)`: Check if number is negative -- `is_zero(value)`: Check if number is zero -- `is_integer(value)`: Check if number is integer -- `is_float(value)`: Check if number is float -- `is_even(value)`: Check if number is even -- `is_odd(value)`: Check if number is odd -- `is_prime(value)`: Check if number is prime -- `is_in_range(value, min, max)`: Check if number is in range - -**Boolean Predicates:** -- `is_boolean(value)`: Check if value is boolean (0 or 1) -- `is_truthy(value)`: Check if value is truthy -- `is_falsy(value)`: Check if value is falsy - -**String Predicates:** -- `is_alpha(value)`: Check if string is alphabetic -- `is_numeric(value)`: Check if string is numeric -- `is_alphanumeric(value)`: Check if string is alphanumeric -- `is_whitespace(value)`: Check if string is whitespace -- `is_uppercase(value)`: Check if string is uppercase -- `is_lowercase(value)`: Check if string is lowercase -- `is_palindrome(value)`: Enhanced palindrome detection with better whitespace and punctuation handling -- `is_length(value, target_length)`: Check if string/array has specific length -- `is_hex(value)`: Enhanced hex validation with optional 0x and # prefixes -- `is_csv(value)`: Check if string appears to be CSV format (robust detection with quote handling) -- `is_tsv(value)`: Check if string appears to be TSV format (robust detection with field splitting) - -**Validation Predicates:** -- `is_email(value)`: Enhanced email validation with proper format checking -- `is_url(value)`: Enhanced URL validation supporting multiple protocols (http, https, ftp, ftps, mailto, tel) -- `is_ipv4(value)`: Basic IPv4 validation -- `is_ipv6(value)`: Enhanced IPv6 validation with interface identifiers and proper :: handling -- `is_uuid(value)`: UUID validation (comprehensive format support: hyphenated, no-hyphens, URN format) - -## Examples - -### System Monitoring +### Testing 
```rawk -# Process df output to monitor disk usage -$analyze_disk = (filesystem, size, used, avail, percent, mount) -> { - if (percent > 90) { - return "CRITICAL: " filesystem " (" mount ") is " percent "% full!" - } else if (percent > 80) { - return "WARNING: " filesystem " (" mount ") is " percent "% full" - } else { - return "OK: " filesystem " (" mount ") has " avail " blocks free" - } -}; - -/^\/dev\// { - result = analyze_disk($1, $2, $3, $4, $5, $6) - print "DISK: " result -} +expect_equal(add(2, 3), 5, "Addition should work"); +expect_true(is_positive(5), "5 should be positive"); ``` -### Log Parsing +### Type Checking Predicates ```rawk -# Process Apache log entries -$parse_apache_log = (ip, method, url, status, bytes) -> { - if (status >= 400) { - return "ERROR: " status " - " method " " url " from " ip - } else { - return "SUCCESS: " status " - " method " " url " (" bytes " bytes)" - } -}; - -/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/ { - result = parse_apache_log($1, $6, $7, $9, $10) - print "APACHE: " result -} +if (is_number(value)) { ... } +if (is_string(value)) { ... } ``` -### CSV Processing +### Varuius Validation Predicates ```rawk -# Process employee data with validation -$is_valid_email = (email) -> { - at_pos = index(email, "@") - if (at_pos == 0) return 0 - dot_pos = index(substr(email, at_pos + 1), ".") - return dot_pos > 0 -}; - -$format_employee = (name, email, age, salary, department) -> { - email_status = is_valid_email(email) ? "VALID" : "INVALID" - return name " (" department ") - " email_status " email, $" salary -}; - -BEGIN { FS = "," } -NR > 1 { - result = format_employee($1, $2, $3, $4, $5) - print "EMPLOYEE: " result -} +if (is_email(email)) { ... } +if (is_url(url)) { ... 
} ``` -### Data Processing +### Functional Programming Patterns ```rawk -$filter_positive = (arr, result, i, count) -> { - count = 0 - for (i in arr) { - if (arr[i] > 0) { - result[++count] = arr[i] - } - } - return result -}; +# Transform array elements +count = map("double", numbers, doubled); -$sum_array = (arr, sum, i) -> { - sum = 0 - for (i in arr) { - sum += arr[i] - } - return sum -}; +# Filter array elements +count = filter("is_positive", numbers, positive); -BEGIN { - data[1] = 10 - data[2] = -5 - data[3] = 20 - data[4] = -3 - data[5] = 15 - - positive = filter_positive(data) - total = sum_array(positive) - print "Sum of positive numbers:", total -} +# Reduce array to single value +sum = reduce("add", numbers); ``` -### Data Format Detection -```rawk -$process_data_line = (line) -> { - if (is_hex(line)) { - return "Hexadecimal: " line - } else if (is_csv(line)) { - return "CSV data: " line - } else if (is_tsv(line)) { - return "TSV data: " line - } else { - return "Unknown format: " line - } -}; - -$validate_uuid = (uuid) -> { - if (is_uuid(uuid)) { - return "Valid UUID: " uuid - } else { - return "Invalid UUID: " uuid - } -}; +## Testing -BEGIN { - test_data[1] = "FF00AA" - test_data[2] = "name,age,city" - test_data[3] = "id\tname\tvalue" - test_data[4] = "plain_text" - test_data[5] = "123e4567-e89b-12d3-a456-426614174000" - test_data[6] = "123e4567e89b12d3a456426614174000" - test_data[7] = "urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6" - - for (i in test_data) { - result = process_data_line(test_data[i]) - print result - } - - print "" - print "UUID Validation Examples:" - print validate_uuid("123e4567-e89b-12d3-a456-426614174000") - print validate_uuid("123e4567e89b12d3a456426614174000") - print validate_uuid("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6") - print validate_uuid("invalid-uuid") - - print "" - print "Enhanced Predicate Examples:" - print "Email validation:" - print " user@domain.com -> " (is_email("user@domain.com") ? 
"VALID" : "INVALID") - print " user name@domain.com -> " (is_email("user name@domain.com") ? "VALID" : "INVALID") - print " user@@domain.com -> " (is_email("user@@domain.com") ? "VALID" : "INVALID") - - print "URL validation:" - print " https://example.com -> " (is_url("https://example.com") ? "VALID" : "INVALID") - print " ftp://example.com -> " (is_url("ftp://example.com") ? "VALID" : "INVALID") - print " mailto:user@example.com -> " (is_url("mailto:user@example.com") ? "VALID" : "INVALID") - - print "Hex validation:" - print " 0xDEADBEEF -> " (is_hex("0xDEADBEEF") ? "VALID" : "INVALID") - print " #ff0000 -> " (is_hex("#ff0000") ? "VALID" : "INVALID") - print " deadbeef -> " (is_hex("deadbeef") ? "VALID" : "INVALID") -} - -### Functional Programming -```rawk -$double = (x) -> x * 2; -$add = (x, y) -> x + y; -$square = (x) -> x * x; -$add_one = (x) -> x + 1; +Run the test suite, -BEGIN { - # Create test data - numbers[1] = 1 - numbers[2] = 2 - numbers[3] = 3 - numbers[4] = 4 - numbers[5] = 5 - - # Map: Apply function to each element - doubled_count = map("double", numbers, doubled) - print "Doubled numbers:" - for (i = 1; i <= doubled_count; i++) { - print " " doubled[i] - } - - # Reduce: Sum all numbers - sum = reduce("add", numbers) - print "Sum of numbers:", sum - - # Pipe: Single function pipeline - result = pipe(5, "square") - print "5 squared:", result - - # Pipe_multi: Multiple function pipeline - func_names[1] = "double" - func_names[2] = "add_one" - result = pipe_multi(5, func_names) - print "5 doubled then +1:", result - - # Complex composition: Sum of squares - squared_count = map("square", numbers, squared) - sum_of_squares = reduce("add", squared) - print "Sum of squares:", sum_of_squares -} - -### Enhanced Array Utilities -```rawk -$is_positive = (x) -> x > 0; -$is_even = (x) -> x % 2 == 0; -$is_valid_email = (email) -> is_email(email); - -BEGIN { - # Test data - numbers[1] = -1 - numbers[2] = 0 - numbers[3] = 1 - numbers[4] = -5 - numbers[5] = 10 - 
numbers[6] = -3 - numbers[7] = 7 - - emails[1] = "user@example.com" - emails[2] = "invalid-email" - emails[3] = "another@domain.org" - emails[4] = "not-an-email" - - # Filter positive numbers - positive_count = filter("is_positive", numbers, positive_numbers) - print "Positive numbers (count:", positive_count, "):" - for (i = 1; i <= positive_count; i++) { - print " " positive_numbers[i] - } - - # Find first even number - first_even = find("is_even", numbers) - print "First even number:", first_even - - # Find index of first negative number - first_negative_index = findIndex("is_negative", numbers) - print "First negative at index:", first_negative_index - - # Filter valid emails - valid_emails_count = filter("is_valid_email", emails, valid_emails) - print "Valid emails (count:", valid_emails_count, "):" - for (i = 1; i <= valid_emails_count; i++) { - print " " valid_emails[i] - } - - # Integration: Filter then map - filtered_count = filter("is_positive", numbers, filtered) - doubled_count = map("double", filtered, doubled_filtered) - print "Doubled positive numbers (count:", doubled_count, "):" - for (i = 1; i <= doubled_count; i++) { - print " " doubled_filtered[i] - } -} -``` -``` -``` - -## Test Files - -The project includes a comprehensive test suite organized in the `tests/` directory: - -### Directory Structure -``` -tests/ -├── core/ # Core language features -├── real_world/ # Practical examples -├── stdlib/ # Standard library tests -├── data/ # Test data files -└── README.md # Test documentation -``` - -### Core Language Tests (`tests/core/`) -- `test_suite.rawk`: Comprehensive test suite with 15+ test cases -- `test_basic.rawk`: Basic function definitions and calls -- `test_multiline.rawk`: Multi-line function definitions -- `test_edge_cases.rawk`: Edge cases and error conditions -- `test_recursive.rawk`: Recursive function support -- `test_array_fix.rawk`: Array handling and utilities -- `test_failure.rawk`: Demonstrates failing assertions - -### 
Real-World Examples (`tests/real_world/`) -- `test_system_monitor.rawk`: System monitoring (df, ps, ls output) -- `test_log_parser.rawk`: Log parsing (Apache, syslog format) -- `test_csv_processor.rawk`: CSV data processing with validation -- `test_data_processing.rawk`: General data processing scenarios -- `test_mixed.rawk`: Mixed awk and rawk code - -### Standard Library Tests (`tests/stdlib/`) -- `test_stdlib_simple.rawk`: Tests for built-in functions -- `test_functional.rawk`: Tests for functional programming features -- `test_enhanced_utilities_simple.rawk`: Tests for enhanced array utilities - -### Test Data (`tests/data/`) -- `test_data.txt`: Simulated system command outputs -- `test_logs.txt`: Sample Apache and syslog entries -- `test_employees.csv`: Sample employee data -- `test_input.txt`: Simple input data for mixed tests - -Run tests with: ```bash -# Run the comprehensive test suite -awk -f rawk.awk tests/core/test_suite.rawk | awk -f - - -# Run real-world examples -awk -f rawk.awk tests/real_world/test_system_monitor.rawk | awk -f - tests/data/test_data.txt -awk -f rawk.awk tests/real_world/test_log_parser.rawk | awk -f - tests/data/test_logs.txt -awk -f rawk.awk tests/real_world/test_csv_processor.rawk | awk -f - tests/data/test_employees.csv - -# Run individual core tests -awk -f rawk.awk tests/core/test_basic.rawk | awk -f - +cd tests && ./test_runner.sh ``` -### Writing Tests - -rawk includes a built-in testing framework with assertion functions: +## Requirements -```rawk -$add = (x, y) -> x + y; - -BEGIN { - # Test basic functionality - result = add(2, 3) - expect_equal(result, 5, "add(2, 3) should return 5") - - # Test edge cases - result = add(0, 0) - expect_equal(result, 0, "add(0, 0) should return 0") - - # Test boolean conditions - expect_true(add(2, 2) == 4, "2 + 2 should equal 4") - expect_false(add(2, 2) == 5, "2 + 2 should not equal 5") - - print "All tests passed!" -} -``` - -## Compilation Process - -1. 
**Parse**: rawk function definitions are parsed using `split` on the `->` symbol -2. **Generate**: Internal awk functions are generated with unique names (`__lambda_0`, `__lambda_1`, etc.) -3. **Dispatch**: A dispatch table maps public function names to internal names -4. **Replace**: Function calls are replaced with internal names during compilation -5. **Output**: Standard library functions are prepended to the final awk script - -## Limitations - -- **Function Names**: Must be valid awk identifiers -- **Array Returns**: Functions cannot return arrays (use pass-by-reference instead) -- **Array Order**: AWK doesn't guarantee array iteration order -- **Dynamic Dispatch**: Limited to functions defined at compile time - -## Error Handling - -The compiler provides helpful error messages for: -- Invalid function definition syntax -- Missing `->` symbols -- Malformed argument lists -- Unexpected function definitions in multi-line bodies - -## Portability - -- **Target**: Standard awk (nawk, BSD awk) -- **Avoids**: gawk-specific features -- **Uses**: Only standard awk constructs and functions -- **Compatibility**: Works on any POSIX-compliant system - -## Contributing - -1. Add test cases for new features -2. Ensure compatibility with standard awk -3. Update documentation for new functionality -4. Test on multiple awk implementations +- Any awk implementation (gawk, mawk, nawk, etc.) +- No additional dependencies, strives to work with any POSIX awk ## License -This project is open source. Feel free to use, modify, and distribute as needed. - -## Acknowledgments - -Inspired by the need for a more expressive syntax for awk programming while maintaining the portability and simplicity that makes awk so powerful. \ No newline at end of file +Public Domain \ No newline at end of file |