Diffstat (limited to 'awk/rawk/scratch/tests_old/data/README.md')
-rw-r--r-- | awk/rawk/scratch/tests_old/data/README.md | 139 |
1 file changed, 139 insertions, 0 deletions
diff --git a/awk/rawk/scratch/tests_old/data/README.md b/awk/rawk/scratch/tests_old/data/README.md
new file mode 100644
index 0000000..cb8f23b
--- /dev/null
+++ b/awk/rawk/scratch/tests_old/data/README.md
@@ -0,0 +1,139 @@
+# Test Data Files
+
+This directory contains sample data files used by the real-world examples.
+
+## Data Files
+
+### `test_data.txt` - System Command Outputs
+Simulated output from common system commands:
+
+**df output:**
+```
+Filesystem 1K-blocks Used Available Use% Mounted on
+/dev/sda1 1048576 524288 524288 50 /
+/dev/sdb1 2097152 1887436 209716 90 /home
+/dev/sdc1 524288 104857 419431 20 /var
+/dev/sdd1 1048576 943718 104858 90 /tmp
+```
+
+**ps output:**
+```
+PID USER %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
+1234 user1 15.2 2.1 1234567 12345 pts/0 S 10:30 0:15 chrome
+5678 user2 0.5 8.3 2345678 23456 pts/1 S 09:15 1:30 firefox
+9012 user1 2.1 1.5 3456789 34567 pts/2 S 11:45 0:05 bash
+3456 user3 25.7 1.2 4567890 45678 pts/3 R 12:00 0:30 stress
+7890 user2 0.1 12.5 5678901 56789 pts/4 S 08:30 2:15 docker
+```
+
+**ls -l output:**
+```
+total 1234
+-rw-r--r-- 1 user1 group1 1024 Jan 15 10:30 file1.txt
+drwxr-xr-x 2 user2 group2 4096 Jan 15 11:45 directory1
+-rwxr-xr-x 1 user1 group1 2048 Jan 15 12:00 executable.sh
+-rw-r--r-- 1 user3 group1 512 Jan 15 12:15 config.json
+-rw-r--r-- 1 user1 group2 3072 Jan 15 12:30 large_file.dat
+```
+
+**Used by:** `../real_world/test_system_monitor.rawk`
+
+### `test_logs.txt` - Log Entries
+Sample log entries in common formats:
+
+**Apache log entries:**
+```
+192.168.1.100 - - [15/Jan/2024:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024
+192.168.1.101 - - [15/Jan/2024:10:30:16 +0000] "GET /style.css HTTP/1.1" 200 512
+192.168.1.102 - - [15/Jan/2024:10:30:17 +0000] "POST /login HTTP/1.1" 302 0
+192.168.1.103 - - [15/Jan/2024:10:30:18 +0000] "GET /image.jpg HTTP/1.1" 200 2048
+192.168.1.104 - - [15/Jan/2024:10:30:19 +0000] "GET /nonexistent.html HTTP/1.1" 404 0
+192.168.1.105 - - [15/Jan/2024:10:30:20 +0000] "GET /script.js HTTP/1.1" 200 768
+192.168.1.106 - - [15/Jan/2024:10:30:21 +0000] "POST /submit HTTP/1.1" 500 0
+```
+
+**Syslog entries:**
+```
+Jan 15 10:30:15 server1 sshd: Accepted password for user1 from 192.168.1.100
+Jan 15 10:30:16 server1 kernel: ERROR: Out of memory
+Jan 15 10:30:17 server1 apache2: WARNING: Server reached MaxClients
+Jan 15 10:30:18 server1 cron: INFO: Daily backup completed
+Jan 15 10:30:19 server1 sshd: ERROR: Failed password for user2 from 192.168.1.101
+Jan 15 10:30:20 server1 systemd: INFO: Started network service
+```
+
+**Used by:** `../real_world/test_log_parser.rawk`
+
+### `test_employees.csv` - Employee Data
+Sample CSV file with employee information:
+
+```
+Name,Email,Age,Salary,Department
+John Smith,john.smith@company.com,32,65000,Engineering
+Jane Doe,jane.doe@company.com,28,72000,Marketing
+Bob Johnson,bob.johnson@company.com,45,85000,Sales
+Alice Brown,alice.brown@company.com,22,55000,Engineering
+Charlie Wilson,charlie.wilson@company.com,38,78000,Finance
+Diana Davis,diana.davis@company.com,29,68000,Marketing
+Eve Miller,eve.miller@company.com,52,92000,Management
+Frank Garcia,frank.garcia@company.com,25,60000,Engineering
+Grace Lee,grace.lee@company.com,41,82000,Sales
+Henry Taylor,henry.taylor@company.com,35,75000,Finance
+Ivy Chen,ivy.chen@company.com,27,67000,Engineering
+Jack Anderson,jack.anderson@company.com,48,88000,Management
+```
+
+**Features:**
+- 12 employees across 5 departments
+- All email addresses are validly formatted
+- Age range from 22 to 52
+- Salary range from $55,000 to $92,000
+- Various data quality scenarios
+
+**Used by:** `../real_world/test_csv_processor.rawk`
+
+### `test_input.txt` - Simple Input Data
+Simple text input for basic processing:
+
+```
+Hello
+This is a short line
+This is a much longer line that should be detected
+```
+
+**Used by:** `../real_world/test_mixed.rawk`
+
+## Data Characteristics
+
+### System Data (`test_data.txt`)
+- **Disk usage**: Mix of normal (20-50%) and critical (90%) usage
+- **Process data**: Various CPU and memory usage patterns
+- **File data**: Mix of files, directories, and executables
+
+### Log Data (`test_logs.txt`)
+- **Apache logs**: Mix of successful (200), redirect (302), and error (404, 500) responses
+- **Syslog entries**: Mix of INFO, WARNING, and ERROR messages
+- **Realistic patterns**: Common log entry formats and content
+
+### Employee Data (`test_employees.csv`)
+- **Valid data**: All emails are properly formatted
+- **Age distribution**: Spread across different age groups
+- **Salary variation**: Realistic salary ranges by department
+- **Department balance**: Multiple employees per department
+
+## Usage
+
+These data files are designed to test various scenarios:
+
+1. **Normal operation**: Most data represents typical, valid cases
+2. **Edge cases**: Some data includes boundary conditions (90% disk usage, high CPU processes)
+3. **Error conditions**: Log files include error responses and system issues
+4. **Data validation**: CSV includes various data types for validation testing
+
+## Customization
+
+You can modify these files to test different scenarios:
+- Add more system data for different monitoring scenarios
+- Include different log formats for additional parsing tests
+- Modify CSV data to test different validation rules
+- Create new data files for specific use cases
\ No newline at end of file
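
The README's "Data Characteristics" claims (90% disk usage counted as critical, a mix of 200/302/404/500 responses, several employees per department) can be sanity-checked against the sample data. The following is a minimal sketch in Python rather than rawk, since the `.rawk` scripts themselves are not shown here. The function names, the inline excerpts, and the status-class labels are hypothetical and chosen only for illustration. The 90% threshold is taken from the README text.

```python
import csv
import io
from collections import Counter

# Excerpts copied from the README; the real data files contain more rows.
DF_SAMPLE = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 1048576 524288 524288 50 /
/dev/sdb1 2097152 1887436 209716 90 /home
/dev/sdc1 524288 104857 419431 20 /var
/dev/sdd1 1048576 943718 104858 90 /tmp
"""

APACHE_SAMPLE = """\
192.168.1.100 - - [15/Jan/2024:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.102 - - [15/Jan/2024:10:30:17 +0000] "POST /login HTTP/1.1" 302 0
192.168.1.104 - - [15/Jan/2024:10:30:19 +0000] "GET /nonexistent.html HTTP/1.1" 404 0
192.168.1.106 - - [15/Jan/2024:10:30:21 +0000] "POST /submit HTTP/1.1" 500 0
"""

CSV_SAMPLE = """\
Name,Email,Age,Salary,Department
John Smith,john.smith@company.com,32,65000,Engineering
Jane Doe,jane.doe@company.com,28,72000,Marketing
Alice Brown,alice.brown@company.com,22,55000,Engineering
Eve Miller,eve.miller@company.com,52,92000,Management
"""


def critical_mounts(df_text, threshold=90):
    """Return mount points whose Use% is at or above the threshold."""
    mounts = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        # The sample omits the '%' sign; real df output includes it.
        use_pct = int(fields[4].rstrip("%"))
        if use_pct >= threshold:
            mounts.append(fields[5])
    return mounts


def status_classes(log_text):
    """Count Apache responses by status class (labels are illustrative)."""
    names = {"2": "success", "3": "redirect",
             "4": "client_error", "5": "server_error"}
    counts = Counter()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        status = fields[-2]  # status code is the second-to-last field
        counts[names.get(status[0], "other")] += 1
    return counts


def department_counts(csv_text):
    """Count employees per department using the CSV header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Department"] for row in reader)


if __name__ == "__main__":
    print(critical_mounts(DF_SAMPLE))
    print(dict(status_classes(APACHE_SAMPLE)))
    print(dict(department_counts(CSV_SAMPLE)))
```

To run the same checks against the actual files, the inline constants could be replaced with the contents of `test_data.txt`, `test_logs.txt`, and `test_employees.csv`. Note that `test_data.txt` and `test_logs.txt` each mix several formats in one file, so the relevant section would need to be isolated first.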