# Test Data Files

This directory contains sample data files used by the real-world examples.

## Data Files

### `test_data.txt` - System Command Outputs

Simulated output from common system commands:

**df output:**
```
Filesystem 1K-blocks    Used Available Use% Mounted on
/dev/sda1    1048576  524288    524288   50 /
/dev/sdb1    2097152 1887436    209716   90 /home
/dev/sdc1     524288  104857    419431   20 /var
/dev/sdd1    1048576  943718    104858   90 /tmp
```

**ps output:**
```
  PID USER  %CPU %MEM     VSZ   RSS TTY   STAT START TIME COMMAND
 1234 user1 15.2  2.1 1234567 12345 pts/0 S    10:30 0:15 chrome
 5678 user2  0.5  8.3 2345678 23456 pts/1 S    09:15 1:30 firefox
 9012 user1  2.1  1.5 3456789 34567 pts/2 S    11:45 0:05 bash
 3456 user3 25.7  1.2 4567890 45678 pts/3 R    12:00 0:30 stress
 7890 user2  0.1 12.5 5678901 56789 pts/4 S    08:30 2:15 docker
```

**ls -l output:**
```
total 1234
-rw-r--r-- 1 user1 group1 1024 Jan 15 10:30 file1.txt
drwxr-xr-x 2 user2 group2 4096 Jan 15 11:45 directory1
-rwxr-xr-x 1 user1 group1 2048 Jan 15 12:00 executable.sh
-rw-r--r-- 1 user3 group1  512 Jan 15 12:15 config.json
-rw-r--r-- 1 user1 group2 3072 Jan 15 12:30 large_file.dat
```

**Used by:** `../real_world/test_system_monitor.rawk`

### `test_logs.txt` - Log Entries

Sample log entries in common formats:

**Apache log entries:**
```
192.168.1.100 - - [15/Jan/2024:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.101 - - [15/Jan/2024:10:30:16 +0000] "GET /style.css HTTP/1.1" 200 512
192.168.1.102 - - [15/Jan/2024:10:30:17 +0000] "POST /login HTTP/1.1" 302 0
192.168.1.103 - - [15/Jan/2024:10:30:18 +0000] "GET /image.jpg HTTP/1.1" 200 2048
192.168.1.104 - - [15/Jan/2024:10:30:19 +0000] "GET /nonexistent.html HTTP/1.1" 404 0
192.168.1.105 - - [15/Jan/2024:10:30:20 +0000] "GET /script.js HTTP/1.1" 200 768
192.168.1.106 - - [15/Jan/2024:10:30:21 +0000] "POST /submit HTTP/1.1" 500 0
```

**Syslog entries:**
```
Jan 15 10:30:15 server1 sshd: Accepted password for user1 from 192.168.1.100
Jan 15 10:30:16 server1 kernel: ERROR: Out of memory
Jan 15 10:30:17 server1 apache2: WARNING: Server reached MaxClients
Jan 15 10:30:18 server1 cron: INFO: Daily backup completed
Jan 15 10:30:19 server1 sshd: ERROR: Failed password for user2 from 192.168.1.101
Jan 15 10:30:20 server1 systemd: INFO: Started network service
```

**Used by:** `../real_world/test_log_parser.rawk`

### `test_employees.csv` - Employee Data

Sample CSV file with employee information:

```
Name,Email,Age,Salary,Department
John Smith,john.smith@company.com,32,65000,Engineering
Jane Doe,jane.doe@company.com,28,72000,Marketing
Bob Johnson,bob.johnson@company.com,45,85000,Sales
Alice Brown,alice.brown@company.com,22,55000,Engineering
Charlie Wilson,charlie.wilson@company.com,38,78000,Finance
Diana Davis,diana.davis@company.com,29,68000,Marketing
Eve Miller,eve.miller@company.com,52,92000,Management
Frank Garcia,frank.garcia@company.com,25,60000,Engineering
Grace Lee,grace.lee@company.com,41,82000,Sales
Henry Taylor,henry.taylor@company.com,35,75000,Finance
Ivy Chen,ivy.chen@company.com,27,67000,Engineering
Jack Anderson,jack.anderson@company.com,48,88000,Management
```

**Features:**
- 12 employees across 5 departments (Engineering, Marketing, Sales, Finance, Management)
- All email addresses are validly formatted
- Age range from 22 to 52
- Salary range from $55,000 to $92,000
- Mixed field types (text, email, integer) for validation testing

**Used by:** `../real_world/test_csv_processor.rawk`

### `test_input.txt` - Simple Input Data

Simple text input for basic processing:

```
Hello
This is a short line
This is a much longer line that should be detected
```

**Used by:** `../real_world/test_mixed.rawk`

## Data Characteristics

### System Data (`test_data.txt`)
- **Disk usage**: Mix of normal (20-50%) and critical (90%) usage
- **Process data**: CPU usage from 0.1% to 25.7% and memory usage from 1.2% to 12.5%, including one high-CPU process (`stress`) and one high-memory process (`docker`)
- **File data**: Mix of regular files, a directory, and an executable script

### Log Data (`test_logs.txt`)
- **Apache logs**: Mix of successful (200), redirect (302), and error (404, 500) responses
- **Syslog entries**: Mix of INFO, WARNING, and ERROR messages, plus one untagged entry
- **Realistic patterns**: Apache entries follow the Common Log Format; syslog entries use the traditional `Mon DD HH:MM:SS host program: message` layout

### Employee Data (`test_employees.csv`)
- **Valid data**: All emails are properly formatted
- **Age distribution**: Spread across different age groups
- **Salary variation**: Realistic salary ranges by department
- **Department balance**: At least two employees per department (Engineering has four)

## Usage

These data files are designed to test various scenarios:

1. **Normal operation**: Most data represents typical, valid cases
2. **Edge cases**: Some data includes boundary conditions (90% disk usage, high-CPU processes)
3. **Error conditions**: Log files include error responses and system issues
4. **Data validation**: The CSV mixes text, email, and integer fields for validation testing

## Customization

You can modify these files to test different scenarios:

- Add more system data for different monitoring scenarios
- Include different log formats for additional parsing tests
- Modify CSV data to test different validation rules
- Create new data files for specific use cases
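After editing `test_data.txt`, you might want to confirm that it still contains the disk and CPU edge cases described for the system data. The Python sketch below is independent of the `.rawk` examples. It embeds copies of the `df` and `ps` samples shown earlier. The 90% disk threshold and the 20% CPU threshold are assumptions chosen to fit the sample data; they are not values taken from `test_system_monitor.rawk`.

```python
"""Sketch: re-check the disk and CPU edge cases in the test_data.txt samples."""

# Copies of the df and ps samples; substitute real command output as needed.
DF_SAMPLE = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 1048576 524288 524288 50 /
/dev/sdb1 2097152 1887436 209716 90 /home
/dev/sdc1 524288 104857 419431 20 /var
/dev/sdd1 1048576 943718 104858 90 /tmp
"""

PS_SAMPLE = """\
PID USER %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1234 user1 15.2 2.1 1234567 12345 pts/0 S 10:30 0:15 chrome
5678 user2 0.5 8.3 2345678 23456 pts/1 S 09:15 1:30 firefox
9012 user1 2.1 1.5 3456789 34567 pts/2 S 11:45 0:05 bash
3456 user3 25.7 1.2 4567890 45678 pts/3 R 12:00 0:30 stress
7890 user2 0.1 12.5 5678901 56789 pts/4 S 08:30 2:15 docker
"""


def critical_mounts(df_text, threshold=90):
    """Return mount points whose Use% is at or above threshold (assumed default: 90)."""
    mounts = []
    for line in df_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        # The sample omits the '%' sign; stripping it also accepts real df output.
        if int(fields[4].rstrip("%")) >= threshold:
            mounts.append(fields[5])
    return mounts


def busy_processes(ps_text, cpu_threshold=20.0):
    """Return (PID, COMMAND) pairs at or above cpu_threshold %CPU (assumed default: 20.0)."""
    busy = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(maxsplit=10)  # keep COMMAND whole even if it contains spaces
        if len(fields) < 11:
            continue
        if float(fields[2]) >= cpu_threshold:
            busy.append((int(fields[0]), fields[10]))
    return busy


if __name__ == "__main__":
    print("critical mounts:", critical_mounts(DF_SAMPLE))
    print("busy processes:", busy_processes(PS_SAMPLE))
```

With the embedded samples, this sketch reports `/home` and `/tmp` as critical. It finds `stress` (PID 3456) as the only process at or above 20% CPU. The `df` parser splits on whitespace, so it assumes that mount points contain no spaces.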
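If you add or change entries in `test_logs.txt`, a small tally of HTTP status codes and syslog levels can show whether the mix of 200/302/404/500 responses and INFO/WARNING/ERROR messages is still present. The Python sketch below does not reproduce the logic of `test_log_parser.rawk`. It embeds copies of the two log samples and assumes Common Log Format for the Apache lines. For the syslog lines, it assumes the level tag (`INFO:`, `WARNING:`, `ERROR:`) immediately follows `program: `, which is the layout the samples use.

```python
"""Sketch: tally Apache status codes and syslog levels in the test_logs.txt samples."""
import re
from collections import Counter

APACHE_SAMPLE = """\
192.168.1.100 - - [15/Jan/2024:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024
192.168.1.101 - - [15/Jan/2024:10:30:16 +0000] "GET /style.css HTTP/1.1" 200 512
192.168.1.102 - - [15/Jan/2024:10:30:17 +0000] "POST /login HTTP/1.1" 302 0
192.168.1.103 - - [15/Jan/2024:10:30:18 +0000] "GET /image.jpg HTTP/1.1" 200 2048
192.168.1.104 - - [15/Jan/2024:10:30:19 +0000] "GET /nonexistent.html HTTP/1.1" 404 0
192.168.1.105 - - [15/Jan/2024:10:30:20 +0000] "GET /script.js HTTP/1.1" 200 768
192.168.1.106 - - [15/Jan/2024:10:30:21 +0000] "POST /submit HTTP/1.1" 500 0
"""

SYSLOG_SAMPLE = """\
Jan 15 10:30:15 server1 sshd: Accepted password for user1 from 192.168.1.100
Jan 15 10:30:16 server1 kernel: ERROR: Out of memory
Jan 15 10:30:17 server1 apache2: WARNING: Server reached MaxClients
Jan 15 10:30:18 server1 cron: INFO: Daily backup completed
Jan 15 10:30:19 server1 sshd: ERROR: Failed password for user2 from 192.168.1.101
Jan 15 10:30:20 server1 systemd: INFO: Started network service
"""

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')

# Assumed layout: "Mon DD HH:MM:SS host program: LEVEL: message"
SYSLOG_LEVEL = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ [^:]+: (INFO|WARNING|ERROR):")


def status_counts(log_text):
    """Count HTTP status codes; lines not in Common Log Format are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CLF_LINE.match(line)
        if match:
            counts[int(match.group(4))] += 1
    return counts


def level_counts(log_text):
    """Count level tags; untagged lines (such as the sshd 'Accepted' entry) are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SYSLOG_LEVEL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    print("status codes:", dict(status_counts(APACHE_SAMPLE)))
    print("syslog levels:", dict(level_counts(SYSLOG_SAMPLE)))
```

With the embedded samples, this sketch counts four 200 responses and one each of 302, 404, and 500. It counts two ERROR, two INFO, and one WARNING entry. The untagged `sshd` line is left out on purpose.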
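If you modify `test_employees.csv` to test different validation rules, it helps to recompute the figures listed under its Features and Data Characteristics. That way, the README and the data stay in sync. The Python sketch below embeds a copy of the CSV sample and uses the standard `csv` module. It checks emails with a deliberately loose `name@domain.tld` pattern. That pattern is an assumption for illustration, not a full RFC 5322 validator, and it is not the rule used by `test_csv_processor.rawk`.

```python
"""Sketch: recompute the summary figures listed for test_employees.csv."""
import csv
import io
import re
from collections import Counter

# Copy of the CSV sample; pass the text of the real file instead to check it.
EMPLOYEES_CSV = """\
Name,Email,Age,Salary,Department
John Smith,john.smith@company.com,32,65000,Engineering
Jane Doe,jane.doe@company.com,28,72000,Marketing
Bob Johnson,bob.johnson@company.com,45,85000,Sales
Alice Brown,alice.brown@company.com,22,55000,Engineering
Charlie Wilson,charlie.wilson@company.com,38,78000,Finance
Diana Davis,diana.davis@company.com,29,68000,Marketing
Eve Miller,eve.miller@company.com,52,92000,Management
Frank Garcia,frank.garcia@company.com,25,60000,Engineering
Grace Lee,grace.lee@company.com,41,82000,Sales
Henry Taylor,henry.taylor@company.com,35,75000,Finance
Ivy Chen,ivy.chen@company.com,27,67000,Engineering
Jack Anderson,jack.anderson@company.com,48,88000,Management
"""

# Loose check (something@domain.tld); an assumption, NOT a full RFC 5322 validator.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def summarize(csv_text):
    """Return employee count, department sizes, value ranges, and malformed emails."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ages = [int(row["Age"]) for row in rows]
    salaries = [int(row["Salary"]) for row in rows]
    return {
        "employees": len(rows),
        "departments": Counter(row["Department"] for row in rows),
        "age_range": (min(ages), max(ages)),
        "salary_range": (min(salaries), max(salaries)),
        "bad_emails": [row["Email"] for row in rows if not EMAIL.fullmatch(row["Email"])],
    }


if __name__ == "__main__":
    for key, value in summarize(EMPLOYEES_CSV).items():
        print(f"{key}: {value}")
```

With the embedded copy, this sketch reports 12 employees in 5 departments, ages 22 to 52, salaries 55,000 to 92,000, and no malformed emails. These match the figures stated for the file.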