about summary refs log tree commit diff stats
path: root/awk/rawk/scratch/tests_old/data/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'awk/rawk/scratch/tests_old/data/README.md')
-rw-r--r--awk/rawk/scratch/tests_old/data/README.md139
1 files changed, 139 insertions, 0 deletions
diff --git a/awk/rawk/scratch/tests_old/data/README.md b/awk/rawk/scratch/tests_old/data/README.md
new file mode 100644
index 0000000..cb8f23b
--- /dev/null
+++ b/awk/rawk/scratch/tests_old/data/README.md
@@ -0,0 +1,139 @@
+# Test Data Files
+
+This directory contains sample data files used by the real-world examples.
+
+## Data Files
+
+### `test_data.txt` - System Command Outputs
+Simulated output from common system commands:
+
+**df output:**
+```
+Filesystem	1K-blocks	Used	Available	Use%	Mounted on
+/dev/sda1	1048576	524288	524288	50	/
+/dev/sdb1	2097152	1887436	209716	90	/home
+/dev/sdc1	524288	104857	419431	20	/var
+/dev/sdd1	1048576	943718	104858	90	/tmp
+```
+
+**ps output:**
+```
+PID	USER	%CPU	%MEM	VSZ	RSS	TTY	STAT	START	TIME	COMMAND
+1234	user1	15.2	2.1	1234567	12345	pts/0	S	10:30	0:15	chrome
+5678	user2	0.5	8.3	2345678	23456	pts/1	S	09:15	1:30	firefox
+9012	user1	2.1	1.5	3456789	34567	pts/2	S	11:45	0:05	bash
+3456	user3	25.7	1.2	4567890	45678	pts/3	R	12:00	0:30	stress
+7890	user2	0.1	12.5	5678901	56789	pts/4	S	08:30	2:15	docker
+```
+
+**ls -l output:**
+```
+total 1234
+-rw-r--r--	1	user1	group1	1024	Jan 15 10:30	file1.txt
+drwxr-xr-x	2	user2	group2	4096	Jan 15 11:45	directory1
+-rwxr-xr-x	1	user1	group1	2048	Jan 15 12:00	executable.sh
+-rw-r--r--	1	user3	group1	512	Jan 15 12:15	config.json
+-rw-r--r--	1	user1	group2	3072	Jan 15 12:30	large_file.dat
+```
+
+**Used by:** `../real_world/test_system_monitor.rawk`
+
+### `test_logs.txt` - Log Entries
+Sample log entries in common formats:
+
+**Apache log entries:**
+```
+192.168.1.100 - - [15/Jan/2024:10:30:15 +0000] "GET /index.html HTTP/1.1" 200 1024
+192.168.1.101 - - [15/Jan/2024:10:30:16 +0000] "GET /style.css HTTP/1.1" 200 512
+192.168.1.102 - - [15/Jan/2024:10:30:17 +0000] "POST /login HTTP/1.1" 302 0
+192.168.1.103 - - [15/Jan/2024:10:30:18 +0000] "GET /image.jpg HTTP/1.1" 200 2048
+192.168.1.104 - - [15/Jan/2024:10:30:19 +0000] "GET /nonexistent.html HTTP/1.1" 404 0
+192.168.1.105 - - [15/Jan/2024:10:30:20 +0000] "GET /script.js HTTP/1.1" 200 768
+192.168.1.106 - - [15/Jan/2024:10:30:21 +0000] "POST /submit HTTP/1.1" 500 0
+```
+
+**Syslog entries:**
+```
+Jan 15 10:30:15 server1 sshd: Accepted password for user1 from 192.168.1.100
+Jan 15 10:30:16 server1 kernel: ERROR: Out of memory
+Jan 15 10:30:17 server1 apache2: WARNING: Server reached MaxClients
+Jan 15 10:30:18 server1 cron: INFO: Daily backup completed
+Jan 15 10:30:19 server1 sshd: ERROR: Failed password for user2 from 192.168.1.101
+Jan 15 10:30:20 server1 systemd: INFO: Started network service
+```
+
+**Used by:** `../real_world/test_log_parser.rawk`
+
+### `test_employees.csv` - Employee Data
+Sample CSV file with employee information:
+
+```
+Name,Email,Age,Salary,Department
+John Smith,john.smith@company.com,32,65000,Engineering
+Jane Doe,jane.doe@company.com,28,72000,Marketing
+Bob Johnson,bob.johnson@company.com,45,85000,Sales
+Alice Brown,alice.brown@company.com,22,55000,Engineering
+Charlie Wilson,charlie.wilson@company.com,38,78000,Finance
+Diana Davis,diana.davis@company.com,29,68000,Marketing
+Eve Miller,eve.miller@company.com,52,92000,Management
+Frank Garcia,frank.garcia@company.com,25,60000,Engineering
+Grace Lee,grace.lee@company.com,41,82000,Sales
+Henry Taylor,henry.taylor@company.com,35,75000,Finance
+Ivy Chen,ivy.chen@company.com,27,67000,Engineering
+Jack Anderson,jack.anderson@company.com,48,88000,Management
+```
+
+**Features:**
+- 12 employees across 4 departments
+- Mix of valid email addresses
+- Age range from 22 to 52
+- Salary range from $55,000 to $92,000
+- Various data quality scenarios
+
+**Used by:** `../real_world/test_csv_processor.rawk`
+
+### `test_input.txt` - Simple Input Data
+Simple text input for basic processing:
+
+```
+Hello
+This is a short line
+This is a much longer line that should be detected
+```
+
+**Used by:** `../real_world/test_mixed.rawk`
+
+## Data Characteristics
+
+### System Data (`test_data.txt`)
+- **Disk usage**: Mix of normal (20-50%) and critical (90%) usage
+- **Process data**: Various CPU and memory usage patterns
+- **File data**: Mix of files, directories, and executables
+
+### Log Data (`test_logs.txt`)
+- **Apache logs**: Mix of successful (200), redirect (302), and error (404, 500) responses
+- **Syslog entries**: Mix of INFO, WARNING, and ERROR messages
+- **Realistic patterns**: Common log entry formats and content
+
+### Employee Data (`test_employees.csv`)
+- **Valid data**: All emails are properly formatted
+- **Age distribution**: Spread across different age groups
+- **Salary variation**: Realistic salary ranges by department
+- **Department balance**: Multiple employees per department
+
+## Usage
+
+These data files are designed to test various scenarios:
+
+1. **Normal operation**: Most data represents typical, valid cases
+2. **Edge cases**: Some data includes boundary conditions (90% disk usage, high CPU processes)
+3. **Error conditions**: Log files include error responses and system issues
+4. **Data validation**: CSV includes various data types for validation testing
+
+## Customization
+
+You can modify these files to test different scenarios:
+- Add more system data for different monitoring scenarios
+- Include different log formats for additional parsing tests
+- Modify CSV data to test different validation rules
+- Create new data files for specific use cases 
\ No newline at end of file