diff --git a/README.md b/README.md index 958e07f..d500ff6 100644 --- a/README.md +++ b/README.md @@ -137,6 +137,8 @@ Formula-like CSV text fields are neutralized with a leading single quote so spre When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic. Markdown table fields escape table separators, line breaks, and HTML-sensitive characters so unusual log tokens cannot break report layout. +For the report artifact contract and golden fixture map, see [`docs/report-artifacts.md`](./docs/report-artifacts.md). + ## Sample Output For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log). diff --git a/docs/report-artifacts.md b/docs/report-artifacts.md new file mode 100644 index 0000000..590b25e --- /dev/null +++ b/docs/report-artifacts.md @@ -0,0 +1,72 @@ +# Report Artifacts + +LogLens writes deterministic offline artifacts for reviewer inspection and downstream tooling. + +## Artifact Set + +| Artifact | When written | Review purpose | +| --- | --- | --- | +| `report.md` | Every successful run | Human-readable triage report with summary, findings, event counts, parser quality, and parser warnings | +| `report.json` | Every successful run | Machine-readable report with the same core evidence and parser telemetry | +| `findings.csv` | Only when `--csv` is set | Spreadsheet-friendly finding rows | +| `warnings.csv` | Only when `--csv` is set | Spreadsheet-friendly parser warning rows | + +Without `--csv`, LogLens does not create, overwrite, or delete existing CSV files in the output directory. 
+ +## JSON Contract + +The JSON report keeps parser observability visible next to findings: + +- `tool` +- `input` +- `input_mode` +- `assume_year` for syslog-style input when a year is supplied +- `timezone_present` +- `parser_quality.total_input_lines` +- `parser_quality.total_lines` +- `parser_quality.skipped_blank_lines` +- `parser_quality.parsed_lines` +- `parser_quality.unparsed_lines` +- `parser_quality.parse_success_rate` +- `parser_quality.top_unknown_patterns` +- `parsed_event_count` +- `warning_count` +- `finding_count` +- `event_counts` +- `host_summaries` when more than one hostname is represented +- `findings` +- `warnings` + +Finding objects contain `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, and `summary`. + +Warning objects contain the original `line_number` and the parser `reason`. + +## CSV Contract + +The optional CSV exports intentionally stay small: + +- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary` +- `warnings.csv`: `kind`, `line_number`, `message` + +Formula-like CSV text fields are neutralized with a leading single quote so spreadsheet tools treat them as text. + +## Markdown Safety + +Markdown table fields escape table separators, line breaks, HTML-sensitive characters, and control characters. Unusual log tokens should not be able to break report layout. 
+ +## Golden Fixtures + +The report contracts are backed by generated fixture artifacts: + +| Fixture case | Golden artifacts | +| --- | --- | +| [`syslog_legacy`](../tests/fixtures/report_contracts/syslog_legacy) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` | +| [`journalctl_short_full`](../tests/fixtures/report_contracts/journalctl_short_full) | `report.md`, `report.json` | +| [`multi_host_syslog_legacy`](../tests/fixtures/report_contracts/multi_host_syslog_legacy) | `report.md`, `report.json`, `findings.csv`, `warnings.csv` | +| [`multi_host_journalctl_short_full`](../tests/fixtures/report_contracts/multi_host_journalctl_short_full) | `report.md`, `report.json` | + +The enforcement lives in [`tests/test_report_contracts.cpp`](../tests/test_report_contracts.cpp). The focused report writer tests live in [`tests/test_report.cpp`](../tests/test_report.cpp). + +## Boundaries + +Reports are triage aids. They are not SIEM evidence, incident verdicts, attribution claims, or cross-host correlation output. Host summaries are compact per-host rollups; they do not change detector thresholds. 
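The CSV and Markdown safety rules described in `docs/report-artifacts.md` can be illustrated with a small C++ sketch. This is not the LogLens implementation. `neutralize_csv_field` and `escape_markdown_cell` are hypothetical names. The formula trigger characters (`=`, `+`, `-`, `@`, tab, carriage return) come from common CSV-injection guidance rather than from this diff. The replacement chosen for each unsafe Markdown character, including escaping backslashes, is also an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix a single quote when a CSV text field starts with a
// character that spreadsheet tools may treat as the start of a formula. The trigger
// set is an assumption taken from common CSV-injection guidance.
std::string neutralize_csv_field(const std::string& field) {
    if (field.empty()) {
        return field;
    }
    switch (field.front()) {
        case '=': case '+': case '-': case '@': case '\t': case '\r':
            return "'" + field;
        default:
            return field;
    }
}

// Hypothetical helper: make an arbitrary log token safe inside a Markdown table
// cell. The replacement chosen for each character is an assumption.
std::string escape_markdown_cell(const std::string& value) {
    static const char hex[] = "0123456789abcdef";
    std::string out;
    out.reserve(value.size());
    for (const char ch : value) {
        const unsigned char u = static_cast<unsigned char>(ch);
        switch (ch) {
            case '|':  out += "\\|";   break;  // table separator
            case '\\': out += "\\\\";  break;  // stop a literal backslash from escaping the next character
            case '\n':
            case '\r': out += ' ';     break;  // each line break becomes a space so the row stays intact
            case '<':  out += "&lt;";  break;  // HTML-sensitive characters
            case '>':  out += "&gt;";  break;
            case '&':  out += "&amp;"; break;
            default:
                if (u < 0x20 || u == 0x7F) {
                    // Remaining control characters become visible \xHH escapes.
                    out += "\\x";
                    out += hex[u >> 4];
                    out += hex[u & 0x0F];
                } else {
                    out += ch;
                }
        }
    }
    return out;
}
```

Under these assumptions, `=SUM(A1:A2)` is written to CSV as `'=SUM(A1:A2)`, and a token such as `a|b<c>` becomes `a\|b&lt;c&gt;` inside a Markdown table cell.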
diff --git a/docs/reviewer-path.md b/docs/reviewer-path.md index cd04702..2e20590 100644 --- a/docs/reviewer-path.md +++ b/docs/reviewer-path.md @@ -28,6 +28,7 @@ Inspect: - [`assets/sample_journalctl_short_full.log`](../assets/sample_journalctl_short_full.log) - [`tests/fixtures/report_contracts/syslog_legacy/report.md`](../tests/fixtures/report_contracts/syslog_legacy/report.md) - [`tests/fixtures/report_contracts/syslog_legacy/report.json`](../tests/fixtures/report_contracts/syslog_legacy/report.json) +- [`docs/report-artifacts.md`](./report-artifacts.md) - [`docs/parser-contract.md`](./parser-contract.md) Look for parser coverage fields: diff --git a/tests/fixtures/report_contracts/journalctl_short_full/report.json b/tests/fixtures/report_contracts/journalctl_short_full/report.json index abffa2e..ccffff9 100644 --- a/tests/fixtures/report_contracts/journalctl_short_full/report.json +++ b/tests/fixtures/report_contracts/journalctl_short_full/report.json @@ -4,7 +4,9 @@ "input_mode": "journalctl_short_full", "timezone_present": true, "parser_quality": { + "total_input_lines": 16, "total_lines": 16, + "skipped_blank_lines": 0, "parsed_lines": 14, "unparsed_lines": 2, "parse_success_rate": 0.8750, diff --git a/tests/fixtures/report_contracts/journalctl_short_full/report.md b/tests/fixtures/report_contracts/journalctl_short_full/report.md index a122bd9..2c94fc5 100644 --- a/tests/fixtures/report_contracts/journalctl_short_full/report.md +++ b/tests/fixtures/report_contracts/journalctl_short_full/report.md @@ -5,7 +5,9 @@ - Input: `tests/fixtures/report_contracts/journalctl_short_full/input.log` - Input mode: journalctl_short_full - Timezone present: true +- Total input lines: 16 - Total lines: 16 +- Skipped blank lines: 0 - Parsed lines: 14 - Unparsed lines: 2 - Parse success rate: 87.50% diff --git a/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.json b/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.json index 
a71b07d..a633a3a 100644 --- a/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.json +++ b/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.json @@ -4,7 +4,9 @@ "input_mode": "journalctl_short_full", "timezone_present": true, "parser_quality": { + "total_input_lines": 15, "total_lines": 15, + "skipped_blank_lines": 0, "parsed_lines": 12, "unparsed_lines": 3, "parse_success_rate": 0.8000, diff --git a/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.md b/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.md index b7a4d93..8af3a79 100644 --- a/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.md +++ b/tests/fixtures/report_contracts/multi_host_journalctl_short_full/report.md @@ -5,7 +5,9 @@ - Input: `tests/fixtures/report_contracts/multi_host_journalctl_short_full/input.log` - Input mode: journalctl_short_full - Timezone present: true +- Total input lines: 15 - Total lines: 15 +- Skipped blank lines: 0 - Parsed lines: 12 - Unparsed lines: 3 - Parse success rate: 80.00% diff --git a/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.json b/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.json index c8d2ec6..ca172a1 100644 --- a/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.json +++ b/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.json @@ -5,7 +5,9 @@ "assume_year": 2026, "timezone_present": false, "parser_quality": { + "total_input_lines": 15, "total_lines": 15, + "skipped_blank_lines": 0, "parsed_lines": 12, "unparsed_lines": 3, "parse_success_rate": 0.8000, diff --git a/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.md b/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.md index 7959f2b..b9421d9 100644 --- a/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.md +++ b/tests/fixtures/report_contracts/multi_host_syslog_legacy/report.md @@ -6,7 +6,9 @@ - 
Input mode: syslog_legacy - Assume year: 2026 - Timezone present: false +- Total input lines: 15 - Total lines: 15 +- Skipped blank lines: 0 - Parsed lines: 12 - Unparsed lines: 3 - Parse success rate: 80.00% diff --git a/tests/fixtures/report_contracts/syslog_legacy/report.json b/tests/fixtures/report_contracts/syslog_legacy/report.json index c80f7c4..222891c 100644 --- a/tests/fixtures/report_contracts/syslog_legacy/report.json +++ b/tests/fixtures/report_contracts/syslog_legacy/report.json @@ -5,7 +5,9 @@ "assume_year": 2026, "timezone_present": false, "parser_quality": { + "total_input_lines": 16, "total_lines": 16, + "skipped_blank_lines": 0, "parsed_lines": 14, "unparsed_lines": 2, "parse_success_rate": 0.8750, diff --git a/tests/fixtures/report_contracts/syslog_legacy/report.md b/tests/fixtures/report_contracts/syslog_legacy/report.md index b26d176..e6b7410 100644 --- a/tests/fixtures/report_contracts/syslog_legacy/report.md +++ b/tests/fixtures/report_contracts/syslog_legacy/report.md @@ -6,7 +6,9 @@ - Input mode: syslog_legacy - Assume year: 2026 - Timezone present: false +- Total input lines: 16 - Total lines: 16 +- Skipped blank lines: 0 - Parsed lines: 14 - Unparsed lines: 2 - Parse success rate: 87.50% diff --git a/tests/test_report_contracts.cpp b/tests/test_report_contracts.cpp index c1d5476..bb5406d 100644 --- a/tests/test_report_contracts.cpp +++ b/tests/test_report_contracts.cpp @@ -108,7 +108,9 @@ std::vector extract_markdown_contract_lines(const std::string& mark || starts_with(line, "- Input mode: ") || starts_with(line, "- Assume year: ") || starts_with(line, "- Timezone present: ") + || starts_with(line, "- Total input lines: ") || starts_with(line, "- Total lines: ") + || starts_with(line, "- Skipped blank lines: ") || starts_with(line, "- Parsed lines: ") || starts_with(line, "- Unparsed lines: ") || starts_with(line, "- Parse success rate: ") @@ -140,7 +142,9 @@ std::vector extract_json_contract_lines(const std::string& json) { || 
starts_with(line, "\"input_mode\": ") || starts_with(line, "\"assume_year\": ") || starts_with(line, "\"timezone_present\": ") + || starts_with(line, "\"total_input_lines\": ") || starts_with(line, "\"total_lines\": ") + || starts_with(line, "\"skipped_blank_lines\": ") || starts_with(line, "\"parsed_lines\": ") || starts_with(line, "\"unparsed_lines\": ") || starts_with(line, "\"parse_success_rate\": ")
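The fixture updates add `total_input_lines` and `skipped_blank_lines` next to the existing parser counters, and the contract test now checks both prefixes. Below is a minimal C++ sketch of how the counters could fit together. It relies on assumptions this diff does not state: `total_input_lines = total_lines + skipped_blank_lines`, `total_lines = parsed_lines + unparsed_lines`, and `parse_success_rate = parsed_lines / total_lines`. The golden values are consistent with that reading: 14 of 16 lines gives `0.8750`, and 12 of 15 gives `0.8000`, with zero blank lines skipped. `ParserQuality`, `tally`, `format_rate`, and `extract_contract_lines` are hypothetical names. The last one is a simplified stand-in for the prefix filtering in `tests/test_report_contracts.cpp`, not a copy of it. Treating whitespace-only lines as blank is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical counters mirroring the `parser_quality` fields in report.json.
struct ParserQuality {
    std::size_t total_input_lines = 0;    // every line read from the input (assumed meaning)
    std::size_t skipped_blank_lines = 0;  // blank lines excluded before parsing (assumed meaning)
    std::size_t total_lines = 0;          // non-blank lines offered to the parser
    std::size_t parsed_lines = 0;
    std::size_t unparsed_lines = 0;

    // Assumed definition: share of non-blank lines that parsed; 0.0 for empty input.
    double parse_success_rate() const {
        if (total_lines == 0) {
            return 0.0;
        }
        return static_cast<double>(parsed_lines) / static_cast<double>(total_lines);
    }
};

// Hypothetical tally; `parses` stands in for whichever parser the input mode selects.
template <typename ParseFn>
ParserQuality tally(const std::vector<std::string>& lines, ParseFn parses) {
    ParserQuality q;
    for (const std::string& line : lines) {
        ++q.total_input_lines;
        // Assumption: whitespace-only lines count as blank.
        if (line.find_first_not_of(" \t\r") == std::string::npos) {
            ++q.skipped_blank_lines;
            continue;
        }
        ++q.total_lines;
        if (parses(line)) {
            ++q.parsed_lines;
        } else {
            ++q.unparsed_lines;
        }
    }
    // The identities assumed in the lead-in hold by construction in this sketch.
    assert(q.total_input_lines == q.total_lines + q.skipped_blank_lines);
    assert(q.total_lines == q.parsed_lines + q.unparsed_lines);
    return q;
}

// Four decimal places, matching values such as 0.8750 in the golden report.json files.
std::string format_rate(double rate) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%.4f", rate);
    return std::string(buf);
}

// Simplified, hypothetical prefix filter: keep lines (after stripping leading
// spaces and tabs) that start with one of the contract prefixes.
std::vector<std::string> extract_contract_lines(const std::string& text,
                                                const std::vector<std::string>& prefixes) {
    std::vector<std::string> kept;
    std::size_t start = 0;
    while (start <= text.size()) {
        std::size_t end = text.find('\n', start);
        if (end == std::string::npos) {
            end = text.size();
        }
        const std::string raw = text.substr(start, end - start);
        const std::size_t first = raw.find_first_not_of(" \t");
        const std::string line = (first == std::string::npos) ? std::string() : raw.substr(first);
        for (const std::string& prefix : prefixes) {
            if (line.compare(0, prefix.size(), prefix) == 0) {
                kept.push_back(line);
                break;
            }
        }
        start = end + 1;
    }
    return kept;
}
```

In this sketch, a line that matches no prefix is ignored. That is why each new report field needs its own prefix entry before a golden comparison would notice it.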