Introducing EvidenceForge: Synthetic security logs that don’t look (as) fake

Source: Cisco Talos Blog

Author: David J. Bianco

URL: https://blog.talosintelligence.com/introducing-evidenceforge-synthetic-security-logs-that-dont-look-as-fake/

ONE SENTENCE SUMMARY:

EvidenceForge generates realistic, causally consistent, multi-format synthetic security logs with ground truth, enabling training, detection validation, and scalable analytics development.

MAIN POINTS:

  1. High-quality labeled datasets are essential for training responders, validating detections, and building models.
  2. Production telemetry raises compliance issues, while public datasets are anonymized, stale, and over-reused.
  3. Self-generated attack simulations require real infrastructure and time, and they scale poorly when many scenario variants are needed.
  4. Many synthetic generators emit independent events, breaking cross-source coherence and causal storytelling.
  5. EvidenceForge uses a canonical SecurityEvent model to synchronize fields across all emitters.
  6. Shared contexts enforce consistency for PIDs, LogonIDs, timestamps, and network identifiers like Zeek UIDs.
  7. Scenario YAML defines hosts, users, topology, and optional attack storylines for deterministic generation.
  8. The engine outputs 20+ correlated formats spanning Windows, Linux, network, and EDR telemetry.
  9. A rule engine inserts prerequisite protocol events with realistic timing for causal correctness.
  10. Background noise, red herrings, and bursty timing models improve realism and analyst training value.
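
The canonical-model idea in points 5 and 6 can be illustrated with a minimal Python sketch. This is not EvidenceForge's actual code or API: the `SecurityEvent` fields, the two emitter functions, and the output field names are all hypothetical, chosen only to show how rendering every format from one shared record keeps PIDs, logon IDs, timestamps, and connection UIDs aligned.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SecurityEvent:
    """Hypothetical canonical record; field names are illustrative only."""
    timestamp: datetime
    host: str
    user: str
    pid: int
    logon_id: str
    image: str
    dest_ip: str
    dest_port: int
    conn_uid: str


def emit_process_log(ev: SecurityEvent) -> str:
    # Host-side view: a process record rendered from the canonical event.
    return json.dumps({
        "UtcTime": ev.timestamp.isoformat(),
        "Computer": ev.host,
        "User": ev.user,
        "ProcessId": ev.pid,
        "LogonId": ev.logon_id,
        "Image": ev.image,
    })


def emit_conn_log(ev: SecurityEvent) -> str:
    # Network-side view: a connection record rendered from the same event.
    return json.dumps({
        "ts": ev.timestamp.timestamp(),
        "uid": ev.conn_uid,
        "id.resp_h": ev.dest_ip,
        "id.resp_p": ev.dest_port,
    })


# One canonical event, built once (all values are made up for illustration).
event = SecurityEvent(
    timestamp=datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc),
    host="WS-01",
    user="alice",
    pid=4242,
    logon_id="0x3e7",
    image="C:\\Windows\\System32\\curl.exe",
    dest_ip="203.0.113.10",
    dest_port=443,
    conn_uid="CHhAvVGS1DHFjwGM9",
)

host_record = json.loads(emit_process_log(event))
net_record = json.loads(emit_conn_log(event))

# Both views derive from the same record, so their timestamps agree.
host_ts = datetime.fromisoformat(host_record["UtcTime"]).timestamp()
print(host_ts == net_record["ts"])
```

Because neither emitter invents its own values, an analyst pivoting from the host record to the network record finds matching times and identifiers. Independent per-source generators cannot guarantee that alignment.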

TAKEAWAYS:

  1. Canonical event modeling solves the “logs don’t line up” problem across heterogeneous telemetry sources.
  2. Deterministic generation with seeded randomness enables repeatable datasets for regression testing detections.
  3. Sensor-placement modeling produces realistic network visibility gaps, mirroring real monitoring limitations.
  4. AI-assisted scenario authoring reduces expertise burden while scripts guarantee field-level consistency at scale.
  5. Companion ENVIRONMENT and GROUND_TRUTH documents provide analyst context and verifiable labels for evaluation.
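
The repeatability described in takeaway 2 can be sketched in Python as follows. This is an assumption-laden illustration rather than the tool's implementation: `generate_timeline`, its parameters, and the two-rate exponential mixture used to approximate bursty timing are all hypothetical. The point is only that a private, seeded random generator makes the same scenario produce the same event times on every run.

```python
import random


def generate_timeline(seed: int, n_events: int = 5) -> list[float]:
    """Return event offsets in seconds; the same seed yields the same list."""
    # A private generator avoids depending on global random state.
    rng = random.Random(seed)
    t = 0.0
    timeline = []
    for _ in range(n_events):
        # Assumed burst model: mostly short gaps, occasionally a long idle gap.
        if rng.random() < 0.8:
            gap = rng.expovariate(2.0)    # mean gap of 0.5 s inside a burst
        else:
            gap = rng.expovariate(0.05)   # mean gap of 20 s between bursts
        t += gap
        timeline.append(round(t, 3))
    return timeline


first = generate_timeline(seed=1337)
second = generate_timeline(seed=1337)
other = generate_timeline(seed=7)

print(first == second)  # identical seeds reproduce the timeline
print(first == other)   # a different seed gives a different timeline
```

A detection regression test can then pin a seed, regenerate the dataset after a rule change, and compare alerts against the same ground truth each time.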