SRA Example: Any Stack, Any Network, Anybody
In this blog I will share a sample implementation of the technology stack described in the ThetaPoint Security Reference Architecture for Security Operations Centers. I will walk through some basics of network and endpoint instrumentation, event transport, analytics, and workflow. This example is designed to be easy to follow, but if you have something more specific that you want to delve into, feel free to book time on my calendar, and we’ll explore in detail together.
Architectural Overview – Instrumentation
For this implementation, we’ll be using the Elastic Beats for wiring up event collection on our endpoints. Beats will collect the following events for us:
Architectural Overview – Ingest
We will use Logstash to aggregate all the Beats-harvested events into a unified stream. Per Elastic’s recommendations, Logstash will group these streams according to the type and minor release version of the harvesting agent. We can use a cluster of Logstash receivers to both spread the load and provide resilient service in the event of an outage. Specifically, we can set up a clustered listener for each type of beat record and pass labeling through for subsequent processing.
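To make the grouping idea concrete, here is a minimal Python sketch of how a stream key could be derived from the agent type and its minor release version. It assumes each event carries the `@metadata.beat` and `@metadata.version` fields that Beats attach; the helper names `stream_key` and `group_events` are hypothetical and not part of Logstash.

```python
from collections import defaultdict


def stream_key(beat: str, version: str) -> str:
    """Build a key such as 'filebeat-7.2' from agent type and full version."""
    major, minor, *_ = version.split(".")
    return f"{beat}-{major}.{minor}"


def group_events(events):
    """Group events by agent type and minor release version."""
    groups = defaultdict(list)
    for event in events:
        meta = event["@metadata"]  # assumed to be present on every event
        groups[stream_key(meta["beat"], meta["version"])].append(event)
    return dict(groups)


sample = [
    {"@metadata": {"beat": "filebeat", "version": "7.2.0"}, "message": "a"},
    {"@metadata": {"beat": "filebeat", "version": "7.2.1"}, "message": "b"},
    {"@metadata": {"beat": "winlogbeat", "version": "7.1.0"}, "message": "c"},
]
grouped = group_events(sample)
```

In this sketch the two Filebeat patch releases land in the same `filebeat-7.2` group, while the Winlogbeat event gets its own.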
Architectural Overview – Utility
Logstash also performs some in-stream processing to modify, decorate, and filter individual records according to our needs. You can use this capability to trade a little CPU time and latency for storage and compute load on your back-end data lake or SIEM.
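The modify, decorate, and filter steps can be sketched in a few lines of Python. This is only an illustration of the pattern, not the pipeline used in the lab: the field names (`log.level`, `host.name`, `tags`) and the `in_stream_processed` tag are assumptions chosen for the example.

```python
import copy


def process(record):
    """Return a processed copy of the record, or None if it should be dropped."""
    # Filter: drop debug-level noise before it reaches storage.
    if record.get("log", {}).get("level") == "debug":
        return None

    out = copy.deepcopy(record)  # leave the caller's record untouched

    # Modify: normalize the host name so later searches match consistently.
    host = out.get("host", {})
    if "name" in host:
        host["name"] = host["name"].lower()

    # Decorate: tag the record so downstream systems know it was processed.
    out.setdefault("tags", []).append("in_stream_processed")
    return out


kept = process({"host": {"name": "WEB01"}, "log": {"level": "info"}})
dropped = process({"host": {"name": "WEB01"}, "log": {"level": "debug"}})
```

Every record dropped here is one that the back end never has to index, which is the CPU-for-storage trade described above.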
Once our stream processing has completed, we can fork several copies of each record to one or more destination systems. In this example, the destinations are several Apache Kafka topics, Elasticsearch indices, and flat log archives.
Sending records to several Kafka topics allows us to perform some platform-level diagnostics and ensure event durability (up to our topic retention period) for critical data. If we had a generous tolerance for end-to-end latency, we could split record processing even further by forcing writes to a Kafka topic first, then using separate consumer groups to forward records to each end system (e.g. Elasticsearch, Splunk, Hadoop, another SIEM, etc.). That would ensure event durability first while decoupling the stream processing workload from end-system delivery.
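The decoupling that separate consumer groups provide can be shown with a toy in-memory model. This sketch does not use a Kafka client, and the `Topic` class and group names are hypothetical; it only illustrates that each group tracks its own offset into the same durable log, so a slow end system does not hold back the others.

```python
class Topic:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def append(self, record):
        # Records are written once and kept for every group to read.
        self._log.append(record)

    def poll(self, group, max_records=10):
        # Each group resumes from its own offset, independent of the others.
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


topic = Topic()
for i in range(3):
    topic.append({"id": i})

fast = topic.poll("elasticsearch-writer")           # reads all three records
slow = topic.poll("archive-writer", max_records=1)  # reads only the first
rest = topic.poll("archive-writer")                 # catches up later
```

The same records reach both groups, each at its own pace.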
Architectural Overview – Workflow
We will use a 2-phase workflow process to generate an alert for building an investigation. We use Logstash again to filter our records of interest directly from the Kafka topics and build an alert document for use in Kibana’s Timeline.
Once there, we use the newly-released Elastic SIEM to browse our alert records to provide context when investigating specific events. The Elastic SIEM has basic functionality for selecting and annotating the events that comprise an overall investigation record. Its functionality currently limits you to a workflow that centers around building predicates for grouping events together and adding notes for narrative context.
Our basic workflow involves picking alerts out of the alert index that do not belong to a documented Timeline, building relevant context through search conditions and notes, then activating a related procedure once a disposition has been reached.
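The first step of that workflow, finding alerts that no Timeline has claimed yet, can be sketched as a simple set difference. The document shapes below (an `id` on each alert, an `event_ids` list on each Timeline) are assumptions made for illustration and do not reflect how Kibana stores Timelines.

```python
def untriaged_alerts(alerts, timelines):
    """Return alerts whose id is not referenced by any documented Timeline."""
    claimed = {event_id for t in timelines for event_id in t["event_ids"]}
    return [alert for alert in alerts if alert["id"] not in claimed]


alerts = [{"id": "a1"}, {"id": "a2"}, {"id": "a3"}]
timelines = [{"title": "Suspicious proxy use", "event_ids": ["a2"]}]
todo = untriaged_alerts(alerts, timelines)
```

Whatever remains in `todo` is the analyst's queue for building context and reaching a disposition.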
Architectural Overview – Model
Here, I cheat a bit. I leverage Zeek’s ability to tag connections as locally or remotely initiated. The networks.cfg file provides the base information on which subnets “belong” in your network. Normally, you would want to generate this configuration automatically by aggregating route tables, but since the lab network is small, we can specify it manually. I also cheat by using the Beats user data to add entity-level context at the case, but eventually I will show how to push that model data further up the pipeline so that it is added automatically to each event.
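The local-versus-remote decision boils down to a subnet membership test. Here is a minimal Python sketch of that check using the standard `ipaddress` module; the subnets listed are placeholders rather than the lab's actual networks.cfg contents, and the helper names are hypothetical.

```python
import ipaddress

# Placeholder subnets standing in for the entries a networks.cfg would list.
LOCAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_local(addr: str) -> bool:
    """True if the address falls inside any subnet we consider ours."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in LOCAL_NETS)


def locally_initiated(conn: dict) -> bool:
    """A connection is locally initiated when its originator is local."""
    return is_local(conn["id.orig_h"])


outbound = {"id.orig_h": "192.168.1.20", "id.resp_h": "93.184.216.34"}
inbound = {"id.orig_h": "93.184.216.34", "id.resp_h": "192.168.1.20"}
```

Generating `LOCAL_NETS` from aggregated route tables instead of a hand-written list would change only how the list is built, not the check itself.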
An Unsung Hero – the Elastic Common Schema
While the 7.2 release did bring with it a basic front-end that analysts can use to support investigative workflows, the real star in the 7.x release has been the Elastic Common Schema. This open specification defines a comprehensive taxonomy in which vendors and content authors can organize various pieces of digital evidence in an event record. Yes, it has been done before (CEF, STIX, IODEF, etc.), but the ECS can be used to quickly create easily-ingestible log formats and managed configuration.
For example, we have implemented an ECS-compatible logging format for the squid proxy that generates JSON-formatted logs using the ECS schema field taxonomy:
Elastic does not officially support this particular implementation, but it is compatible enough to allow drill-and-pivot from officially-supported record types into the squid logs for building Timelines.
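To give a feel for what an ECS-shaped proxy record looks like, here is a hedged Python sketch that maps a handful of access-log values onto ECS field names. It is not the squid log format described above: the choice of fields, the `squid.access` dataset name, and the `1.0.0` schema version are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def to_ecs(epoch_seconds, client_ip, method, url, status, size):
    """Arrange proxy access-log values under ECS field names."""
    return {
        "@timestamp": datetime.fromtimestamp(
            epoch_seconds, tz=timezone.utc
        ).isoformat(),
        "ecs": {"version": "1.0.0"},           # assumed schema version
        "event": {"dataset": "squid.access"},  # hypothetical dataset name
        "source": {"ip": client_ip},
        "http": {
            "request": {"method": method},
            "response": {"status_code": status, "body": {"bytes": size}},
        },
        "url": {"original": url},
    }


record = to_ecs(0, "192.168.1.20", "GET", "http://example.com/", 200, 1256)
line = json.dumps(record)  # one JSON document per log line
```

Because fields such as `source.ip` and `url.original` carry the same names across ECS-compliant sources, a pivot on one value can reach records from different record types.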
We raise the bar by connecting on-prem events with a cloud-top SIEM and enriched functionality afforded to us by workflow orchestration and automation.
- The SRA framework can apply to any tech stack – vendor-supplied, open source, or even proprietary
- Building on some basic abstractions, you gain data portability and implementation flexibility, even for high-contact surfaces like instrumentation and workflow
- Getting your organization started on the journey to real-time situational awareness does not require a massive up-front investment – you can start on the basic building blocks and keep adding layers with confidence as your maturity increases and your workload scales