Engineering
Cerebro: An open source agentic system for security alert triage
- Security teams face overwhelming alert volumes from cloud monitoring tools, with Writer experiencing a 2% actionable signal rate among thousands of daily findings.
- Cerebro is an open-source AI agent system that automates security alert triage by normalizing findings, building infrastructure graphs, and enriching alerts with context from multiple tools.
- The system deduplicates related findings into single incidents and uses AI agents to make evidence-based triage decisions: resolve immediately, create tickets, or defer based on real risk.
- Built with enterprise guardrails including human-in-the-loop thresholds, deterministic validation, and full traceability to ensure safe, auditable automated security operations.
Over the last year or two, many security teams have pushed to adopt “modern” cloud security tooling. We wanted more visibility and granularity into what was happening, to see around corners before a problem got out of hand. Unfortunately, we got what we asked for: a firehose of alerts, warnings, and findings that never stops.
Here at WRITER, our initial experience was thousands of alerts among which, after triage, we found perhaps 100 were truly actionable. Over the following weeks we calculated a roughly 2% signal rate. The other 98% was noise that consumed analyst hours, team resources, and generally left a bad taste in our mouths.
I’ll pause and emphasize that this isn’t a “tools are bad” story. It’s a “tools are doing their job, and humans can’t scale their attention to today’s reality” story.
WRITER is an AI company, so naturally we looked to AI to find a fix. Over the last few months we built Cerebro — an open-source, vendor-agnostic system that uses AI agents to do what security engineers spend most of their time doing manually.
Cerebro gathers context across fragmented systems, connects the findings to real infrastructure topology, deduplicates related problems into single incidents and makes the triage call with evidence, not vibes.
The output is straightforward. We have Cerebro classify its response into three categories, the last of which has three modes.
✅ **Resolve now**
🧾 **Ticket** (with priority and owner)
⏳ **Defer / accept / monitor**
Although we love to label things, the goal here is not a layered taxonomy. The reality is security teams need to make quick decisions. We need to react immediately and with precision to the alerts that matter in the moment. A tsunami of alerts distracts from this mission.
The graph security promise (and why it still fails)
If you’ve run a cloud security program in the last three years, you’ve seen the pitch: don’t treat issues as isolated alerts. Model the cloud as a connected system. A risky IAM permission + an exposed service + a reachable database isn’t three problems. It’s one attack path.
That framing is correct. The execution is where it falls apart.
Here’s what the typical stack looks like:
You ingest findings from CSPM, CNAPP, CWPP, SIEM, vulnerability scanners, and whatever else compliance made you buy. Then you normalize them into Splunk, Elastic, or a platform like Snowflake. You route “actionable” work into your ticketing system or a SOAR and you build dashboards so leadership can see the number go down.
You still drown. The dashboards look good. The alert queue does not.
Over time, your team develops a sophisticated, undocumented heuristic for which alerts to ignore. This is a deeply human, deeply cursed form of tuning. It lives in the heads of your senior analysts. When they leave, it leaves with them.
The deeper problem is that even when the platform is good, you don’t control alert fidelity. You’re buying someone else’s definition of “important.” You can tune thresholds, suppress findings, adjust severity weights—but you can’t shape the alert logic end-to-end.
And if you want automation? Many platforms will happily sell you the privilege of paying them more money so their product can fix the problems their product found.
That made sense five years ago, when “automation” meant writing brittle rules that broke on the next infrastructure change. It makes less sense now.
LLMs can do deep investigative work: retrieve context, check invariants, correlate signals across systems, and produce a structured recommendation with supporting evidence. The capability exists. The question is how to deploy it safely.
What Cerebro actually does
Cerebro is not a replacement for your scanners, your SIEM, or your cloud security platform. It’s the thing you wish existed between findings and action, between alerts, triage, and burnout.
We built the project in part because we needed to scan massive cloud environments (AWS, GCP, Azure, Kubernetes) efficiently. Cerebro is written in Go and uses goroutines and the language’s built-in concurrency to handle real-time tasks and scan in parallel. You can set up worker pools to scan thousands of cloud resources at once without managing threads yourself.
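The worker-pool pattern described above can be sketched in a few lines of Go. Everything here is illustrative rather than Cerebro’s actual API: the `Finding` type and the `scanResource` stub stand in for real scan logic.

```go
package main

import (
	"fmt"
	"sync"
)

// Finding is a placeholder for a normalized scan result.
type Finding struct {
	Resource string
	Issue    string
}

// scanResource stands in for a real cloud API call.
func scanResource(resource string) Finding {
	return Finding{Resource: resource, Issue: "none"}
}

// scanAll fans resources out to a fixed pool of workers and collects
// results on a channel, so thousands of resources can be scanned
// concurrently without unbounded goroutine growth.
func scanAll(resources []string, workers int) []Finding {
	jobs := make(chan string)
	results := make(chan Finding)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				results <- scanResource(r)
			}
		}()
	}

	// Feed jobs, then close the channel so workers exit their loops.
	go func() {
		for _, r := range resources {
			jobs <- r
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var findings []Finding
	for f := range results {
		findings = append(findings, f)
	}
	return findings
}

func main() {
	findings := scanAll([]string{"s3://a", "s3://b", "s3://c"}, 2)
	fmt.Println(len(findings))
}
```

The pool size bounds concurrency independently of how many resources exist, which is what makes very large environments tractable.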
Go also has first-class SDKs for all major cloud providers (AWS, GCP, Azure), making it a natural choice for a cloud security platform that needs to interact with multiple cloud APIs.
Plus, Cerebro compiles to static binaries with no runtime dependencies. That’s critical for deploying a security tool across different environments (CLI, API server, distributed workers) without dependency management headaches.
Rather than using traditional databases, Cerebro uses Snowflake (a cloud data warehouse) for storing and querying security data. This may be unconventional, but in our opinion it’s incredibly powerful. It allows SQL-based security queries across massive datasets, supports compliance reporting at scale, and provides built-in data lake capabilities. The architecture can handle both real-time security scanning and historical compliance analysis in the same platform.
Cerebro’s architecture at a high level
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Security Tools │───>│ Normalization │───>│ Resource Graph │
│ (CSPM, CNAPP, │ │ Layer │ │ (topology + │
│ vuln scanner) │ │ │ │ relationships)│
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Routing Layer │<───│ Triage Agent │<───│ Enrichment │
│ (Jira, SOAR, │ │ (decision + │ │ Agents │
│ Slack, PD) │ │ evidence) │ │ (tool calls) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
The pipeline
Cerebro starts by ingesting security findings from across your entire cloud infrastructure — AWS, GCP, Azure, Kubernetes — and normalizing them into Snowflake as a unified data substrate where everything becomes queryable via SQL. This normalized data feeds into a comprehensive resource graph that maps your entire environment: which services communicate with each other, who has access to what resources, what’s publicly exposed, and how an attacker could potentially move laterally through your systems.
From there, Cerebro’s AI agents go to work enriching each finding with critical context — they investigate by querying the SQL layer, retrieving related findings with semantic filters, looking up asset configurations, evaluating policies in real time, and searching audit logs to understand what actually happened.
These enriched findings don’t stay isolated. They’re clustered together into complete incidents that tell a coherent story, with all the context assembled automatically instead of forcing your analysts to correlate data manually across six different cloud consoles.
The incident responder agent then makes intelligent triage decisions — should this be resolved immediately, does it need a ticket for follow-up, or can it be deferred — based on policy evaluation, compliance requirements, and attack path risk analysis, with human-in-the-loop thresholds for high-stakes actions.
Finally, everything routes back into your existing workflow systems with full traceability and auditability, using SQL as the universal query interface. The result: your human analysts spend their time on decisions that require judgment, experience, and institutional knowledge, not on the tedious work of assembling context, chasing down relationships, or figuring out which of a hundred alerts actually matter.
Let’s break the details of the pipeline down into steps.
Normalize the Finding
Different tools describe the same problem in different dialects. Wiz says one thing, Prisma says another, your homegrown scanner says something else entirely.
We convert everything into a shared structure:
finding:
entity: arn:aws:s3:::backup-prod-2024
entity_type: s3_bucket
issue_type: public_access
severity_reported: high
evidence:
- public_acl: true
- bucket_policy_allows_anonymous: true
source: wiz
environment: production
discovered_at: 2024-01-15T08:23:00Z
This isn’t glamorous, but it’s essential. You can’t reason about findings you can’t compare.
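As a sketch, the normalized schema above maps naturally onto a single Go struct. The field names come from the YAML; the Go types are our assumption, not Cerebro’s actual definition.

```go
package main

import (
	"fmt"
	"time"
)

// Finding mirrors the normalized finding schema shown above. Every
// source tool (Wiz, Prisma, homegrown scanners) is mapped into this
// one shape, which is what makes findings comparable.
type Finding struct {
	Entity           string
	EntityType       string
	IssueType        string
	SeverityReported string
	Evidence         map[string]bool
	Source           string
	Environment      string
	DiscoveredAt     time.Time
}

func main() {
	f := Finding{
		Entity:           "arn:aws:s3:::backup-prod-2024",
		EntityType:       "s3_bucket",
		IssueType:        "public_access",
		SeverityReported: "high",
		Evidence: map[string]bool{
			"public_acl":                     true,
			"bucket_policy_allows_anonymous": true,
		},
		Source:       "wiz",
		Environment:  "production",
		DiscoveredAt: time.Date(2024, 1, 15, 8, 23, 0, 0, time.UTC),
	}
	fmt.Println(f.Source, f.IssueType)
}
```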
Build the Incident Context
This is where the graph matters.
Instead of treating an alert as a row in a table, we attach it to a local slice of the system:
- Upstream dependencies: What services write to this resource?
- Downstream dependencies: What consumes data from it?
- Ingress paths: Can external traffic reach it?
- Egress paths: Can it reach the internet?
- Identity relationships: Who has permissions? Through what roles?
- Blast radius: If this is compromised, what else is affected?
This context is cached and incrementally updated. We’re not rebuilding the graph on every finding.
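Blast-radius computation over that cached graph is, at its core, a reachability walk. A minimal sketch, with a hand-built `edges` map standing in for the real resource graph:

```go
package main

import "fmt"

// blastRadius walks downstream dependency edges from a compromised
// resource and returns everything reachable: a breadth-first search
// over the cached topology.
func blastRadius(edges map[string][]string, start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var affected []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, next := range edges[cur] {
			if !seen[next] {
				seen[next] = true
				affected = append(affected, next)
				queue = append(queue, next)
			}
		}
	}
	return affected
}

func main() {
	// Illustrative topology: a bucket feeds an ETL pipeline, which
	// feeds an analytics database.
	edges := map[string][]string{
		"s3://backup-prod": {"etl-pipeline"},
		"etl-pipeline":     {"analytics-db"},
	}
	fmt.Println(blastRadius(edges, "s3://backup-prod"))
}
```

The same walk run over ingress edges instead of egress edges answers the reachability questions in the list above.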
Enrich via tool calls
Here’s where agents do actual work.
The enrichment agent has access to a set of tools:
| Tool | Purpose | Example Query |
|---------------------|--------------------------------------------|----------------------------------------|
| `cloud_inventory` | Tags, owner, environment, creation date | "What are the tags on this resource?" |
| `iam_analyzer` | Effective permissions, trust relationships | "Who can assume this role?" |
| `network_analyzer` | Reachability from internet, VPC topology | "Is this publicly accessible?" |
| `runtime_telemetry` | Is the workload actually running? | "When was this last active?" |
| `change_history` | Recent modifications | "What changed in the last 7 days?" |
| `auth_logs` | Access patterns, anomalies | "Any unusual access to this resource?" |
| `vuln_state` | CVE status across deployments | "Is this CVE present at runtime?" |
The agent decides which tools to call based on the finding type and current evidence gaps. It’s not running every tool on every finding—that would be slow and expensive.
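That gap-driven tool selection can be sketched as a lookup of required evidence per finding type. The mappings below are illustrative, not Cerebro’s actual policy.

```go
package main

import "fmt"

// requiredEvidence maps an issue type to the tools whose evidence the
// triage decision needs. Illustrative mappings only.
var requiredEvidence = map[string][]string{
	"public_access": {"network_analyzer", "auth_logs", "cloud_inventory"},
	"excessive_iam": {"iam_analyzer", "change_history"},
}

// toolsToCall returns only the tools whose evidence is still missing,
// so the agent never runs every tool on every finding.
func toolsToCall(issueType string, have map[string]bool) []string {
	var tools []string
	for _, tool := range requiredEvidence[issueType] {
		if !have[tool] {
			tools = append(tools, tool)
		}
	}
	return tools
}

func main() {
	// Inventory evidence already gathered; two gaps remain.
	have := map[string]bool{"cloud_inventory": true}
	fmt.Println(toolsToCall("public_access", have))
}
```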
Deduplicate into Incidents
Many alerts are the same incident wearing different hats.
Example: A public S3 bucket might generate:
- A CSPM finding for the bucket policy
- A separate finding for the ACL
- A vulnerability scanner finding for “sensitive data exposure”
- A compliance finding for “S3 bucket not encrypted”
Four alerts. One problem.
The deduplication agent clusters related findings into a single incident:
incident:
id: INC-2024-0142
primary_issue: public_s3_bucket
supporting_findings:
- finding_id: wiz-12345 (public_acl)
- finding_id: prisma-67890 (bucket_policy)
- finding_id: compliance-11111 (encryption)
amplifiers:
- contains_database_backups: true
- no_compensating_network_control: true
confidence: 0.92
evidence_summary: "..."
Make the decision
The triage agent produces a constrained output:
triage:
decision: resolve_now | ticket | defer
priority: P1 | P2 | P3 | P4
rationale: "..."
evidence:
- exposure_confirmed: true (source: network_analyzer)
- production_tagged: true (source: cloud_inventory)
- owner_identified: platform-team (source: inventory)
- compensating_controls: none
recommended_action: "..."
routing: jira | pagerduty | slack
The decision categories are intentionally simple:
- Resolve now: High confidence + confirmed exposure + high blast radius + no compensating control. This is the “stop what you’re doing” bucket.
- Ticket: Real issue, bounded risk, work it in normal cycles. Most findings land here.
- Defer / accept / monitor: Low confidence, low exposure, or compensating controls exist. Log it, don’t staff it.
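The constrained output lends itself to a schema type with strict validation. A sketch, with field types assumed from the YAML above:

```go
package main

import "fmt"

// TriageDecision mirrors the constrained triage output. Field names
// follow the YAML schema; the Go types are assumptions.
type TriageDecision struct {
	Decision          string // resolve_now | ticket | defer
	Priority          string // P1 | P2 | P3 | P4
	Rationale         string
	Evidence          map[string]string
	RecommendedAction string
	Routing           string // jira | pagerduty | slack
}

var validDecisions = map[string]bool{
	"resolve_now": true,
	"ticket":      true,
	"defer":       true,
}

// validate enforces the closed vocabulary: an out-of-schema decision
// is rejected rather than passed downstream.
func (t TriageDecision) validate() error {
	if !validDecisions[t.Decision] {
		return fmt.Errorf("unknown decision %q", t.Decision)
	}
	return nil
}

func main() {
	t := TriageDecision{Decision: "ticket", Priority: "P3", Routing: "jira"}
	fmt.Println(t.validate() == nil)
}
```

Constraining the schema is what makes the outputs testable: a model response that doesn’t parse into this type is a failure, not a judgment call.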
Alright, we’ve broken down how the system works in theory, but let’s use a practical example to illustrate how it might work in practice.
Mini-Case: The 3 AM S3 Bucket
Let me walk through how this actually works.
The alert: Your CSPM fires at 3 AM. Public S3 bucket. Severity: Critical. Contents: appears to be database backups based on filename patterns. Owner: unknown.
The traditional flow
- On-call analyst wakes up, opens the console
- Checks the bucket—yes, it has a public ACL
- Opens IAM, looks at policies—yes, the bucket policy allows anonymous reads
- Searches Slack for who owns it—no clear answer
- Checks if it’s production or dev—unclear from tags
- Looks at CloudTrail—no public access in the logs, but they only have 7 days retained
- Escalates to the security lead
- Security lead spends another hour digging
- Turns out: the bucket is in a legacy dev account, has been public since 2021, contains synthetic test data, and is only accessible via a VPC endpoint despite the public ACL
Total time: 3 hours
Decision: Defer, create a cleanup ticket
The new Cerebro flow
1. Normalize: resource=s3://backup-dev-legacy, issue=public_access, source=cspm, severity=critical
2. Graph context:
- Bucket is in aws-account: dev-legacy-2021
- Used by a decommissioned ETL pipeline
- No production dependencies
- No cross-account access configured
3. Enrichment (parallel tool calls):
- `cloud_inventory`: env=dev, owner=data-platform-team, created=2021-03-15
- `iam_analyzer`: No cross-account trust, no external principals
- `network_analyzer`: VPC endpoint restricts actual access; no public route despite ACL
- `auth_logs`: Zero public GetObject requests in 90 days
- `runtime_telemetry`: No active consumers
4. Decision:
triage:
decision: defer
priority: P4
rationale: |
Non-production resource in legacy dev account.
Public ACL is misleading—actual access is restricted by VPC endpoint.
No evidence of external access in audit logs.
Owner identified; recommend cleanup ticket but no urgent action required.
evidence:
- environment: dev (source: inventory)
- public_access_actual: false (source: network_analyzer)
- external_access_90d: 0 (source: auth_logs)
recommended_action: Create P4 ticket to remove public ACL and add bucket policy deny.
routing: jira
Total time: 90 seconds
Decision: Defer, P4 ticket
The analyst doesn’t wake up. The ticket is created with full context. The security lead reviews it in the morning, confirms the reasoning, and moves on.
The path to our existing implementation was far from a straight line.
What We Got Wrong (At First)
Mistake 1: triage without evidence
Our first version fed findings to the model and said “triage this.”
The model complied. Confidently. With outputs that sounded plausible and were often wrong.
LLMs are very good at producing reasonable-sounding answers. They’re not good at knowing when they don’t have enough information.
The fix: The agent must retrieve evidence before it decides. We added hard requirements:
- No “resolve now” without confirmed exposure
- No “defer” without confirmed mitigation
- No “ticket” without identified owner
If the agent can’t gather the required evidence, it outputs `needs_human_review` instead of guessing.
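Those hard requirements are easy to express as a deterministic gate in front of the model’s proposal. A minimal sketch (the evidence keys are illustrative):

```go
package main

import "fmt"

// decide applies the hard requirements described above: each decision
// is only allowed when its supporting evidence is confirmed. Anything
// else falls through to a human instead of a guess.
func decide(proposed string, evidence map[string]bool) string {
	switch proposed {
	case "resolve_now":
		if evidence["exposure_confirmed"] {
			return proposed
		}
	case "defer":
		if evidence["mitigation_confirmed"] {
			return proposed
		}
	case "ticket":
		if evidence["owner_identified"] {
			return proposed
		}
	}
	return "needs_human_review"
}

func main() {
	// A defer proposal with no confirmed mitigation is not accepted.
	fmt.Println(decide("defer", map[string]bool{}))
	fmt.Println(decide("ticket", map[string]bool{"owner_identified": true}))
}
```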
Mistake 2: freeform reasoning
Early versions let the agent respond in narrative paragraphs.
Engineers don’t want paragraphs. They want:
- A decision
- The reasons (brief)
- The evidence (linked)
- The next action
We made the output schema-driven. Strict categories, required fields, explicit confidence scores, evidence citations back to source tools.
Quality improved immediately. More importantly, the outputs became testable.
Mistake 3: fully trusting the graph
Your infrastructure graph is never complete. Resources get created outside Terraform. Tags are missing. Ownership metadata is stale.
We added fallback heuristics: if the graph doesn’t know something, the agent can infer from naming conventions, account structure, or ask for human input. The system degrades gracefully instead of failing silently.
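A naming-convention fallback can be as simple as a few substring checks that return an explicit “unknown” instead of a silent default. The conventions below are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// inferEnvironment is a fallback for when the graph has no tag: guess
// the environment from naming conventions on the resource and account,
// and return "unknown" (a cue for human input) rather than defaulting.
func inferEnvironment(resourceName, accountName string) string {
	for _, s := range []string{resourceName, accountName} {
		s = strings.ToLower(s)
		switch {
		case strings.Contains(s, "prod"):
			return "production"
		case strings.Contains(s, "dev") || strings.Contains(s, "sandbox"):
			return "dev"
		}
	}
	return "unknown" // degrade gracefully: ask a human
}

func main() {
	fmt.Println(inferEnvironment("backup-dev-legacy", "dev-legacy-2021"))
}
```

The key design choice is the explicit `"unknown"` branch: the system surfaces its uncertainty instead of failing silently.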
After learning from these mistakes, we built a series of safeguards.
Guardrails: how to avoid creating a new outage vector
There’s a version of this where the agent also auto-remediates.
We’re not opposed to that. We’re just not interested in skipping the part where you earn trust.
Cerebro is designed to be extensible into response automation, but the baseline focuses on triage, because that’s where the ROI is highest and the risk is lowest.
Human-in-the-Loop Thresholds
Not all decisions should be automated to the same degree.
Routing policies:
routing_rules:
- condition: confidence > 0.9 AND blast_radius < medium AND decision = ticket
action: auto_create_ticket
- condition: confidence > 0.9 AND exposure_confirmed AND decision = resolve_now
action: page_on_call
- condition: touches_iam OR touches_network_boundary
action: require_human_approval
Deterministic validation
Before any downstream action triggers, we run deterministic checks:
- Is the resource actually reachable? (Don’t trust the graph alone)
- Is it tagged production?
- Does the identified owner still exist in the org?
- Are there known exceptions or accepted risks?
- Do we have enough evidence to justify the priority?
Agents propose. Rules validate. Humans approve high-risk actions.
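That propose/validate split can be sketched as a list of named deterministic checks that must all pass before an action fires. The check implementations here are stubs; in practice each one queries inventory, the graph, or the exception register.

```go
package main

import "fmt"

// check is one deterministic gate in front of a downstream action.
type check struct {
	name string
	pass func() bool
}

// validateAction runs every gate and reports which ones failed; any
// failure blocks the action and routes it to a human.
func validateAction(checks []check) (bool, []string) {
	var failed []string
	for _, c := range checks {
		if !c.pass() {
			failed = append(failed, c.name)
		}
	}
	return len(failed) == 0, failed
}

func main() {
	checks := []check{
		{"resource_reachable", func() bool { return true }},
		{"owner_exists_in_org", func() bool { return false }}, // stale owner
		{"no_accepted_risk_exception", func() bool { return true }},
	}
	ok, failed := validateAction(checks)
	fmt.Println(ok, failed)
}
```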
Traceability
Every triage output includes:
- Which tools were called
- What evidence was retrieved
- What the confidence score was
- Why the decision was made
This is non-negotiable for auditability. It’s also essential for debugging when the system makes a call you disagree with.
Evals: measuring triage quality
If you can’t measure it, you’re building a very expensive autocomplete.
We track:
| Metric | What It Measures |
|-----------------------|-------------------------------------------------|
| False positive rate | % of "resolve now" that weren't actually urgent |
| False negative rate | % of "defer" that became incidents |
| Mean time to decision | How long from alert to triage output |
| Evidence completeness | % of decisions with all required evidence |
| Analyst override rate | How often humans change the decision |
| Time saved per week   | Estimated analyst hours recovered               |
The goal isn’t 100% accuracy. It’s consistent, evidence-backed decisions that free up analyst time for the cases that actually require human judgment.
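Two of these metrics, sketched over a hypothetical outcome log (the `Outcome` field names are assumed, not Cerebro’s actual eval schema):

```go
package main

import "fmt"

// Outcome records one triage decision and what actually happened.
type Outcome struct {
	Decision       string
	HumanOverrode  bool
	BecameIncident bool
}

// overrideRate: how often humans change the agent's decision.
func overrideRate(outcomes []Outcome) float64 {
	if len(outcomes) == 0 {
		return 0
	}
	n := 0
	for _, o := range outcomes {
		if o.HumanOverrode {
			n++
		}
	}
	return float64(n) / float64(len(outcomes))
}

// falseNegativeRate: share of "defer" decisions that later became incidents.
func falseNegativeRate(outcomes []Outcome) float64 {
	deferred, bad := 0, 0
	for _, o := range outcomes {
		if o.Decision == "defer" {
			deferred++
			if o.BecameIncident {
				bad++
			}
		}
	}
	if deferred == 0 {
		return 0
	}
	return float64(bad) / float64(deferred)
}

func main() {
	outcomes := []Outcome{
		{Decision: "ticket"},
		{Decision: "defer", BecameIncident: true},
		{Decision: "defer"},
		{Decision: "resolve_now", HumanOverrode: true},
	}
	fmt.Println(overrideRate(outcomes), falseNegativeRate(outcomes))
}
```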
The team has been excited by the results we’ve seen so far, and we wanted to share them with the security community.
Why Open Source
Security teams need extensibility
Every environment is different. Different tools, different conventions, different risk thresholds. A closed system forces you to adapt your process to the product. An open system adapts to you.
The best ideas won’t all come from us
The next step is response automation, and we want lots of teams building:
- Connectors for their specific tooling
- Auto-remediation workflows for safe changes
- Runbook generation for complex incidents
- Integration with their ticketing and on-call systems
Open source is also the cleanest way to prove we’re serious about agentic security work without it becoming a sales pitch. We’re not selling a product. We’re showing our work.
Conclusion
The cloud attack surface is sprawling, interconnected, and moving faster than any team can manually review. Alert floods aren’t solved by “trying harder.” They’re not solved by hiring more analysts. They’re not solved by better dashboards. They’re solved by changing what humans are responsible for.
Keep your detection sources. Model your environment as a graph. Use agents to do the tedious, multi-tool investigation work that humans should never have been doing manually in the first place. The future of security triage isn’t 5,000 alerts. It’s fewer decisions, made faster, with better evidence.
If this is useful to you, we’d love feedback once the repo is public: connectors, clustering algorithms, eval datasets, routing policies, guardrail patterns — all of it.
Because the bottleneck was never more detections.
It was triage.