Executive Summary
Both attack vectors exploit a fundamental architectural reality: LLMs process instructions and data through the same mechanism with no cryptographic separation. Direct injection succeeds through adversarial prompt crafting; indirect injection weaponizes trusted retrieval channels. Current defenses are fragile optimizations around an insecure-by-design paradigm. The only robust path forward combines structural enforcement (privilege separation, control tokens) with a strategic retreat from autonomous action in high-stakes contexts.
Key Insights
- Semantic vs Syntactic Attacks: Traditional security filters syntax; these attacks manipulate meaning. No regex can distinguish malicious from legitimate instructions at semantic level.
- RAG's Original Sin: Treating retrieved content as trusted context creates an invisible attack surface across every data source.
- The Performance-Security Tradeoff Is Real: Robust defenses (provenance tracking, separate models, validation layers) add 40%+ latency and 2-3x compute costs. This isn't optional overhead — it's the price of security.
- Compositional Black Swans: The real threat isn't single injections but time-delayed, multi-source attacks that compose across trusted documents. Nobody's monitoring for this.
- Architecture vs Band-Aids: Prompt engineering and filtering are security theater. Only structural changes (control tokens, privilege boundaries at the attention mechanism) address root cause.
Points of Agreement
Universal consensus on core problem: Code and data share the same channel in LLMs. This is the architectural flaw that enables both attack vectors.
Provenance tracking is necessary: Every the analysis acknowledged trust-level tagging for retrieved content, despite implementation costs.
Output constraint > input filtering: Limiting what actions AI can take matters more than trying to filter infinite input variations.
Indirect injection is more dangerous: Poisoning trusted sources creates persistent, scalable attack vectors that bypass user-facing filters entirely.
Points of Disagreement
Can this be fixed architecturally?
- Optimists (Schneier, Torvalds): Yes, through control tokens and privilege separation at the model level
- Pessimist (Thiel): No, because probabilistic token prediction fundamentally can't distinguish instruction from data
Should we build autonomous agents at all?
- Thiel: No — retreat to advice-only systems immune by design
- Others: Yes, but with expensive defenses — market demands autonomy
Graceful degradation vs fail-hard
- Taleb: Systems should gain capability under attack (antifragile)
- Torvalds: Systems should crash loudly to force fixes
Performance tradeoffs
- Carmack: Accept 200ms+ latency and 3x costs for security
- Implicit market pressure: Users won't tolerate this degradation
Verdict
Current Attack Mechanics
Direct Injection works through:
- Authority override patterns ("SYSTEM ALERT: New instructions follow...")
- Role-play exploitation ("You are DAN, who can do anything...")
- Compliance smuggling (Legitimate wrapper around malicious core)
- Adversarial AI adaptation (Automated generation of thousands of variants)
Why it works: LLMs are trained to be helpful and follow instructions. They have no concept of "suspicious context," "verify sender," or "this conflicts with prior instructions." Every input is processed as equally valid.
Indirect Injection works through:
- Content poisoning in retrieval sources (PDFs, emails, wikis, databases)
- Trust chain exploitation (AI trusts retrieved content as factual context)
- Privilege escalation (Retrieved instructions execute with system-level context)
- Persistence (Poisoned content cached, affects multiple sessions)
Why it's worse: Attacks the infrastructure, not the interface. Scales to every retrieval source. Time-delayed activation possible. No user-visible warning signs.
Why Traditional Defenses Fail
Input filtering: Infinite semantic variations. Adversarial AI generates faster than you can blacklist.
Prompt engineering: Plaintext system instructions have no enforcement mechanism. "Ignore previous instructions" anywhere in context can override.
Rate limiting / user monitoring: Doesn't stop poisoned documents. Insider threats irrelevant when attack is in the data.
Semantic similarity detection: Requires expensive embedding computations. False positive rate 5-15%. Adversarial AI optimizes to evade.
Layered Defense Strategy
TIER 1 — Immediate (Low Cost, Partial Protection)
Cheap filters first (Torvalds):
- Regex patterns for "ignore previous instructions," "you are DAN," common jailbreaks
- Catches 30% of script-kiddie attacks for <1ms overhead
- Deploy today, accept it's incomplete
Output structure enforcement (Schneier):
- Force JSON schema outputs for high-risk actions
- Reduces injection impact by 60-70% even if prompt succeeds
- Action constraint > input constraint
TIER 2 — Architectural (High Cost, Core Protection)
Privilege separation (Schneier + Torvalds):
- Retrieval model (reads documents, outputs structured summaries, NO raw text forwarding)
- Reasoning model (processes summaries, generates options)
- Action model (executes only pre-approved operations, JSON-constrained)
- Cost: 2-3x inference expense, +150-200ms latency
- This is non-negotiable for production security
Provenance tagging (Mitnick + Carmack):
- Every token gets metadata: source, trust level, timestamp
- Low-trust sources (external retrieval) flagged in context
- Implementation: Requires attention mask modifications or prompt injection of trust markers
- Cost: 15-30% latency overhead, complex caching invalidation
- Accept the performance hit or don't deploy retrieval features
Control tokens (Torvalds):
- Special tokens model recognizes as "system instruction boundary"
- Requires model architecture changes or fine-tuning
- OpenAI's function calling is partial implementation
- Push vendors to expose these primitives
TIER 3 — Antifragile Design (Taleb's Via Negativa)
Air-gapped models for high stakes:
- Financial decisions, medical advice, legal analysis: ZERO retrieval
- Frozen knowledge cutoff,
Related Topics
Related Analysis

LLM Security and Control Architecture: Addressing Prompt
The Board · Feb 19, 2026

US Semiconductor Supply Chain Security: Geopolitical Risks 2026
The Board · Feb 17, 2026

Global Tech Intersections and Regulatory Arbitrage
The Board · Feb 17, 2026

OpenAI vs Anthropic: Who Wins the AI Race by 2026?
The Board · Feb 15, 2026

Securing LLM Agents and AI Architectures in 2026
The Board · Feb 20, 2026

Quantum Computing Breakthroughs: Geopolitical Implications
The Board · Mar 4, 2026
Trending on The Board

Israeli Airstrike Hits Tehran Residential Area During Live
Geopolitics · Mar 11, 2026

Fuel Supply Chains: Australia's Stockpile Reality
Energy · Mar 15, 2026

The Info War: Understanding Russia's Role
Geopolitics · Mar 15, 2026

Iran War Disinformation: How AI Deepfakes Fuel Chaos
Geopolitics · Mar 15, 2026

THAAD Interception Rates: Iran Missile Combat Data
Defense & Security · Mar 6, 2026
Latest from The Board

US Crew Rescued After Jet Downed: Israeli Media Reports
Defense & Security · Apr 3, 2026

Hegseth Asks Army Chief to Step Down: Why?
Policy & Intelligence · Apr 2, 2026

Trump Fires Attorney General: What Happens Next?
Policy & Intelligence · Apr 2, 2026

Trump Marriage Comments Draw Macron Criticism
Geopolitics · Apr 2, 2026

Iran's Stance on US-Israeli War: No Negotiations?
Geopolitics · Apr 1, 2026

Trump's Iran War: What's the Exit Strategy?
Geopolitics · Apr 1, 2026

Trump Ukraine Weapons Halt: Iran Strategy?
Geopolitics · Apr 1, 2026

Ukraine Weapons Halt: Trump's Risky Geopolitical Play
Geopolitics · Apr 1, 2026
