The enterprise AI adoption curve has compressed what should have been a decade of security maturation into eighteen months. Organisations are deploying large language models as internal knowledge assistants, customer-facing chatbots, and autonomous workflow agents — often with minimal security review and almost no understanding of the novel threat categories these systems introduce. OffSecAI's prompt injection framework was developed specifically to address this gap.

What Prompt Injection Actually Is

Prompt injection is the LLM equivalent of SQL injection: manipulation of an AI system's instructions through untrusted input. In a direct injection, an attacker supplies a prompt that overrides the system's original instructions. In an indirect injection — currently the more dangerous variant — malicious instructions are embedded in content the LLM is asked to process: a webpage, a document, an email.

In a recent assessment of a UK-based professional services firm, we embedded a prompt injection payload inside a SharePoint document that the company's AI assistant was configured to summarise. The payload instructed the model to exfiltrate its conversation history to an external URL. Within 90 seconds of a staff member triggering the summarisation workflow, internal HR policy documents were silently transmitted.

The OffSecAI Testing Framework

Our framework tests LLM deployments across five attack categories: direct prompt override, indirect injection via document and web content, context window poisoning, tool-call hijacking in agentic systems, and data exfiltration via model outputs. We maintain a library of over 2,400 test payloads, updated monthly as new techniques emerge.

Tool-call hijacking deserves particular attention for organisations deploying agentic LLMs. When an agent's tool-use can be triggered by injected content, the consequences extend far beyond data leakage. In one tested environment, we caused an AI coding assistant to commit malicious code to a staging repository by injecting instructions into a GitHub issue comment the assistant was processing.

Defensive Architecture Principles

  • Treat the model as an untrusted component — validate all inputs and outputs
  • Implement strict tool-call allowlisting for agentic systems
  • Separate the model's reasoning context from sensitive data stores
  • Log every prompt and completion for anomalous pattern detection
  • Never grant LLM-integrated services write access to production systems without human-in-the-loop approval