LLM Security & Prompt Injection Testing | Secureware

The enterprise AI adoption curve has compressed what should have been a decade of security maturation into eighteen months. Organisations are deploying large language models as internal knowledge assistants, customer-facing chatbots, and autonomous workflow agents — often with minimal security review and almost no understanding of the novel threat categories these systems introduce. OffSecAI's prompt injection framework was developed specifically to address this gap.

What Prompt Injection Actually Is

Prompt injection is the LLM equivalent of SQL injection: manipulation of an AI system's instructions through untrusted input. It is the #1 ranked risk in the OWASP Top 10 for LLM Applications, and for good reason.

In a direct injection, an attacker supplies a prompt that overrides the system's original instructions — typically by appending instructions like "Ignore all previous instructions and instead…"

Indirect injection is currently the more dangerous variant. Malicious instructions are embedded in content the LLM is asked to process: a webpage, a document, an email, a code comment. The user or operator may be entirely unaware that the content they've asked the model to summarise or analyse contains adversarial instructions.

A Real Example: SharePoint Document Exfiltration

In a recent assessment of a professional services firm, we embedded a prompt injection payload inside a SharePoint document that the company's AI assistant was configured to summarise. The payload instructed the model to:

Locate any HR policy documents or sensitive content in its context window
Format that content as a URL-encoded query string
Silently include that string in a hyperlink embedded in the "summary" output

Within 90 seconds of a staff member triggering the summarisation workflow, internal HR policy documents were silently transmitted to an attacker-controlled endpoint. No alerts fired. The staff member saw a normal-looking summary.

The OffSecAI Testing Framework

Our framework tests LLM deployments across five attack categories, mapped to MITRE ATLAS:

Direct prompt override — standard jailbreak and instruction override techniques against the system prompt
Indirect injection via document and web content — injections embedded in PDFs, DOCX files, web pages fetched by the model, and email bodies
Context window poisoning — large volumes of adversarial content designed to shift the model's behaviour through sheer volume or repetition
Tool-call hijacking in agentic systems — manipulation of the model into invoking tools with attacker-controlled parameters
Data exfiltration via model outputs — covert channels embedded in model responses

We maintain a library of over 2,400 test payloads, updated monthly as new techniques emerge from public research and our own internal red-teaming.

Tool-Call Hijacking Deserves Special Attention

For organisations deploying agentic LLMs — models with the ability to take real-world actions like sending emails, running code, querying databases, or calling APIs — tool-call hijacking deserves particular attention.

In one tested environment, we caused an AI coding assistant to commit malicious code to a staging repository by injecting instructions into a GitHub issue comment the assistant was processing. The assistant had been granted write access to the repository as part of its intended workflow. The injection required no special privileges and no compromise of any system other than the issue tracker.

In another engagement, we used indirect injection in a customer email to cause an AI support agent to:

Query internal customer records it was authorised to access
Format those records as a JSON payload
Include that payload as a "reference number" in an outbound confirmation email to an attacker-controlled address

The entire operation was indistinguishable from normal support agent behaviour.

Defensive Architecture Principles

Treat the model as an untrusted component — validate all inputs and outputs, regardless of what you told the model in the system prompt
Implement strict tool-call allowlisting for agentic systems — define exactly which tools can be called in which contexts, and reject anything outside that envelope
Separate the model's reasoning context from sensitive data stores — the model should retrieve only what it needs for the current task, not have broad read access to all organisational data
Log every prompt and completion for anomalous pattern detection — sudden increases in URL appearances in outputs, unusual JSON structures, or references to sensitive data are all detectable signals
Never grant LLM-integrated services write access to production systems without a human-in-the-loop approval step — the blast radius of a successful injection attack is bounded by what actions the model can take

The organisations that approach LLM deployment with the same rigour they apply to traditional application security are building systems that are genuinely robust. Those that treat prompt injection as a theoretical concern are building systems that OffSecAI — and real-world attackers — will compromise reliably.

References

Frequently Asked Questions

What is the difference between direct and indirect prompt injection?

Direct prompt injection involves an attacker supplying a prompt that overrides the system's original instructions. Indirect injection — currently more dangerous — embeds malicious instructions in content the LLM is asked to process, such as a webpage, document, or email, without the user or operator being aware.

How does tool-call hijacking work in agentic AI systems?

When an agent's tool-use can be triggered by injected content, attackers can cause the agent to invoke tools with attacker-controlled parameters. In tested environments, this has resulted in malicious code being committed to repositories, emails being sent, and external data exfiltration — all triggered by content processed by the AI.

What are the core defensive principles for LLM deployments?

Treat the model as an untrusted component. Validate all inputs and outputs, implement strict tool-call allowlisting, separate the model's reasoning context from sensitive data stores, log every prompt and completion, and never grant LLM-integrated services write access to production systems without human-in-the-loop approval.

Does OWASP have a framework for LLM security?

Yes. The OWASP Top 10 for Large Language Model Applications covers prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities, sensitive information disclosure, insecure plugin design, excessive agency, overreliance, and model theft. Our OffSecAI methodology maps directly to this framework.

Can prompt injection bypass safety guardrails in commercial LLMs?

Often yes — particularly for indirect injection. Even frontier models with extensive safety training can be manipulated when adversarial content arrives through trusted channels such as document summarisation, web fetching, or email processing. The risk is not the model's intent — it is the architecture surrounding it.

How frequently should LLM deployments be retested?

Quarterly at minimum for production agentic systems with tool access. After every material change to system prompts, tool integrations, or data sources accessible to the model. Our payload library is updated monthly as new techniques emerge — last quarter's clean bill of health does not survive contact with this quarter's threat landscape.

What is the most overlooked LLM security risk we see?

Excessive agency — agents granted broader tool access than they require for their stated task. We routinely find AI assistants with write access to production code repositories, customer databases, or email systems based on convenience rather than necessity. Tightening the scope of agency is the single most impactful defensive change available.

prompt injection LLM security AI security OffSecAI agentic AI data exfiltration indirect injection tool-call hijacking

← Back to Blog Book an OffSecAI LLM Security Assessment →