Answer CardVersion 2025-09-22September 22, 2025

LLM Security: Patterns and Pitfalls

LLM securityInstruction isolationPrompt injectionTool securityOutput validation

TL;DR

LLM applications fail when instructions are not isolated, context is unsanitized, tools are over-privileged, or outputs are trusted blindly. Use instruction isolation, input/output filters, retrieval hardening, tool allow-lists with least privilege, and human-in-the-loop for sensitive actions. Test continuously with reproducible attacks.

Key Facts

LLMs follow instructions and can be induced to override guardrails.
[1]Source: NIST AI Risk Management Framework
Instruction isolation and strict tool scopes reduce impact.
[1]Source: NIST AI Risk Management Framework
Retrieval must sanitize and constrain cross-domain content.
[1]Source: NIST AI Risk Management Framework
Output validation prevents unsafe actions and data leakage.
[1]Source: NIST AI Risk Management Framework
Regression testing is required after model/config changes.
[2]Source: ISO 42001 AI Management Systems Standard

Implementation Steps

Isolate system prompts → versioned prompt repo.

Sanitize retrieval → allow-list, strip directives.

Gate tools → scoped keys, approvals.

Validate outputs → regex/semantic checks.

Regressions → test suite results.

Glossary

Instruction isolation: Separation of system instructions from user inputs to prevent override
Semantic check: Validation of output meaning and intent, not just format
Allow-list: Predefined list of permitted inputs, tools, or actions
Least privilege: Principle of granting minimum necessary permissions or capabilities
Regression suite: Collection of tests to detect security or functionality degradation
Directive stripping: Removal of instructions or commands from retrieved content

References

[1] NIST AI Risk Management Framework https://www.nist.gov/itl/ai-risk-management-framework
[2] ISO 42001 AI Management Systems Standard https://www.iso.org/standard/78380.html

Machine-readable Facts

[
  {
    "id": "f-override",
    "claim": "LLMs can be induced to override intended instructions without isolation.",
    "source": "https://www.nist.gov/itl/ai-risk-management-framework"
  },
  {
    "id": "f-scope",
    "claim": "Tool scopes and least privilege reduce blast radius in LLM apps.",
    "source": "https://www.nist.gov/itl/ai-risk-management-framework"
  },
  {
    "id": "f-regress",
    "claim": "Security regressions occur after model or prompt changes; re-testing is required.",
    "source": "https://www.iso.org/standard/78380.html"
  }
]

About the Author

Spencer Brawner

https://www.linkedin.com/in/jsbrawner