The standard prompt injection defences I review — input validation, output filtering, jailbreak detection — all look at the user's message. RAG attacks walk right past them. The attacker never sends the injection through the user input channel at all. They upload a PDF to the shared knowledge base. They submit a support ticket whose content gets indexed. They edit a public wiki page that the enterprise RAG system crawls weekly. Three weeks later, when a legitimate user asks a…
No comments:
Post a Comment