
"Accordingly, OpenAI mitigated the prompt-injection technique ShadowLeak fell to-but only after Radware privately alerted the LLM maker to it. A proof-of-concept attack that Radware published embedded a prompt injection into an email sent to a Gmail account that Deep Research had been given access to. The injection included instructions to scan received emails related to a company's human resources department for the names and addresses of employees. Deep Research dutifully followed those instructions."
"By now, ChatGPT and most other LLMs have mitigated such attacks, not by squashing prompt injections, but rather by blocking the channels the prompt injections use to exfiltrate confidential information. Specifically, these mitigations work by requiring explicit user consent before an AI assistant can click links or use markdown links -which are the normal ways to smuggle information off of a user environment and into the hands of the attacker."
"At first, Deep Research also refused. But when the researchers invoked browser.open-a tool Deep Research offers for autonomous Web surfing-they cleared the hurdle. Specifically, the injection directed the agent to open the link https://compliance.hr-service.net/public-employee-lookup/ and append parameters to it. The injection defined the parameters as an employee's name and address. When Deep Research complied, it opened the link and, in the process, exfiltrated the information to the event log of the website."
Prompt injections have proved difficult to prevent fully, much like memory-corruption and SQL-injection vulnerabilities. LLM providers have instead relied on mitigations introduced case by case, reactively, after exploits are discovered. OpenAI mitigated the ShadowLeak prompt-injection technique only after Radware privately alerted the company. A Radware proof of concept embedded a prompt injection in an email sent to a Gmail account, instructing the agent to scan HR-related emails for employee names and addresses, and Deep Research complied. Current mitigations focus on blocking exfiltration channels by requiring explicit user consent before assistants click or follow links, but autonomous browsing tools such as browser.open can still enable leaks.
Read at Ars Technica