Prompt Injection With Image Scaling Attacks Threatens AI System

As image generation and processing using AI tools become more common, ensuring thorough security throughout the process is even more necessary. Researchers have shared insights about a new attack strategy that exploits AI for data exfiltration via images. The attack couples the known threat of image scaling attacks against AI with prompt injection, demonstrating how malicious actions can be carried out sneakily.

Researchers Couple Prompt Injection Attacks With Image Scaling

In a recent post, researchers from the cybersecurity firm Trail of Bits shared details about how prompt injection attacks can exploit image scaling in AI tools to perform malicious actions. These actions can range from simple activities like opening an app to data exfiltration – all without alerting the victims.

Image scaling attacks, first demonstrated by researchers from the Technische Universität Braunschweig, Germany, in 2020, involve exploiting the image scaling process of AI systems. When processing images, AI systems scale down the input images for faster and better processing before forwarding them to the model. A malicious actor can exploit this reduction in image size to manipulate how the model processes the image. In the case of the Trail of Bits researchers, they exploited this image scaling for prompt injection attacks.

Source: Trail of Bits

As demonstrated, the researchers injected a malicious prompt into an image, ensuring the prompt remains invisible when the image is viewed at full scale. However, upon rescaling by an AI system, the change in image resolution makes the prompt visible to the system. Once forwarded to the AI model, the prompt tricks the model into considering it as part of the instructions. As a result, the model executes the respective malicious action mentioned in the prompt without the user’s knowledge.

In their experiment, the researchers demonstrated this attack strategy against the Gemini CLI with the default configuration for the Zapier MCP server. They uploaded an image hiding a malicious prompt to exfiltrate user data from Google Calendar to a given email address.

The researchers have shared the details of this attack strategy in their post.

Most AI Systems Are Vulnerable To This Attack

According to the researchers, this attack, with minor adjustments depending on the target AI model, works against most systems, such as:

For further testing, the researchers have also publicly released an open-source tool called “Anamorpher” on GitHub. This tool – backed by a Python API – lets users visualize the attacks against multimodal AI systems. Currently in beta, it creates images crafted for multimodal prompt injections when downscaled.

Recommended Mitigations

According to the researchers, limiting downscaling algorithms will not help prevent these attacks, given the widespread attack vector. Instead, the researchers recommend limiting upload dimensions and avoiding image downscaling. Besides, ensuring an exact preview of the image that the model sees would also help detect any prompt injections that might go unnoticed when uploading the images.

In addition, the researchers urge the implementation of robust defense strategies to prevent multimodal prompt injection attacks, such as deploying mandatory user confirmation before executing any instructions provided as text within images.

Let us know your thoughts in the comments.

Get real time update about this post category directly on your device, subscribe now.

Continue Reading