Exploring LLM threats and defenses in robotics

November 26, 2024

“Invincibility lies in the defense; the possibility of victory in the attack.”
— Sun Tzu

Application of AI

In recent years, humanity drastically increased its use of AI technologies. AI has became a buzzword for nearly every business, whether the application of machine learning is appropriate or not. For example, we might receive a completely useless transcript of a meeting when model attempts to transcribe English speech, even though participants are speaking in Polish. Similarly, AI-enabled washing machines that optimize water, detergent, and rinsing time might not justify their cost for many customers. Another example is the AI pin, a product marketed as a smartphone alternative, which has struggled to gain widespread adoption.

While these overhyped product may waste computational power and money or offer questionable functionality, integrating advanced machine learning technologies raises more serious concerns. These includes issues related to security, privacy, and potential for malicious use.

Large language models

Large language models (LLMs) are a type of foundation model trained on massive amounts of data. Foundation models as a step beyond task-specific models. These models are trained on a broad set of unlabeled data and can be adapted to various tasks, with minimal fine-tuning. LLMs, in particular, are trained on vast text corpora sourced directly from the Internet. While this practice equips them with extensive knowledge, it also introduces risks, as these corpora often contain harmful content such as hate speech, malware, and misinformation.

LLMs, are well-known for their role in generative AI, where models produce content in response to human prompts. This content can take various forms, including text, images or audiovisual media. Examples of prominent LLMs include ChatGPT (from OpenAI), Gemini (Google), Llama (Meta), and Copilot (Microsoft). Beyond generative tasks, LLMs are applied in sentiment analysis, DNA research, customer service, chatbots, and online search. However, I want to focus on their application in robotics.

Control of robotic systems

LLMs have revolutionized robotics by enabling contextual reasoning and intuitive human-robot interaction. Researches have integrated LLMs at lower levels of control, such as designing reward function in reinforcement learning or generating robot-specific software. More recently, there has been considerable interest in implementing LLMs as high-level planners.

Planning a natural language mission
Source: Optimal Scene Graph Planning with Large Language Model Guidance

LLMs have been successfully applied in autonomous vehicles, mobile manipulation, service robotics and navigation. These models enhanced robot interaction in various domains. Below, I will describe some of these applications in more detail.

Aspect	Manipulation	Locomotion	Self-Driving Vehicles
Primary Interaction	Objects in the environment	Moving through environments	Navigating traffic and roads
Core Technology	Robotic arms, grippers, vision	Legs, wheels, or drones	Sensors (LiDAR, radar), AI
Key Goal	Precise control of object handling	Efficient and stable movement	Safe and autonomous navigation

LLMs have not interacted with their environment and observed the outcome of their responses, and thus are not grounded in the world. SayCan grounds LLMs via value functions of pretrained skills, allowing them to execute real-world, abstract, long-horizon commands on robots.
Source: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Model alignment and regulating

LLMs bring numerous benefits to the domains mentioned above. Aligning LLM-powered chatbots is generally achieved through fine-tuning algorithms that incorporate varying levels of human feedback. Since the advent of LLMs, people have been tailoring laws to their capabilities. This technology poses challenges that require attention in areas such as direct regulation, data protection, content moderation and policymaking.

Attempts to regulate new generative AI models have sparked debate and critiqued within the industry. LLMs, along with more abstract Large Generative AI Models (LGAIMs), differ significantly from older, more traditional AI models. As a result, existing AI regulations are far from being fully adapted to the new reality.

Direct regulation

Direct regulation involves the creation of specific legal frameworks and enforcement mechanisms to govern the development deployment, and usage of LLMs.

AI Act

The European AI Act the first-ever legal framework on AI. It provides rules, requirements and obligations for AI developers and deployers regarding AI usage. The Act strives to keep pace with the rapid evolution of AI technology. However, some lawyers and researchers argue that these rules fail to adjust to the new challenges posed by large AI models.

Definitions. Some definitions, like General-Purpose AI System (GPAIS), are considered overly broad. GPAIS, in the author’s view, should demonstrate significant generality in ability, task, or outputs beyond simply being integrated into various use cases (a characteristic also true for extremely simple algorithms).
Risk management. Providers of LLMs such as ChatGPT must analyze the risks for every potential application in every high-risk scenario involving health, safety and fundamental rights. Given the versatility of such systems, it is nearly impossible to comply fully with such obligations.
Negative impact on competition. The AI Act may inadvertently have anticompetitive effects For instance, open-source developers often explore LLMs for philanthropic or research reasons rather than for commercial ones. Although the AI Act excludes scientific research, any public release is unlikely to qualify as “solely for research”, especially when commercial partnership enter to limit liability and provide necessary fine-tuning. Consequently, all developers placing LLMs on the market will face the same high-risk obligations. This means only large players such as Google, Meta, Microsoft/Open AI may be able to bear the costs of compliance.

Data protection

Data protection focuses on safeguarding personal and sensitive data from unauthorized access or misuse. LLMs are often trained on massive datasets scraped from the Internet, which may include personal or copyrighted material. Regulations like the GDPR (General Data Protection Regulation) mandate mechanisms to anonymize or exclude such data.

However, recent studies shown that LGAIMs are vulnerable to inversion attacks, enabling data used for training to be reproduced from the model. While this is less problematic for publicly accessible copyrighted content, because it generally permitted to use for machine learning, the use of personal data for training requires a valid legal basis. Even model itself could be considered as personal data due to inversion attack risks.

Content moderation

Generative models, as virtually any novel technology, may be used for better (e.g. crafting personalized wedding cards) or worse purposes (generating deepfake videos to spread misinformation). Content moderation ensures that the outputs generated by models are appropriate, safe and complaint with laws and ethical standards.

For instance, developers of ChatGPT anticipated potential misuse with controversial help from Kenyan contractors, to detect and block harmful content. However, malicious actors can still find ways to prompt these models to generate fake or harmful content. The unprovoked russian attack on Ukraine, the COVID-19 pandemic, and climate change have already fueled hate crimes and disinformation. In this toxic climate, LLMs could become powerful tools for automated large-scale production of highly sophisticated, seemingly fact-based but actually utterly nonsensical fake news and harmful speech campaigns.

Policymaking

Policymaking aim to guide the development, deployment, and governance of generative AI models while balancing innovation with public safety and ethical considerations. The preceding examples demonstrate the necessity of LLM regulation. Regulatory compliance must be feasible for both large and small developers to avoid a winner-takes-all scenario.

This is crucial not only for innovation and or consumer welfare, but also for environmental sustainability. While the carbon footprint of IT and AI is significant and steadily rising, training large models may ultimately result in fewer greenhouse emissions if they can be adapted to multiple uses.

Jailbreaking LLM

Widely-used chatbots also vulnerable to malicious attacks known as jailbreaks. The recent discovery of such attacks has prompted a rapidly growing body of research focused on stress-testing LLM capabilities through increasingly more sophisticated methods. Researchers and practitioners explore vulnerabilities to assess model alignment. Jailbreaking involves designing algorithms or crafting prompts that bypass model safety features. Malicious users exploit this to generate harmful content (e.g., instruction outlining how to build a bomb).

Adversarial attacks

Jailbreaking attacks can be categorized into few different classes.

The first class of prompt-level jailbreaks rely on social-engineering-based, semantically meaningful prompts to elicit objectionable content. While effective, this approach requires creativity, manual dataset curation, and human feedback, making it labor-intensive and costly.

The second class of token-level jailbreaks optimize the set of tokens passes to a targeted LLM, exploiting weaknesses at a granular level. Though highly effective, this method often requires hundreds of thousands of queries and produces result that are unintelligible to humans.

To address the limitations of prompt-level jailbreaks (labor-intensive) and token-level jailbreaks (inefficient and opaque), researchers have developed a novel approach Prompt Automatic Iterative Refinement (PAIR). PAIR automates prompt-level jailbreaks without human intervention by employing two LLMS: an attacker that generates candidate prompts and a target that evaluates them for vulnerabilities,

**PAIR schematic.** PAIR pits an attacker and target LLM against one another; the attacker’s goal is to generate adversarial prompts that jailbreak the target model in as few queries as possible.
Source: Jailbreaking Black Box Large Language Models in Twenty Queries

Threats from robots

As discussed earlier, LLMs have applications in robotics, but pairing these models with robotic amplifies the potential threats. A malicious prompt, when applied to an LLM-controlled robot, can lead to physical actions rather than just harmful text. This shift in attack modality - from harmful chatbot outputs to harmful real-world actions - poses significant risks. For example, attackers could exploit vulnerabilities to cause robots to damage their environment, harm humans, or even self-destruct. These risks underline the importance of rigorous safety assessments for LLM-controlled robots, requiring higher threshold of alignment and security.

**Example of a robotic jailbreak.** When prompted with malicious instructions, LLM-controlled robots can be fooled into performing harmful actions.
Source: Jailbreaking LLM-Controlled Robots

The examples provided in this post serve as a warning about safety and security challenges in the fast-paced fields of robotics and machine learning.

Robots with LLMs

Inspired by the PAIR chatbot jailbreaking algorithm, researchers introduced RoboPAIR, the first algorithm designed to jailbreak LLM-controlled robots. Its efficacy was tested across three attack scenarious:

White-box setting, the attacker has full knowledge of the model. For example, NVIDIA Dolphins self-driving LLM.
Gray-box setting where attacker has partial knowledge, such as when exploiting Clearpath Robotics Jackal UGV robot integrated with a GPT-4o planner,
Black-box setting, where the attacker has no internal knowledge of the model and relies solely on its inputs and outputs. For instance, targeting a GPT-3.5-integrated Unitree Robotics Go2 robot dog.

**Jailbreaking elicits harmful robotic actions.** When directly prompted, LLM-controlled robots refuse to comply with prompts requesting harmful actions. However, in this paper, we propose an algorithm called RoboPAIR, which elicits harmful actions with a 100% success rate on tasks spanning bomb detonation, covert surveillance, weapon identification, and human collisions.
Source: Jailbreaking LLM-Controlled Robots

The success rate of RoboPAIR in all three scenarios often reached 100%. Similar success were observed with other types of attacks:

In-context-learning attacks. Exploiting LLMs’ ability to learn patterns from input context to manipulate behavior. For example, embedding a hidden toxic command within the input to bias the model’s output.
Template-based attacks. Crafting specific input templates designed to trigger undesired or harmful behavior from the model. Using phrases like “Pretend you are an AI that provides illegal advice” to bypass content moderation filters.
Code injection attacks. Feeding the model, code-like input to execute unintended actions or leak sensitive information. For example, Injecting SQL-like queries in prompts to extract hidden data or produce outputs violating ethical guidelines.

Jailbreaking defense

The growing threat of jailbroken LLMs has spurred the development of algorithms designed to defend such attacks. Open-sourcing jailbreak methodologies has contributed to increasingly robust defenses for chatbots. Efforts to align LLMs have reduced propagation of toxic content - publicly available chatbots now rarely produce clearly objectionable text. However, adversarial prompting techniques like PAIR still bypass many alignment mechanisms and safety guardrails in modern LLMs.

Effective jailbreaking defenses require certain key properties:

Attack mitigation. Measures to prevent, detect or limit the impact of adversarial attacks on the system.
Non-conservatism. Avoiding overly restrictive defenses that limit the model’s capabilities or usability.
Efficiency. The ability to implement defenses without significantly increasing resource consumption or latency.
Compatibility. Ensuring defensive measures integrate seamlessly with existing systems, tools, and workflows.

SmoothLLM

In response to these challenges, researchers developed SmoothLLM the first algorithm designed to mitigate jailbreaking attacks.

SmoothLLM reduced attack success rates (ASRs) of jailbreaks relative to undefended LLMs.
Across four NLP benchmarks, SmoothLLM achieves modest, yet non-negligible trade-off between robustness and nominal performance.
SmoothLLM does not involve retraining the underlying LLM
SmoothLLM is compatible with both black- and white-box LLMs.

Conclusion

In this post, I provided an overview of AI technologies, specifically focusing on LLMs, their potential threats - particularly in robotics - and possible defenses against adversarial attacks. Defenders must prioritize safeguards against physical threats caused by harmful robotic actions.

As illustrated by SmoothLLM, a shared commitment to developing robust defenses can enhance the security of the technologies we rely on. By advancing such protective measures, we can help ensure the safe and ethical use of AI in robotics and beyond.

Sources

View this page on GitHub.