Model Context Protocol (MCP), often described as the “USB-C for AI agents,” is the de facto standard for connecting large language model (LLM) assistants to third-party tools and data. It lets AI agents plug into services, run commands, and share context seamlessly. However, it is not secure by default. In fact, if you’ve been indiscriminately hooking your AI agent into arbitrary MCP servers, you may have unintentionally “opened a side-channel into your shell, secrets, or infrastructure.” In this article, we’ll explore the security risks in MCP, how they can be exploited, and their risk levels, impacts, and mitigation strategies.
A recent study from Leidos highlights significant security risks in using the Model Context Protocol (MCP). The researchers demonstrate that attackers can exploit MCP to execute malicious code, gain unauthorized remote access, and steal credentials by manipulating LLMs such as Claude and Llama; both Claude and Llama-3.3-70B-Instruct are susceptible to the three attacks described in the paper. To address these threats, the authors introduced a tool that uses AI agents to identify vulnerabilities in MCP servers and suggest remedies. Their work underscores the need for proactive security measures in AI agent workflows.
AI agents connected to MCP tools can be tricked into executing harmful commands simply by manipulating the input prompt. If the model passes user input directly into shell commands, SQL queries, or system functions, you’ve got remote code execution. This vulnerability is reminiscent of traditional injection attacks but is exacerbated in AI contexts by the dynamic nature of prompt processing. Mitigation strategies include rigorous input sanitization, parameterized queries, and strict execution boundaries, so user input can never alter the intended command structure.
Impact: Remote code execution, data leaks.
Mitigation: Sanitize inputs, never run raw strings, enforce execution boundaries.
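As a concrete illustration, here is a minimal Python sketch of those execution boundaries: user input is passed as a discrete argument to a fixed command rather than interpolated into a shell string, and SQL goes through parameterized queries. The specific command, file name, and table schema are illustrative assumptions, not anything mandated by MCP.

```python
import sqlite3
import subprocess

def run_grep(pattern: str, path: str = "notes.txt") -> str:
    # Pass user input as a discrete argv element, never interpolated into a shell
    # string, and use "--" so a pattern like "-rf" cannot be parsed as an option.
    result = subprocess.run(
        ["grep", "--", pattern, path],
        capture_output=True, text=True, timeout=10, shell=False,
    )
    return result.stdout

def find_user(conn: sqlite3.Connection, name: str):
    # Parameterized query: the driver binds `name` as data, so injected SQL stays inert.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()
```

The key design choice is that the agent never builds a command or query as free-form text; the structure is fixed in code, and only the data slots are filled from the prompt.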
MCP tools aren’t always what they seem. A poisoned tool can include misleading documentation or hidden code that subtly alters how the agent behaves. Because LLMs treat tool descriptions as trustworthy, a malicious docstring can embed secret instructions, such as sending private keys or leaking files. This exploitation leverages the trust AI agents place in tool descriptions. To counteract it, vet tool sources meticulously, expose full metadata to users for transparency, and sandbox tool execution so each tool’s behavior is isolated and monitored within a controlled environment.
Impact: Agents can leak secrets or run unauthorized tasks.
Mitigation: Vet tool sources, show users full tool metadata, sandbox tools.
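One way to make vetting and transparency concrete is to fingerprint each tool’s full metadata and refuse anything that hasn’t been explicitly reviewed. The sketch below is a hypothetical client-side check, not part of the MCP spec; the `APPROVED` registry and the shape of the tool dict are assumptions for illustration.

```python
import hashlib
import json

APPROVED: dict[str, str] = {}  # tool name -> fingerprint a human has reviewed

def tool_fingerprint(tool: dict) -> str:
    # Hash the complete tool metadata (name, description, input schema) so any
    # silent change to a docstring invalidates the previously approved version.
    canonical = json.dumps(tool, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def check_tool(tool: dict) -> bool:
    fingerprint = tool_fingerprint(tool)
    if APPROVED.get(tool["name"]) != fingerprint:
        # Surface the full metadata, not a truncated summary, so hidden
        # instructions in the description are visible at review time.
        print(f"Unreviewed or changed tool:\n{json.dumps(tool, indent=2)}")
        return False
    return True
```

Pinning descriptions this way also defends against “rug pulls,” where a tool’s docstring changes after it has been approved.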
SSE, or Server-Sent Events, keeps tool connections open for live data, but that always-on link is a juicy attack vector. A hijacked stream or timing glitch can lead to data injection, replay attacks, or session bleed. In fast-paced agent workflows, that’s a huge liability. Mitigations include enforcing HTTPS, validating the origin of incoming connections, and applying strict timeouts to minimize the window of opportunity for attacks.
Impact: Data leakage, session hijacking, DoS.
Mitigation: Use HTTPS, validate origins, enforce timeouts.
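A hedged, framework-agnostic sketch of those three controls on the server side might look like the following; the allowed-origin list and timeout value are placeholder assumptions.

```python
import time
from urllib.parse import urlparse

ALLOWED_ORIGINS = {"https://agent.example.com"}  # hypothetical trusted origins
STREAM_TIMEOUT_SECONDS = 300                     # hard cap on how long one stream stays open

def validate_sse_request(headers: dict) -> None:
    # Refuse plaintext transports and unknown origins before the stream is opened.
    origin = headers.get("Origin", "")
    if urlparse(origin).scheme != "https":
        raise PermissionError("SSE endpoint only accepts HTTPS origins")
    if origin not in ALLOWED_ORIGINS:
        raise PermissionError(f"Origin not allowed: {origin!r}")

def stream_events(send, events) -> None:
    # Enforce a timeout so a hijacked or stalled connection cannot stay open forever.
    deadline = time.monotonic() + STREAM_TIMEOUT_SECONDS
    for event in events:
        if time.monotonic() > deadline:
            break
        send(f"data: {event}\n\n")  # SSE wire format: "data: ..." plus a blank line
```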
One rogue tool can override or impersonate another and eventually gain unintended access. For example, a fake plugin might mimic your Slack integration and trick the agent into leaking messages. If access scopes aren’t enforced tightly, a low-trust service can escalate to admin-level privileges. To prevent this, isolate tool permissions, rigorously validate tool identities, and require authentication on every inter-tool call so each component operates within its designated access scope.
Impact: System-wide access, data corruption.
Mitigation: Isolate tool permissions, validate tool identity, enforce authentication on every call.
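To show what per-tool isolation and per-call authentication can look like, here is a small Python sketch; the registry, the signing keys, and the scope names (`chat:read`, `pages:read`) are invented for illustration.

```python
import hmac

# Each tool has its own identity, secret, and explicit scope; nothing is shared.
TOOL_REGISTRY = {
    "slack":  {"secret": b"slack-signing-key",  "scopes": {"chat:read"}},
    "notion": {"secret": b"notion-signing-key", "scopes": {"pages:read"}},
}

def authorize_call(tool_name: str, signature: bytes, payload: bytes, scope: str) -> bool:
    tool = TOOL_REGISTRY.get(tool_name)
    if tool is None:
        return False
    # Verify the caller really is this tool (constant-time HMAC comparison)...
    expected = hmac.new(tool["secret"], payload, "sha256").digest()
    if not hmac.compare_digest(expected, signature):
        return False
    # ...and that the requested action sits inside this tool's own scope.
    return scope in tool["scopes"]
```

Because every call is checked against the caller’s own scope, a fake “Slack” tool cannot borrow another integration’s permissions just by claiming its name.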
MCP sessions often store previous inputs and tool results, which can linger longer than intended. That’s a problem when sensitive info gets reused across unrelated sessions, or when attackers poison the context over time to manipulate outcomes. Mitigation involves implementing mechanisms to clear session data regularly, limiting the retention period of contextual information, and isolating user sessions to prevent contamination of data.
Impact: Context leakage, poisoned memory, cross-user exposure.
Mitigation: Clear session data, limit retention, isolate user interactions.
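The sketch below illustrates one way to combine those mitigations in Python: a per-user store with a retention window and an explicit wipe. The 15-minute TTL is an arbitrary assumption, not a recommendation from the MCP spec.

```python
import time
from collections import defaultdict

CONTEXT_TTL_SECONDS = 900  # illustrative retention window (15 minutes)

class SessionStore:
    """Keeps each user's context separate and expires stale entries."""

    def __init__(self) -> None:
        self._store = defaultdict(list)  # user_id -> [(timestamp, entry), ...]

    def add(self, user_id: str, entry: str) -> None:
        self._store[user_id].append((time.monotonic(), entry))

    def get_context(self, user_id: str) -> list[str]:
        # Return only this user's entries, and only those still inside the TTL window.
        cutoff = time.monotonic() - CONTEXT_TTL_SECONDS
        fresh = [(t, e) for t, e in self._store[user_id] if t >= cutoff]
        self._store[user_id] = fresh
        return [e for _, e in fresh]

    def clear(self, user_id: str) -> None:
        self._store.pop(user_id, None)  # explicit wipe at session end
```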
In the worst-case scenario, one compromised tool leads to a domino effect across all connected systems. If a malicious server can trick the agent into piping data from other tools (like WhatsApp, Notion, or AWS), it becomes a pivot point for total compromise. Preventative measures include adopting a zero-trust architecture, utilizing scoped tokens to limit access permissions, and establishing emergency revocation protocols to swiftly disable compromised components and halt the spread of the attack.
Impact: Multi-system breach, credential theft, total compromise.
Mitigation: Zero trust architecture, scoped tokens, emergency revocation protocols.
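As a rough sketch of scoped tokens and an emergency kill-switch, the Python below issues short-lived, single-scope tokens per server and revokes everything a compromised server holds. Token lifetimes, scope strings, and the in-memory registries are illustrative assumptions.

```python
import secrets
import time

TOKENS: dict[str, dict] = {}   # token -> {"server", "scope", "expires"}
REVOKED: set[str] = set()      # kill-switch list of revoked tokens

def issue_token(server: str, scope: str, ttl: int = 300) -> str:
    # Short-lived, single-scope token instead of one broad credential shared by all tools.
    token = secrets.token_urlsafe(32)
    TOKENS[token] = {"server": server, "scope": scope, "expires": time.time() + ttl}
    return token

def check_token(token: str, required_scope: str) -> bool:
    meta = TOKENS.get(token)
    if meta is None or token in REVOKED:
        return False
    return meta["scope"] == required_scope and time.time() < meta["expires"]

def revoke_server(server: str) -> None:
    # Emergency revocation: cut off every token the compromised server ever received.
    REVOKED.update(t for t, m in TOKENS.items() if m["server"] == server)
```

Because each token is narrow and short-lived, a compromised server cannot pivot far, and `revoke_server` gives you an immediate way to contain it.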
| Vulnerability | Severity | Attack Vector | Impact Level | Recommended Mitigation |
| --- | --- | --- | --- | --- |
| Command Injection | Moderate | Malicious prompt input to shell/SQL tools | Remote Code Execution, Data Leak | Input sanitization, parameterized queries, strict command guards |
| Tool Poisoning | Severe | Malicious docstrings or hidden tool logic | Secret Leaks, Unauthorized Actions | Vet tool sources, expose full metadata, sandbox tool execution |
| Server-Sent Events | Moderate | Persistent open connections (SSE/WebSocket) | Session Hijack, Data Injection | Use HTTPS, enforce timeouts, validate origins |
| Privilege Escalation | Severe | One tool impersonating or misusing another | Unauthorized Access, System Abuse | Isolate scopes, verify tool identity, restrict cross-tool communication |
| Persistent Context | Low/Moderate | Stale session data or poisoned memory | Info Leakage, Behavioral Drift | Clear session data regularly, limit context lifetime, isolate user sessions |
| Server Data Takeover | Severe | One compromised server pivoting across tools | Multi-system Breach, Credential Theft | Zero-trust setup, scoped tokens, kill-switch on compromise |
MCP is a bridge between LLMs and the real world, but right now it is more of a security minefield than a highway. As AI agents become more capable, these vulnerabilities will only become more dangerous. Developers need to adopt secure defaults, audit every tool, and treat MCP servers like third-party code, because that is exactly what they are. Advocating for safe protocols today is what will make MCP integration safe infrastructure in the future.
A. MCP is like the USB-C for AI agents, letting them connect to tools and services, but if you don’t know about security risks in MCP, you’re basically handing attackers the keys to your system.
A. If user input goes straight into a shell or SQL query without checks, it’s game over. Sanitize everything and don’t trust raw input.
A. A malicious tool can hide bad instructions in its description, and your agent might follow them like gospel; always vet and sandbox your tools.
A. Yep, that’s privilege escalation. One rogue tool can impersonate or misuse others unless you tightly lock down permissions and identities.
A. Security risks in MCP, if ignored, can domino into a full system breach: stolen credentials, leaked data, and a total AI meltdown.