**60 Ways to Hack Your Chatbot: A Growing Problem in the AI Gold Rush**

Imagine you're a prospector in California's gold rush era, eager to strike it rich by panning for gold in the American River. The promise of easy wealth is alluring, and you don't let safety concerns hold you back. But what if I told you that this mindset is eerily similar to the current AI gold rush? Today, companies are racing to develop chatbots and AI systems without prioritizing security and safety.

As a journalist, I've compiled a list of 60 ways an AI system or tool can be hacked. Just like the mining camps of old, where disease, accidents, and environmental hazards took their toll on miners, today's AI landscape is plagued by similar risks – except this time, it's your data, wallet, and organization that are at stake.

Let's take a closer look at these 60 ways to hack your chatbot. They can be categorized into several areas:

Output Manipulation & Misalignment Attacks (15)

The goal of these attacks is to manipulate the output of an AI system or tool, making it biased, false, or even malicious. Examples include forcing a conclusion based on a prompt, exploiting ambiguous language, and manipulating tone or certainty.

  • Forcing biased, false, or malicious conclusions
  • Manipulating tone, framing, or certainty
  • Exploiting ambiguous prompts
  • Semantic drift across turns
  • Asking disallowed questions indirectly
  • Hypotheticals, roleplay, meta-analysis
  • Leveraging helpfulness bias
  • Exploiting “explain why this is wrong” patterns
  • Forcing unintended tool invocation
  • Triggering tools with malicious parameters
  • Embedding instructions inside tool arguments
  • JSON field manipulation
  • Exploiting poorly named or overlapping tools
  • Forcing wrong tool selection
  • Manipulating output of Tool A to compromise Tool B

Agent Orchestration & Workflow Attacks (6)

These attacks focus on altering task graphs, skipping validation steps, and exploiting agent-to-agent trust. Examples include agents acting outside assigned authority, planner vs executor confusion, and infinite planning or execution loops.

  • Altering task graphs
  • Skipping validation steps
  • Agents acting outside assigned authority
  • Planner vs executor confusion
  • Infinite planning or execution loops
  • Cost-amplification denial

MCP (Model Context Protocol)–Specific Attacks (12)

MCP attacks involve manipulating the context in which an AI model operates, often through supply chain exploitation. Examples include supplying poisoned context, returning crafted responses to steer model behavior, and MCP returning more data than requested.

  • Supplying poisoned context
  • MCP returning more data than requested
  • Hidden instruction injection
  • Tools that expose sensitive system state
  • Excessive privileges
  • Impersonating trusted MCP endpoints
  • Confused deputy scenarios
  • One MCP poisoning another MCP’s inputs
  • Inserting malicious documents
  • Editing authoritative sources
  • Semantic collisions

Data Integrity & Lifecycle Attacks (4)

These attacks focus on manipulating user ratings, corrupting evaluation pipelines, and undermining reference datasets. Examples include injecting instructions into logs reused for training and passing benchmarks while failing real-world safety.

  • Slowly shifting inputs to degrade performance
  • Manipulating user ratings
  • Corrupting evaluation pipelines
  • Injecting instructions into logs reused for training
  • Undermining reference datasets
  • Authority decay
  • Passing benchmarks while failing real-world safety

Governance, Control & Oversight Failures (AI-Native) (6)

These attacks involve gradual deviation from intended behavior, hidden instructions overriding official policies, and non-reproducible outputs. Examples include agent designs that avoid escalation and non-deterministic behavior masking issues.

  • Gradual deviation from intended behavior
  • Hidden instructions overriding official policies
  • Agent designs that avoid escalation
  • Non-reproducible outputs
  • Non-deterministic behavior masking issues
  • Authority decay

The HRExaminer AI Security Field Guide (for HRTech/WorkTech) is now available, providing a comprehensive list of 60 ways to hack your chatbot. The guide includes questions you should be asking vendors and three things to do when you've finished reading.

Don't wait until it's too late – hold your vendor responsible for ensuring the security and safety of their AI systems. Visit the HRExaminer website to access the full guide and start exploring these critical questions today!