The fast evolution and enterprise adoption of AI has motivated dangerous actors to focus on these programs with higher frequency and class. Many safety leaders acknowledge the significance and urgency of AI safety, however don’t but have processes in place to successfully handle and mitigate rising AI dangers with complete protection of the complete adversarial AI risk panorama.
Sturdy Intelligence (now part of Cisco) and the UK AI Safety Institute partnered with the Nationwide Institute of Requirements and Know-how (NIST) to launch the most recent replace to the Adversarial Machine Studying Taxonomy. This transatlantic partnership aimed to fill this want for a complete adversarial AI risk panorama, whereas creating alignment throughout areas in standardizing an method to understanding and mitigating adversarial AI.
Survey outcomes from the World Cybersecurity Outlook 2025 revealed by the World Financial Discussion board spotlight the hole between AI adoption and preparedness: “Whereas 66% of organizations anticipate AI to have essentially the most important impression on cybersecurity within the yr to return, solely 37% report having processes in place to evaluate the safety of AI instruments earlier than deployment.”
As a way to efficiently mitigate these assaults, it’s crucial that AI and cybersecurity communities are effectively knowledgeable about as we speak’s AI safety challenges. To that finish, we’ve co-authored the 2025 replace to NIST’s taxonomy and terminology of adversarial machine studying.
Let’s take a look at what’s new on this newest replace to the publication, stroll via the taxonomies of assaults and mitigations at a excessive degree, after which briefly replicate on the aim of taxonomies themselves—what are they for, and why are they so helpful?
What’s new?
The earlier iteration of the NIST Adversarial Machine Studying Taxonomy targeted on predictive AI, fashions designed to make correct predictions based mostly on historic information patterns. Particular person adversarial methods have been grouped into three major attacker targets: availability breakdown, integrity violations, and privateness compromise. It additionally included a preliminary AI attacker method panorama for generative AI, fashions that generate new content material based mostly on present information. Generative AI adopted all three adversarial method teams and added misuse violations as an extra class.
Within the newest replace of the taxonomy, we increase on the generative AI adversarial methods and violations part, whereas additionally making certain the predictive AI part stays correct and related to as we speak’s adversarial AI panorama. One of many main updates to the most recent model is the addition of an index of methods and violations firstly of the doc. Not solely does this make the taxonomy simpler to navigate, but it surely permits for a better option to reference methods and violations in exterior references to the taxonomy. This makes the taxonomy a extra sensible useful resource to AI safety practitioners.
Clarifying assaults on Predictive AI fashions
The three attacker targets constant throughout predictive and generative AI sections, are as follows:
- Availability breakdown assaults degrade the efficiency and availability of a mannequin for its customers.
- Integrity violations try to undermine mannequin integrity and generate incorrect outputs.
- Privateness compromises unintended leakage of restricted or proprietary info corresponding to details about the underlying mannequin and coaching information.

Classifying assaults on Generative AI fashions
The generative AI taxonomy inherits the identical three attacker targets as predictive AI—availability, integrity, and privateness—and encapsulates extra particular person methods. There’s a fourth attacker goal distinctive to generative AI: misuse violations. The up to date model of the taxonomy expanded on generative AI adversarial methods to account for essentially the most up-to-date panorama of attacker methods.
Misuse violations repurpose the capabilities of generative AI to additional an adversary’s malicious targets by creating dangerous content material that helps cyber-attack initiatives.
Harms related to misuse violations are meant to supply outputs that would trigger hurt to others. For instance, attackers might use direct prompting assaults to bypass mannequin defenses and produce dangerous or undesirable output.

To attain one or a number of of those objectives, adversaries can leverage quite a lot of methods. The enlargement of the generative AI part highlights attacker methods distinctive to generative AI, corresponding to direct immediate injection, information extraction, and oblique immediate injection. As well as, there’s a completely new arsenal of provide chain assaults. Provide chain assaults should not a violation particular to a mannequin, and subsequently should not included within the above taxonomy diagram.
Provide chain assaults are rooted within the complexity and inherited threat of the AI provide chain. Each element—open-source fashions and third-party information, for instance—can introduce safety points into the complete system.
These may be mitigated with provide chain assurance practices corresponding to vulnerability scanning and validation of datasets.
Direct immediate injection alters the habits of a mannequin via direct enter from an adversary. This may be achieved to create deliberately malicious content material or for delicate information extraction.
Mitigation measures embrace coaching for alignment and deploying a real-time immediate injection detection resolution for added safety.
Oblique immediate injection differs in that adversarial inputs are delivered by way of a third-party channel. This method might help additional a number of targets: manipulation of knowledge, information extraction, unauthorized disclosure, fraud, malware distribution, and extra.
Proposed mitigations assist decrease threat via reinforcement studying from human suggestions, enter filtering, and the usage of an LLM moderator or interpretability-based resolution.
What are taxonomies for, anyhow?
Co-author and Cisco Director of AI & Safety, Hyrum Anderson, put it greatest when he stated that “taxonomies are most clearly necessary to arrange our understanding of assault strategies, capabilities, and targets. Additionally they have a protracted tail impact in enhancing communication and collaboration in a discipline that’s shifting in a short time.”
It’s why Cisco strives to help within the creation and steady enchancment of shared requirements, collaborating with main organizations like NIST and the UK AI Safety Institute.
These assets give us higher psychological fashions for classifying and discussing new methods and capabilities. Consciousness and schooling of those vulnerabilities facilitate the event of extra resilient AI programs and extra knowledgeable requirements and insurance policies.
You possibly can assessment the complete NIST Adversarial Machine Studying Taxonomy and study extra with an entire glossary of key terminology within the full paper.
We’d love to listen to what you assume. Ask a Query, Remark Beneath, and Keep Related with Cisco Safe on social!
Cisco Safety Social Channels
Share: