Artificial Intelligence (AI) systems, while revolutionizing various aspects of daily life and industry, can be exposed to different types of vulnerabilities and threats. Some of the crucial AI vulnerabilities include:
Prompt injection typically refers to a technique where malicious or unexpected input is inserted into a prompt to manipulate an AI model's output or behavior.
Direct and indirect prompt injections are two methods that can be used to guide the model's output or manipulate its behavior.
- Direct Prompt Injection: This is a straightforward method where explicit instructions or cues are directly given to the AI in the input prompt.
- Indirect Prompt Injection: Contrary to direct prompt injection, indirect prompt injection involves subtly guiding the AI towards a desired output without explicitly stating the desired action or outcome in the prompt.
Adversarial AI attacks refer to techniques used to manipulate or fool artificial intelligence systems, particularly machine learning models, by providing specially crafted inputs designed to cause the AI to make mistakes or behave in unintended ways.
Data poisoning refers to a type of attack on AI systems where malicious actors deliberately introduce corrupted or misleading data into the training dataset. The goal is to manipulate the AI model's learning process and behavior, causing it to make incorrect predictions or classifications when deployed.
Backdoor poisoning in AI refers to a type of data poisoning attack where an attacker inserts carefully crafted malicious data into an AI model's training dataset.
Model exposure refers to making an AI model accessible for use, often via an API. This can potentially introduce security risks if not done carefully.
In the context of AI, "misaligned inputs" refers to situations where the data or information provided to an AI system does not properly align with the system's intended purpose, design, or expectations.
Evasion of detection refers to attempts by malicious actors to circumvent or fool AI-based detection systems. Attackers may try to manipulate or craft inputs in ways that cause AI models to misclassify or fail to detect malicious activity.
Sensitive information disclosure refers to the unintended exposure or release of private, confidential, or personally identifiable information through AI systems or models.
Model theft refers to unauthorized attempts to steal or extract an AI model's architecture, parameters, or training data.
AI hallucination refers to when an AI system generates content that is false, nonsensical, or not grounded in reality. It occurs when AI produces plausible-sounding but inaccurate or fabricated information, often resulting from gaps in training data, flaws in the model architecture, or overextrapolation beyond what the AI was trained on.
Model drift refers to the gradual decline in AI model accuracy and reliability as patterns in real-world data evolve compared to the data the model was originally trained on.
What Are Typical AI Vulnerabilities that Must be Addressed?
The figure shows the process of Large Language Model training and retrieval.

In relation to the Large Language Model (LLM) training and retrieval, consider the following security aspects:
- Protection of training data and feedback learning
- Data security, privacy, and confidentiality
- Embedding of data into LLM model itself
- Malicious and irresponsible user behavior
LLM Model Training and Retrieval
- Training: Collect data from relevant external data sources
- Training: Store in a central data store/lake (Data Security & Privacy, Data Poisoning
- Training: Train LLM model on training data (Data Security & Privacy)
- Retrieval: Users ask a query by prompting the LLN model (Prompt Injection)
- Retrieval: The LLM model responds with the answer (Hallucinations, Data Privacy)
New Attack Vectors Caused by AI
Currently, we do not see any specific new attack vectors caused by AI. Nevertheless, there are some existing attacks that can be simplified and automated using AI, such as:
AI-powered social engineering by using artificial intelligence and machine learning techniques to enhance and automate social engineering attacks. This could potentially make phishing, impersonation, and other social engineering tactics more sophisticated and harder to detect.
Automated vulnerability discovery using artificial intelligence techniques to automatically identify, analyze, and report potential security vulnerabilities in software systems or applications.
AI-powered password cracking is a method used by attackers to guess or crack passwords using artificial intelligence algorithms and techniques.