Why AI security is both less scary and far more terrifying than one might think
Last week a story made the rounds on Twitter that the US Air Force had killed one of its operators in error after handing over an armed drone’s targeting system to artificial intelligence during a test of a SEAD mission.
This story, poorly analyzed and amplified by tech/VC influencers on Twitter, led media such as Vice’s Motherboard to report that the fears of anyone who watched Terminator 2 were coming true: artificial intelligence is dangerous and going to kill us all.
The problem? This story is completely untrue. And for anyone who’s spent time building AI and/or defense systems, it was readily apparent that it was bullshit.
I’ve worked on both in my career with the US Air Force. I was one of the product managers working on components of Lockheed Blackcloud and more recently, I was a part of the Air Force’s Platform One initiative and supported the senior leadership of the Air Force on projects such as Big Bang.
Plainly put, the story didn’t make a ton of sense. Developing and deploying new weapons systems is a massive effort and is never something as simple as “let’s slap AI into a Reaper and see what happens.” Even when those systems are seemingly minor iterations on existing platforms, it typically takes years for a system to get beyond simulations and notional experiments to begin field trials — much less field trials with live munitions.
This can be seen in the development of NASAMS, where the NASAMS 3 (which is essentially the NASAMS 2 with a software upgrade, 3 new monitors, and support for firing a new version of the AMRAAM missile) took well north of a decade of work before live fire tests began.
Perhaps more important than my R&D experience, as someone who’s actually worked on developing NLP tech, I found this story extremely off. The colonel quoted in the article (which to its credit is actually a blog post of an aerospace conference and doesn’t contend the “Skynet” style concerns that the tech/VC cognoscenti on Twitter spread) noted that the team rapidly iterated on the reward structure of the AI after the operator was “killed,” lamenting that in response the drone “starts destroying the communication tower the operator used to communicate.”
Ignoring the paperwork and at least brief pause that would likely occur if someone was murdered during weapons tests, it’s incredibly hard to rapidly iterate on the result of an AI model like that if the model is anything beyond a basic simulation.
Tweaking the reward structure and “running it again” on something supposedly live in the field à la a UCAV is the equivalent of saying you decided to “tweak” the engine of a Southwest Airlines jet mid-flight with a “minor” alteration of changing the shape of its fan blades.
That’s not a minor change, and if you’re trying not to commit murder it had better be a simulation, since doing that in-flight is a great way to *ahem* lose at Microsoft Flight Simulator. In real life.
Stories like these are often the first thing that comes to mind when discussing AI security. They are also the most misinformed about what AI security actually entails, and they’re contributing to mischaracterizing very real concerns about the implications of artificial intelligence on information security in a way that hurts both fields of computing.
Ill-informed concerns about AI security being a fight to “stop Skynet” have already impacted the public’s perception of artificial intelligence. This can be seen in the Future of Life Institute (FLI)’s “Pause Giant AI Experiments” letter, a petition that garnered lots of press for featuring big names in tech like Elon Musk, but was later found to be dubious and to have misrepresented both the work and intentions of AI scientists it quoted.
This generation of AI is falling prey to an unhealthy fervor of tech hype and bullshit that seems driven by the flight of charlatanism into AI from other “former hype” spaces like crypto and VR. And while artificial intelligence is a real, substantive field whose modern advances will have a significant impact on computing and the global economy, just as real is the danger posed by the “strong opinions, loosely held, even more loosely researched” style of thought leadership such influencers bring with them.
AI security is at risk from this overhype dynamic. The conversation about what privacy and security look like in a post-ChatGPT age is already one where we need to contend with ensuring a mythical Skynet won’t kill us all. And if the public believes lies about AI security like those contained within FLI’s letter, it could jeopardize funding and support for the field.
We should care about how modern AI changes the dynamic for information security and privacy. To do it properly, we need to understand (at least at a high level) the foundations of modern artificial intelligence.
Simple Tricks and Nonsense
There are a few different ways to train a computer to analyze and discern answers from data — essentially what modern AI is attempting to perform. These typically fall into one of two buckets: NLP and ML.
Machine Learning or ML is the process of using mathematics and algorithms to force a computer to iteratively refine and improve analysis on data it extracts from a source. The algorithms powering most ML essentially “sieve” out an answer, like panning for gold from dirt or straining out spaghetti from a pot of boiling pasta water.
An example of ML would be reinforcement learning, the process billed by the USAF as the notional model for their SEAD program. In reinforcement learning, an agent observes a sample of available data, called an environment, and chooses among the actions available to it. The engineer developing this ML system can influence how the agent interprets its environment by encoding rewards as responses to those actions.
A common real world example of ML is in how most pet owners teach their pets tricks. As the pet owner commanding a puppy to “sit”, the pet owner instantiates the environment that the puppy (the agent) must respond to. By showing the puppy how to sit and giving them a treat, the owner shows them that there is a positive reward for performing that action.
But if the puppy does not comply after learning the “sit” action and being commanded to do so, the owner does not give them a treat and thus gives them a reward of zero. After sufficient rounds of their owner telling them to sit and potentially providing a corresponding treat (also known as training data), the puppy learns to positively associate “sit” with their treat reward and responds accordingly. The learning, experience, and resulting behavior to sit in response to the command and training is what in AI would typically be referred to as a model.
This example well highlights why simply changing a reward mid-training is significant and was a red flag for the colonel’s comments in the USAF’s drone post. If instead of giving a puppy a treat for “sit” you abruptly decided to spray him with water, it would likely produce an incredibly different output in the resulting model.
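To make the reward change concrete, here is a minimal sketch in Python. It is not the USAF's system or any real training pipeline: the action names, both reward functions, the learning rate, and the number of rounds are assumptions picked for illustration, and the update rule is a simple running-average value estimate rather than a full reinforcement learning algorithm.

```python
# Toy "puppy" learner: keeps one value estimate per action and nudges it
# toward whatever reward the owner hands out. All names are hypothetical.
ACTIONS = ["sit", "bark", "wander"]


def train(values, reward_fn, rounds=200, learning_rate=0.1):
    """Try every action each round and move its estimate toward the reward."""
    for _ in range(rounds):
        for action in ACTIONS:
            reward = reward_fn(action)
            values[action] += learning_rate * (reward - values[action])
    return values


def treat_for_sitting(action):
    """Original reward structure: a treat (+1) for sitting, nothing otherwise."""
    return 1.0 if action == "sit" else 0.0


def spray_for_sitting(action):
    """Altered reward structure: a spray of water (-1) for sitting."""
    return -1.0 if action == "sit" else 0.0


def preferred_action(values):
    """The action the learner currently rates highest."""
    return max(values, key=values.get)


values = {action: 0.0 for action in ACTIONS}

train(values, treat_for_sitting)
learned_before = preferred_action(values)

# Same learner, same actions, but the reward is swapped partway through.
train(values, spray_for_sitting)
learned_after = preferred_action(values)

print(learned_before, learned_after)
```

Even in this three-action toy, swapping the reward partway through overwrites what the learner had settled on: it prefers sitting after the first phase and no longer does after the second.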
If the model was intended to create behavior for a drone to detect and engage RADAR sites for running SEAD missions, it likely is significantly more complex than the practice of teaching a puppy to sit. Changing the reward structure for the set of actions for an actor drone to detect and engage a target would completely destabilize a model mid-training and almost certainly be unacceptable given the strict nature of most military testing.
In code, reinforcement learning typically takes the form of neural networks: abstract data structures that serve to layer the decision and reward process in a way that large amounts of training data can be pushed through to help build a model. The systems built this way draw on common mathematics likely familiar to most computer science graduates, with Markov chains and graph theoretic algorithms such as Dijkstra’s algorithm being common examples.
Like most of modern AI, the math powering ML is far from novel. What’s novel is the computing power we have to wield, as well as the wealth of data we have available for common tasks.
Natural Language Processing (NLP)
Natural Language Processing or NLP is the practice of replicating how human beings communicate via language in order to discern information from input data.
Unlike ML, the information isn’t sieved from the input. Instead, the system uses information it already has about the syntax and grammar of a language to discern assertions from a statement, perform analysis on that assertion, and respond pragmatically to the input.
NLP aggressively borrows from its namesake areas of neuroscience and linguistics for the framework it uses to perform these operations. It splits them up into roughly four key types of analysis:
- Lexical Analysis: The parsing of an input into serviceable and analyzable data that fits into a known language. Often this means smoothing and splitting data into tokens — smaller, isolated atomic points that can be analyzed for syntax. Frequently this action is called pre-processing.
- Syntactic Analysis: Given a known language, use knowledge of how that language works and its rules (i.e.: its grammar) to analyze the tokens generated in lexical analysis. The result of syntactic analysis should be something that a computer can easily map to known fact for further analysis — often a data structure like a parse tree.
- Semantic Analysis: Now that we have parsed, intelligible data from the input, further filter that data in such a way that we can use previously-learned facts/findings from other data to draw conclusions.
- Pragmatic Analysis: Draw conclusions by looking at statistical or logical relevance between the semantics of input data and previously discovered information. Make resulting assertions and take action based on these findings.
For example, if I were to say the phrase “the cat listens to melodic techno,” parsing and analyzing this statement using NLP would take the form of the following:
- Lexical Analysis: Given this is the English language, we would split every token on a space and prepare it for syntactic analysis.
- Syntactic Analysis: Given our knowledge of the English language, we would filter each token from the input into its different types of words — nouns, verbs, articles, etc. For analysis’ sake, we would spit this all out in a parse tree.
- Semantic Analysis: Given our knowledge of English grammar we would go through the parse tree and establish nouns, verbs, adjectives, and the relationships known therein. We would learn the cat is the subject of the sentence, the predicate is that the cat is listening to melodic techno, melodic techno is the direct object of the cat’s listening, and melodic is an adjective for techno.
- Pragmatic Analysis: Given our grammatical parsing of the semantics of the input, we would realize that melodic techno is some kind of sound given that the cat is listening to it. We might even have heard the term techno before, realizing it is a subset of music and that surely melodic techno is a subset of techno and thus music. We can then take action to update our knowledge accordingly from this information if we find it credible — adding for example knowledge that melodic techno is some kind of sub-genre of techno and that the cat listens to it.
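The four stages above can be sketched as a toy pipeline in Python. Everything here is an assumption made for illustration: the hand-written `LEXICON`, the function names, the flat list of tagged tokens standing in for a parse tree, and the shape of the `knowledge` dictionary. A real NLP system would use a learned tagger and a proper grammar rather than a six-word lookup table.

```python
# Hypothetical six-word lexicon; a real system would learn or load this.
LEXICON = {
    "the": "DET", "cat": "NOUN", "listens": "VERB",
    "to": "PREP", "melodic": "ADJ", "techno": "NOUN",
}


def lexical_analysis(text):
    """Normalize the input and split it into tokens on whitespace."""
    return text.lower().strip(".,!?").split()


def syntactic_analysis(tokens):
    """Tag each token with its part of speech (a flat stand-in for a parse tree)."""
    return [(token, LEXICON.get(token, "UNKNOWN")) for token in tokens]


def semantic_analysis(tagged):
    """Extract subject, verb, and object phrase from a simple sentence."""
    verb_index = next(i for i, (_, tag) in enumerate(tagged) if tag == "VERB")
    subject = [tok for tok, tag in tagged[:verb_index] if tag == "NOUN"][-1]
    object_words = [tok for tok, tag in tagged[verb_index + 1:]
                    if tag in ("ADJ", "NOUN")]
    return {
        "subject": subject,
        "verb": tagged[verb_index][0],
        "object": " ".join(object_words),
        "head": object_words[-1],          # the noun being described
        "modifiers": object_words[:-1],    # adjectives describing it
    }


def pragmatic_analysis(facts, knowledge):
    """Combine the new facts with what is already known and update it."""
    if facts["head"] in knowledge["music_genres"] and facts["modifiers"]:
        knowledge["subgenres"][facts["object"]] = facts["head"]
    knowledge["listens_to"].setdefault(facts["subject"], set()).add(facts["object"])
    return knowledge


# Prior knowledge: we have heard of techno before and know it is music.
knowledge = {"music_genres": {"techno"}, "subgenres": {}, "listens_to": {}}

tokens = lexical_analysis("The cat listens to melodic techno.")
tagged = syntactic_analysis(tokens)
facts = semantic_analysis(tagged)
knowledge = pragmatic_analysis(facts, knowledge)
print(knowledge)
```

The last step is the one that carries prior knowledge: the sketch only concludes that melodic techno is a sub-genre of techno because techno was already in its list of known genres.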
ChatGPT: ML and NLP in Modern AI
ChatGPT is perhaps the most famous AI tool on the planet. It also serves as a great representation of where and how modern artificial intelligence suites will be constructed, and where/how ML and NLP accordingly get deployed.
ChatGPT is a generative pre-trained transformer, which means it essentially is doing the following:
Step 1: Given an English language input, break apart a request into something like a “complete this sentence” problem.
Step 2: Given this “complete this sentence” problem, complete it using a large neural network of data that is manicured through training.
Step 1 is essentially NLP, and serves to gather important information about the intention of a request that could then be used by the underlying Large Language Model or LLM backing ChatGPT for analysis.
Step 2 is where the magic happens. Here ChatGPT uses its backing LLM as the machine learning framework to solve the “complete this sentence” problem.
In the case of GPT-3.5, the LLM that first shipped with ChatGPT, this is a neural network that previously has been given sufficient training to service the foundations of common types of requests — essentially giving a good starting point for ChatGPT to service requests.
This pre-training means the LLM has already been sculpted (through billions of previous actions wherein a computer program and a physical person have reviewed different types of requests and constructed reward structures to guide the computer towards different types of outputs/sub-problems to solve) to ensure that the complete the sentence problem will ideally match the desired format of the request.
Essentially, OpenAI has spent massive amounts of time and resources to ensure that when I ask ChatGPT a question like “what does Marsellus Wallace look like,” ChatGPT “knows” to return a token in a format fitting the description of a person or thing to successfully “complete the sentence.” Given the wealth of work and resources spent on this training, ChatGPT gets this formatting problem right more often than not. And often it seems downright eerie as a result.
What ChatGPT does not get right is the veracity of that output. While users can help score and give feedback on whether ChatGPT’s results are accurate, they cannot effectively “re-train” the model to make it absolutely correct. That’s because ChatGPT isn’t trying to find truth; it’s returning the statistically most likely solution to that sentence completion problem.
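A toy version of “most likely, not most true” fits in a few lines of Python. This is a deliberately crude stand-in: it counts which word follows which in a tiny made-up corpus, whereas a real LLM uses a transformer network over tokens. The corpus, function names, and prompt are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Made-up training text in which the false statement is the more common one.
corpus = [
    "the moon is made of cheese",
    "the moon is made of cheese",
    "the moon is made of rock",
]


def build_bigram_counts(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def complete(prompt, counts):
    """Return the word seen most often after the last word of the prompt."""
    last_word = prompt.split()[-1]
    followers = counts.get(last_word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = build_bigram_counts(corpus)
completion = complete("the moon is made of", counts)
print(completion)
```

The sketch completes the sentence with whichever continuation it saw most often in training, and nothing in it checks whether that continuation is true.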
Therein lie glorious, frightening opportunities for hackers.
Adversarial Analysis on Modern AI
Whether we’re talking about GPT-4 powering a future version of ChatGPT or the NLP engine powering Wolfram Alpha’s ability to split up an English request with an in-line math problem, modern AI systems run like any other large scale system: as a series of interconnected services on top of common computing infrastructure.
Modern hackers — and more importantly their tooling — are often very good at hacking these kinds of systems.
Example: A Facebook Messenger Chatbot
To illustrate some of these challenges, let’s consider an example chatbot system’s infrastructure. I’ll use Varun’s tutorial from his blog as an example, as it well highlights the components for infrastructure that are needed to perform ML analysis on input data.
This is an AI chatbot whose goal is to respond to Facebook messages. It uses the following to do so:
- Facebook Messenger API Service: This is the first front-end application that serves to derive input from the user via Facebook’s API. A service exists to integrate this API with Dialogflow.
- Dialogflow API Service and Event Handler: Dialogflow is an NLP platform for extracting pragmatic analysis from human language input and outputting that analysis in a way that lets a user encode programmatic logic and responses. Those responses (i.e.: what “happens”) are considered fulfillments.
- Python/Flask Fulfillment Infrastructure: In order to respond to a fulfillment request, an interconnected web of applications and services is deployed within Flask to provide responses given specific intents. For example, if this chatbot app were a service to respond to a bank customer’s requests, one of those intents might be to check the balance on their account. A process to perform the resulting API call would be codified and launched to provide the fulfillment request with data likely brought in from another external service (e.g.: the result of a Stripe call to request account balance).
- Database: Stateful data to respond to different intents, as well as all of the API keys and other data necessary for hosting interaction between API services like Facebook Messenger and Dialogflow, need to sit somewhere. In this example it’s a MongoDB instance.
- Networking: Varun uses ngrok here to provide communication between the services and databases hosting this infrastructure.
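The fulfillment piece of the list above is the part most easily shown in code. What follows is a rough sketch, not Varun's implementation: the Flask routing is left out so the example stays self-contained, the `check_balance` handler, account ID, and balance are hypothetical, and the field names (`queryResult`, `intent.displayName`, `parameters`, `fulfillmentText`) follow the Dialogflow ES webhook format as I understand it and should be checked against the current documentation.

```python
# Hypothetical account data standing in for a call to an external service.
FAKE_BALANCES = {"acct-123": 42.50}


def check_balance(parameters):
    """Hypothetical intent handler: look up a balance by account ID."""
    account_id = parameters.get("account_id")
    if account_id not in FAKE_BALANCES:
        return "I could not find that account."
    return f"Your balance is ${FAKE_BALANCES[account_id]:.2f}."


# Map each intent name to the function that fulfills it.
INTENT_HANDLERS = {"check_balance": check_balance}


def handle_fulfillment(request_body):
    """Dispatch a webhook request body to the handler for its intent."""
    query_result = request_body.get("queryResult", {})
    intent_name = query_result.get("intent", {}).get("displayName")
    handler = INTENT_HANDLERS.get(intent_name)
    if handler is None:
        return {"fulfillmentText": "Sorry, I can't help with that yet."}
    return {"fulfillmentText": handler(query_result.get("parameters", {}))}


# Example request body, shaped like the payload described above.
example_request = {
    "queryResult": {
        "intent": {"displayName": "check_balance"},
        "parameters": {"account_id": "acct-123"},
    }
}
response = handle_fulfillment(example_request)
print(response["fulfillmentText"])
```

Note how much sensitive material passes through even this small function: the user's account identifier arrives in plaintext, and a real handler would need credentials for whatever external service it calls.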
Hacking the Chatbot
If I were trying to hack this infrastructure, I’d do what most modern adversaries do: exploit its complexity and/or compromise its data stores.
Ngrok, Dialogflow, Facebook API: there are a lot of hosted services necessary to keep this system running. These hosted services require some kind of secret to allow services to interact with their APIs, often some kind of cryptographic key or identifier that is used to verify the service’s identity as being a legitimate component of the Chatbot.
If I were trying to compromise this system, I would likely start by trying to steal one of these keys. Without any kind of encryption or a secrets management system, these keys appear to simply be stored within its central MongoDB database. My approach would look something like this:
- Use Recon-ng to prospect the infrastructure of the app that’s publicly connected to the internet, and search for known vulnerable components of its web frontend and services infrastructure via corresponding modules.
- Use Metasploit to construct an attack that exploits those vulnerable components to drop malware onto host(s) I could use to eavesdrop on communication between the host launching/managing services and the database for secrets. That eavesdropping likely would take the form of memory dumps from a privileged user.
- If I can’t find sufficient secrets from communication within the host, directly attack the MongoDB database itself using something similar to the above.
Successfully hacking the services’ hosts, in the absence of a secrets manager and sufficient hardening of services’ hosts, could allow me to steal credentials that may allow me outsized access beyond the Chatbot. For example, the GCP key used for interacting with Dialogflow may give me access to other applications and services run by the Chatbot’s developer.
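On the defensive side, one common first step is to keep keys out of the application database and have them injected into the process at runtime, for example through environment variables populated by a secrets manager. The sketch below assumes that setup; the variable name `DIALOGFLOW_API_KEY`, the helper names, and the fake key are all hypothetical, and this is one small piece of secrets hygiene rather than a complete answer.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided to the process."""


def load_secret(name, environ=None):
    """Read a secret from the environment and fail loudly if it is absent."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        raise MissingSecretError(f"Secret {name!r} is not set")
    return value


def redact(secret, visible=4):
    """Return a log-safe form of a secret showing only its last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# A fake environment so the sketch runs anywhere; in practice omit the second
# argument and let the secrets manager populate os.environ.
fake_environment = {"DIALOGFLOW_API_KEY": "not-a-real-key-1234"}

api_key = load_secret("DIALOGFLOW_API_KEY", fake_environment)
safe_for_logs = redact(api_key)
print(safe_for_logs)
```

This does not stop an attacker who can already dump a privileged process's memory, but it does mean a compromised database no longer hands over every key at once.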
Perhaps more disconcerting, this also could allow me to covertly modify aspects of the Chatbot. For example, I could use my compromise to take plaintext input from intent processors to perform identity theft if the user-submitted data to that intent processor is sufficiently sensitive (e.g.: a social security number, an account routing number, etc.).
Hacking the Host
The Chatbot example well highlights some of the serious security challenges in modern AI applications. Like most modern applications, the reliance on an external host and analysis of sensitive data between the external host and the local app raises the potential for an adversary to steal sensitive secrets.
But what if you are that external host? What if you’re Dialogflow or ChatGPT? What if you’re hosting the NLP or ML service that an AI application communicates with for analysis? If you’re hosting the LLM service for AI applications, then the challenges of defending services’ host and database infrastructure expand by orders of magnitude.
Much of this has to do with complexity; rather than simply managing one or a couple of MongoDB instances, you’re potentially managing thousands of databases with a massive data lake you must harden while simultaneously ensuring it is freely and performantly accessible at scale.
But there is also a very significant supply chain security concern with being an LLM host. For an enterprising adversary, attacking the pre-training process for an LLM may provide an opportunity to inject exploitable vulnerabilities or purposeful faults in analysis that are difficult if not impossible to detect once the model is running.
One way to inject such faults is via statistical attack. If an adversary can access the ML systems training a model, they can clandestinely insert faults into either the reward process or sequential training output to introduce faulty data responses or potentially even vulnerabilities that could be exploited (e.g.: forcing an ML system that performs mathematics in response to a prompt into introducing some kind of stack buffer overflow).
While this is incredibly hard to perform, attacks like these would also be incredibly hard to stop given they would necessitate retraining the entire model. Perhaps worse, there are no common industry frameworks or models for even thinking about defending an LLM against these kinds of “training injection attacks.”
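A toy illustration of the statistical attack described above, in Python. The “model” here is a one-dimensional threshold classifier, which is far simpler than an LLM; the data points, labels, and function names are all invented for the example. It shows only the general shape of the problem: a few mislabeled training samples shift what the trained model does, and nothing in the finished model records that they were ever there.

```python
def train_threshold(samples):
    """Learn a cutoff halfway between the mean of each class.

    samples is a list of (value, label) pairs; label 1 means "flag it"
    and label 0 means "let it through".
    """
    benign = [value for value, label in samples if label == 0]
    flagged = [value for value, label in samples if label == 1]
    benign_mean = sum(benign) / len(benign)
    flagged_mean = sum(flagged) / len(flagged)
    return (benign_mean + flagged_mean) / 2


def predict(threshold, value):
    """Flag anything at or above the learned cutoff."""
    return 1 if value >= threshold else 0


# Clean training data: low values are benign, high values get flagged.
clean_data = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]

# Attacker-injected samples: high values deliberately labeled as benign.
poison = [(9.0, 0), (9.5, 0), (10.0, 0)]

clean_threshold = train_threshold(clean_data)
poisoned_threshold = train_threshold(clean_data + poison)

# The input the attacker wants to slip past the model.
target_input = 6.0
verdict_clean = predict(clean_threshold, target_input)
verdict_poisoned = predict(poisoned_threshold, target_input)
print(clean_threshold, poisoned_threshold, verdict_clean, verdict_poisoned)
```

In this sketch the only remedy is the one the paragraph above names: find the bad samples and train again from clean data.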
All of this means that running your own NLP or ML analysis service at scale is tantamount to being something like a cloud provider. Being an OpenAI has the potential to be something almost like a new AWS. But with that opportunity comes potentially massive security risks.
The next generation in computing will involve artificial intelligence. And just as it will bring with it incredible economic and computational possibilities, so too will it bring significant security concerns: principally those around protecting sensitive information and secrets.
As we move forward into a world where AI applications use NLP and ML services such as hosted LLMs and systems like ChatGPT, we need to iterate on our existing security frameworks for secrets management, supply chain security, and network/infrastructure security as a whole.
Skynet isn’t imminent. But a new, horrifying generation of data breaches and identity theft may be if we don’t address these security concerns with modern AI infrastructure.