AI Hallucinations and the Military
“JADC2 intends to enable commanders to make better decisions by collecting data from numerous sensors, processing the data using artificial intelligence algorithms to identify targets, then recommending the optimal weapon… to engage the target,” US Congressional Research Service report (2022).
The understanding that AI hallucinates (in around 14.3% of all responses, according to the New Scientist in late 2025) is becoming more widespread. This has done nothing to slow the growing use of unverified AI and LLM recommendations by military personnel or systems. Stories continue to emerge of students in higher command and staff training using LLMs to answer their research questions and to write their own dissertation submissions. At no point, according to those observing, do these officers verify the information they are presented with. These are not one-off incidents; reports from PME institutions across the US and Europe abound with similar findings. Against a background in which non-military further education students are (slowly) becoming more sceptical of anything written by AI or LLMs, military personnel are deliberately ignoring the advice to “not trust” and “verify”.
Hallucinations in AI and LLMs have been the subject of considerable concern to technology companies, which tend to claim that the issues would go away if users provided better prompts and the models were trained on better data. Yet research from Princeton University shows that throwing more (and better) data at AI models, and giving systems more computing power, simply has not helped. In fact, the reverse is true: AI hallucinations are getting worse.
In an academic sense, hallucinations occur when an LLM perceives patterns or objects that do not exist. These, according to scholars, result in incorrect predictions, false positives, and false negatives. In some circumstances, such as cancer detection and diagnosis, the consequences are serious. In the military, the consequences of hallucinations may be just as serious and the effects more widespread: consider, for example, the targeting of a school that was historically a military base.
Mitigating AI and LLM hallucinations is possible, but the time taken to verify and check system responses is increasing decision time, when speed was the one thing such models were supposed to grant users. The underlying issue is a misperception about what AI systems do: most models are not built to process information and look for trends; at their core they answer the question, “what is the next likely word?” AI and LLM models are not detection and prediction organisms; they are computer models that use any evidence they can find to promote their own answers. Nor do LLMs have moral or ethical boundaries at their core; these have been added (and are constantly being amended) as an afterthought. Whilst technology companies are increasingly employing social scientists to try to teach models such considerations, those considerations are not at the heart of the programming, coding, or algorithms.
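To make the “next likely word” point concrete, the toy Python sketch below illustrates how a purely statistical next-token choice can prefer a fluent but false continuation over a true one. The prompt, candidate words, and probabilities are invented for illustration only and do not come from any real model or vendor.

```python
# Illustrative sketch only: a language model ranks candidate next tokens
# by probability and emits the most likely one. Nothing in this step
# checks whether the resulting sentence is true.

# Hypothetical probabilities for the prompt "The airstrip at the school is"
# (values invented purely for illustration).
next_token_probs = {
    "active": 0.46,      # fluent and plausible-sounding, but possibly false
    "abandoned": 0.31,   # the true state of affairs, ranked lower
    "civilian": 0.15,
    "unknown": 0.08,
}

def predict_next_token(probs: dict[str, float]) -> str:
    """Return the highest-probability token: fluency, not truth, decides."""
    return max(probs, key=probs.get)

print(predict_next_token(next_token_probs))  # prints "active"
```

The most probable continuation wins regardless of whether it matches reality, which is why any verification has to happen outside the model, at a cost in decision time.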
There is no date at which we can expect hallucinations to decline or disappear. For the military this is important: understanding that such systems in C2 hierarchies are inherently untrustworthy and unreliable would be a good starting point. But there seems to be a failure to accept this, or to insert mitigation processes within military decision-making C2 systems.
The absence of tools and processes to moderate the behaviours of AI systems and LLMs might simply be because, to some, AI is just as liable to make errors as humans are. Unlike with human error, however, it is much more difficult to determine who is responsible for AI errors. Within a military HQ that accepts and acts on a flawed or hallucinated recommendation from an AI system, and whose action ends in the loss of civilian life or infrastructure by error, who will be judged? The commander, per international law, will be held accountable for the effects, but who will be responsible? The staff officer, the coder, the programmers, the prompter, the data provider, the author of the original algorithm, or someone else? What seems likely is that technology companies have no liability for the tools they create beyond the profits they might forfeit if a military decided to replace those tools.
The other problem with regarding machine hallucinations as analogous to human error is that trust has long been a significant factor in military decision-making: knowing a person, their foibles, biases, and ways of thinking is one of the benefits a diverse staff brings to a commander. Human-AI or human-LLM trust is far less well understood, researched, or prepared for. Humans do, however, have a tendency to believe what is convincing (whether true or not), and data provided by AI and LLMs appears compelling, to the military mind in particular. And when PME students have become so reliant on such tools, good judgement, along with the need for verification and fact-checking, has been left far behind.