Beware Hallucinating Generative AI — Which You May Be Using Now

Although vendors claim they’re improving it, the bug is still widespread.

Back in the day, when someone came up with an outrageous statement, others might say, “What are you smoking?” These days, the question might be, “Is that smoke I see hovering over your AI servers?”

There is a lot that’s positively amazing about generative AI, especially all the large language model chatbots. Ask something, they look it up. Request a draft of a document and it’s yours faster than you could tap a single letter on a keyboard.

AI holds promise for various areas of commercial real estate, although that is largely still in the future: “actual CRE uses appear to be limited and most mundane to date,” a PwC and Urban Land Institute report said in the fall. Also, the term covers many different technologies developed since the 1950s.

Sticking to LLMs, their use is expanding quickly. But in addition to copyright issues, there is the deep problem of hallucination. A recent report from Stanford found that “legal mistakes with large language models are pervasive.”

“In a new preprint study by Stanford RegLab and Institute for Human-Centered AI researchers, we demonstrate that legal hallucinations are pervasive and disturbing: hallucination rates range from 69% to 88% in response to specific legal queries for state-of-the-art language models,” the report said. “Moreover, these models often lack self-awareness about their errors and tend to reinforce incorrect legal assumptions and beliefs. These findings raise significant concerns about the reliability of LLMs in legal contexts, underscoring the importance of careful, supervised integration of these AI technologies into legal practice.”

The rates of hallucination were “alarmingly high for a wide range of verifiable facts.” The more complex the legal question, the higher the hallucination rate. Ask about the core ruling or holding of a court and the rate was about 75% across models. Ask about precedential relationships between cases and most LLMs delivered nothing better than random guesses.

This is just one area. Hallucination has been an issue almost from the moment earlier versions of the software, technically not yet state-of-the-art, were first made available: simple math that was wrong, references to imaginary publications and studies, answers limited by a lack of up-to-date information. It seems likely that, without adequate exposure to specialized information and considerable help in training, the chances of guesses and off-the-wall responses in other fields would also be high.

Think about it for a moment. Is there any person or organization in CRE, or any industry, for whom a 50-50 guess on facts and basic analysis would be adequate? Of course not. Even if error rates dropped to 30% or 20%, that would still be enough to get someone reasonably fired from a position.

CRE companies need to be careful when they adopt these new technologies. Ask hard questions about where the technical expertise lies. Run scenarios whose answers you already know to check accuracy. Speed and saving money are great, but not at the price of being wrong, wrong, and wrong again. The new systems won’t go away, and they will get better. Be patient.
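That “answers you already know” check can be made systematic. Below is a minimal sketch in Python, assuming a generic ask callable that wraps whatever chatbot a firm is evaluating; the function names, sample questions, and pass criterion are illustrative, not drawn from any particular product or from the Stanford study.

```python
from typing import Callable, Iterable


def known_answer_accuracy(
    ask: Callable[[str], str],
    cases: Iterable[tuple[str, list[str]]],
) -> float:
    """Ask each question and count a pass when every expected
    phrase appears in the model's reply (case-insensitive)."""
    results = []
    for question, expected_phrases in cases:
        reply = ask(question).lower()
        results.append(all(p.lower() in reply for p in expected_phrases))
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Illustrative stand-in for a real chatbot client; replace with
    # a call to whichever LLM is being evaluated.
    def fake_chatbot(question: str) -> str:
        return ("A triple net (NNN) lease passes taxes, insurance, "
                "and maintenance costs to the tenant.")

    # Questions whose answers reviewers have already verified.
    test_cases = [
        ("In a triple net lease, who typically pays property taxes?", ["tenant"]),
        ("What does NNN stand for in a lease context?", ["net"]),
    ]

    score = known_answer_accuracy(fake_chatbot, test_cases)
    print(f"Known-answer accuracy: {score:.0%}")
```

Substring matching is crude, and a person still has to read the answers the model gets wrong, but even a small battery of verified questions yields a rough accuracy number before anyone relies on the tool.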