When Generative AI Hurts Your Work Product

A study suggests that OpenAI’s ChatGPT can help creative work but undercut analysis.

The almost endless talk of artificial intelligence by CRE software vendors is frequently dominated by generative AI programs like OpenAI’s ChatGPT, Google’s Bard and Gemini, Microsoft’s Bing Chat, and others.

These systems are trained on vast sets of text material, and vendors make many promises about them: that they will create marketing materials, communicate with customers, analyze large numbers of potential deals, and more for CRE businesses. However, there are already indications that generative AI can have significant drawbacks.

A new study from researchers at Harvard Business School, the Wharton School, Warwick Business School, MIT Sloan School of Management, and Boston Consulting Group puts this into even greater relief. The researchers looked at 758 consultants — about 7% of the consultants at the company — and had them do baseline work on particular tasks. They then assigned each consultant to one of three groups: no AI access, access to GPT-4, or access to GPT-4 plus training in how to write the prompts that serve as the software’s instructions.

The results drew a picture of using generative AI as a “jagged technological frontier.” The software seemed to do some tasks easily, while others were outside its ability. “For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group),” they wrote.

The tasks fell into four categories: creativity, analytic thinking, writing proficiency, and persuasiveness.

Consultants who performed below average on the baseline tasks improved by 43% when using AI. Those who performed above average at the start improved over their baseline scores by 17%.

The biggest boosts came in the creative tasks — product innovation and development. (The true proof of success, though, would have to come in the marketplace rather than from test judges, as human assumptions about what should work are often completely wrong.)

In the analytic tasks, using the AI system turned out to be a detriment. The performance of those who used the software and received prompt training dropped by 24 percentage points. The group that used the software without training also fell, though by only 13 percentage points.

Hila Lifshitz-Assaf, a management professor at Warwick Business School in Britain, told the New York Times about interviews with subjects after the experiments: “People told us they neglected to check because it’s so polished, it looks so right.” That misplaced trust is a significant problem.