31 Comments
Kim Nelson

Your persistence in this experiment offered your readers an opportunity to better understand the inherent biases and limitations of AIs. Then I wondered: a large number of us deny our own biases, so will we be able to apprehend their existence in a tool? So many rabbit holes!

X. P. Callahan

It definitely was a slog. I questioned my sanity when I gave the chatbots a reading-comprehension test. But by then my irritation had become personal.

Rebecca Weil

Kim, yes, our own biases, known or unknown. So interesting.

Sabian Raine

And without brilliant analysis and original writing such as you offer here, we risk losing ourselves in the mirror!

Maureen Doallas

Thank you for this, X.P.

Your experiment confirms for me that chatbots, at least at this point, are not a substitute for the human brain's ability to review, synthesize, and analyze information based on real life experience and to then come to conclusions. Chatbots have no real life experience, and they're looking for patterns. Whatever biases there are in the information the chatbots are fed are going to be reflected back in the responses they give. That's not critical thinking.

X. P. Callahan

I was struck by their failure to process irony and the dramatic structure of a narrative, including and especially its silences.

Maureen Doallas

What in the information they're fed would have "taught" them to recognize those things? They're didactic.

The current conversations around chatbots remind me of the electronics in my car, which will flash a statement if I go over solid double lines on a highway but have no ability to discern that I'm doing that because of an accident in my lane. And the problem there is that my car insurance rates could be affected if the information were going back to the insurer who might deduce I was a poor driver.

Textual analysis requires a skill chatbots don't have at this point.

X. P. Callahan

Have you read this?

https://web.archive.org/web/20260604061857/https://www.theatlantic.com/philosophy/2026/06/no-artificial-intelligence-is-not-conscious/687378/

It includes this memorable remark: “Being open to the possibility that LLMs are conscious is the same as being open to the possibility that Microsoft Word is conscious.”

Maureen Doallas

No but I will now. Thank you.

Maureen Doallas

Good article. I'm with Ted Chiang.

Janie Braverman

I would never pass up the opportunity to watch you think, even in the context of it being a slog. Makes me want to electronically slap the chatbots.

X. P. Callahan

Ha ha ha ha! They have a noticeably passive-aggressive tone. They’re trained to be agreeable, and so they never directly contradict you. Instead, they “gently push back.” Also, when I first gave them the word “dyke,” they got all schoolmarmish and “gently pushed back” at me about harmful stereotypes.

Janie Braverman

Blegh! Now I really want to rap their little e-knuckles!

Mario Fonseca

Rare Earths Heavy Metals Carbon Valley Aggregate Water Dumb Authorship

Autumn Widdoes

This is excellent. I suspect many people will read it the same way as the chatbots have. Language has been flattened. There is no room for nuance or depth. Thanks for this.

Mary Roblyn

Fascinating, X. P. I wonder if many high school seniors, given an essay test, might interpret it similarly to the LLMs. Do our schools prepare students for the nuanced critical thinking required to interpret poetry? I think not. Do the exam-takers calculate answers based on an idea of what the exam-givers want to hear? Sounds familiar.

Excellent, thoughtful work.

X. P. Callahan

Thank you, Mary. I don’t know how poetry is being taught in high school now, if it really is at all.

Mary Holscher

I loved reading this! Thank you for your experiment and clarity. I’m a fan of Perplexity, but would never consider using it for something like this experiment. I’m glad you stuck with it through the reading-comprehension part, despite the slog.

X. P. Callahan

Thank you, Mary! Thanks also for your emailed digest of the various chatbots’ personalities, as it were.

Glenn Ingersoll

I have avoided AI. But last week I finally visited ChatGPT and gave it a prompt. “Write a poem in the style of Glenn Ingersoll.” The result was not totally unlike something I might have written, but, no surprise, it contained stuff not me. “Nothing remarkable happened.” That line made me laugh, so I remembered it. I did not save ChatGPT’s Ingersoll poem.

Your experiment in coaxing interpretation from the bots reminded me of AI-generated descriptions of my own writing. I didn’t ask for the AI takes; they just show up in search engine results. Sometimes they seem insightful. But there’s always an offness. Even the offness is stated with such confidence.

X. P. Callahan

About three years ago, I asked a chatbot to write a poem in the style of X. P. Callahan. Here is what it coughed up:

I woke up to a snow-covered morning

The sun was a pale disc in the sky

I put on my boots and my coat

And walked to the nearest coffee shop

I ordered a latte and a croissant

And sat by the window, watching the cars

I opened my laptop and checked my email

There was a rejection from a literary magazine

I felt a pang of disappointment

But also a surge of defiance

I closed my laptop and took out my notebook

And wrote a new poem, better than the last

I finished my coffee and my croissant

And paid the bill with a smile

I walked back to my apartment

With a spring in my step

I was a poet, and nothing could stop me

Not the snow, not the sun, not the rejection

I had words, and they were powerful

They were my diary, and they were my life

pamm hanson

Ha ha ha. !!!!!!

rena

Fascinating. Thank you for sharing your findings. Reading again.

pamm hanson

I applaud this investigation. Better than my impulse, which is to hide my head in the sand, even as AI insists itself into my life.

James Maynard

I've been doing a lot of work with LLMs for my salary, and it's given me some insight into what is going on when prompts are sent to the models, so I wanted to add this perspective to the chat. I write this with trepidation, X.P., because you're so thorough a researcher that I'm afraid you already know this and I'm not adding anything new. Still, I'm hoping it adjusts people's thinking about LLMs and exactly what their limitation is.

To outrage a computer scientist with a simplified statement: LLMs are very, very complicated probability machines. When a prompt is sent to the model, it is broken down, word by word, into the _probability_ of what the prompt is asking. This is placed into vector coordinates that map out mathematically the best guess for what you are asking, and once the model has that best guess (a value from 0 to 1), it goes on to produce a best-guess response, which is also a probability.

So there's a high probability the model can answer "what is 1+2" correctly. This also shows why questions 2-6 were answered correctly. But once Q7 is asked, the model is only estimating, on a scale of 0 to 1, what a "correct" answer would be, and it falls back on clichés: not intelligent thinking but rote, programmatic responses. My point being, they can't think intelligently anyway; they can only produce the most _probable_ response.
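To make the "probability machine" idea concrete, here is a toy Python sketch. It is not how any particular model is implemented: the candidate words and their raw scores are invented for illustration. What it does show is the standard softmax step, which turns raw scores into probabilities that each fall between 0 and 1 and sum to 1, followed by a greedy pick of the single most probable word. Real models score tens of thousands of tokens or more at every step, and greedy choice is only one of several ways to pick the next word.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities between 0 and 1 that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical raw scores for candidate next words after the prompt
# "what is 1+2? The answer is" (invented numbers, for illustration only)
scores = {"3": 9.0, "three": 6.5, "4": 2.0, "banana": -3.0}

probs = softmax(scores)
best = max(probs, key=probs.get)  # greedy choice: the single most probable word

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
print("chosen:", best)
```

The model never "knows" the answer is 3; it only finds that "3" carries most of the probability mass. For a question with no single high-probability answer, the same mechanism lands on whatever phrasing is most common, which is one way to picture the drift toward clichés.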

X. P. Callahan

Thanks, James. I appreciate your perspective. Would it be fair to say that this is a highly technical elucidation of GIGO?

James Maynard

I think that's fair. Except your prompt wasn't garbage. The responses show we should leave critical thinking to the meat sacks.

X. P. Callahan

Thanks. But by GIGO, I meant training bias.

James Maynard

Whoops. I think bad prompts often lead to bad responses. Idk enough about training the models, but GIGO is probable given that these companies are training LARGE LMs. A small language model may be able to do it better.