Your persistence in this experiment offered your readers an opportunity to better understand the inherent biases and limitations of AIs. Then I wondered this -- a large number of us deny our own biases so will we be able to apprehend their existence in a tool? So many rabbit holes!
It definitely was a slog. I questioned my sanity when I gave the chatbots a reading-comprehension test. But by then my irritation had become personal.
Kim- yes, our own biases, known or unknown- so interesting.
And without brilliant analysis and original writing such as you offer here, we risk losing ourselves in the mirror!
Thank you!
Thank you for this, X.P.
Your experiment confirms for me that chatbots, at least at this point, are not a substitute for the human brain's ability to review, synthesize, and analyze information based on real life experience and to then come to conclusions. Chatbots have no real life experience, and they're looking for patterns. Whatever biases there are in the information the chatbots are fed are going to be reflected back in the responses they give. That's not critical thinking.
I was struck by their failure to process irony and the dramatic structure of a narrative, including and especially its silences.
What in the information they're fed would have "taught" them to recognize those things? They're didactic.
The current conversations around chatbots remind me of the electronics in my car, which will flash a statement if I go over solid double lines on a highway but have no ability to discern that I'm doing that because of an accident in my lane. And the problem there is that my car insurance rates could be affected if the information were going back to the insurer who might deduce I was a poor driver.
Textual analysis requires a skill chatbots don't have at this point.
Have you read this?
https://web.archive.org/web/20260604061857/https://www.theatlantic.com/philosophy/2026/06/no-artificial-intelligence-is-not-conscious/687378/
It includes this memorable remark: “Being open to the possibility that LLMs are conscious is the same as being open to the possibility that Microsoft Word is conscious.”
No but I will now. Thank you.
Good article. I'm with Ted Chiang.
I would never pass up the opportunity to watch you think. Even in the context of it being a slog. Makes me want to electronically slap the chatbots.
Ha ha ha ha! They have a noticeably passive-aggressive tone. They’re trained to be agreeable, and so they never directly contradict you. Instead, they “gently push back.” Also, when I first gave them the word “dyke,” they got all schoolmarmish and “gently pushed back” at me about harmful stereotypes.
Blegh! Now I really want to rap their little e-knuckles!
This is excellent. I suspect many people will read it the same way as the chatbots have. Language has been flattened. There is no room for nuance or depth. Thanks for this.
Fascinating, X. P. I wonder if many high school seniors, given an essay test, might interpret it similarly to the LLMs. Do our schools prepare students for the nuanced critical thinking required to interpret poetry? I think not. Do the exam-takers calculate answers based on an idea of what the exam-givers want to hear? Sounds familiar.
Excellent, thoughtful work.
Thank you, Mary. I don’t know how poetry is being taught now in high school. If really at all.
I loved reading this! Thank you for your experiment and clarity. I’m a fan of Perplexity, but would never consider using it for something like this experiment. I’m glad you stuck with it through the reading comprehension part, despite the slog.
Thank you, Mary! Thanks also for your emailed digest of the various chatbots’ personalities, as it were.
I have avoided AI. But last week I finally visited ChatGPT and gave it a prompt. “Write a poem in the style of Glenn Ingersoll.” The result was not totally unlike something I might have written, but, no surprise, it contained stuff not me. “Nothing remarkable happened.” That line made me laugh, so I remembered it. I did not save ChatGPT’s Ingersoll poem.
Your experiment in coaxing interpretation from the bots reminded me of AI-generated descriptions of my own writing. I didn’t ask for the AI takes, they just show up in search engine results. Sometimes seemingly insightful. But there’s always an offness. Even the offness is stated with such confidence.
About three years ago, I asked a chatbot to write a poem in the style of X. P. Callahan. Here is what it coughed up:
I woke up to a snow-covered morning
The sun was a pale disc in the sky
I put on my boots and my coat
And walked to the nearest coffee shop
I ordered a latte and a croissant
And sat by the window, watching the cars
I opened my laptop and checked my email
There was a rejection from a literary magazine
I felt a pang of disappointment
But also a surge of defiance
I closed my laptop and took out my notebook
And wrote a new poem, better than the last
I finished my coffee and my croissant
And paid the bill with a smile
I walked back to my apartment
With a spring in my step
I was a poet, and nothing could stop me
Not the snow, not the sun, not the rejection
I had words, and they were powerful
They were my diary, and they were my life
Ha ha ha. !!!!!!
😂
Fascinating. Thank you for sharing your findings. Reading again.
I applaud this investigation. Better than my impulse, which is to hide my head in the sand, even as AI insists itself into my life.
I've been doing a lot of work with LLMs for my salary, and it's given me some insight into what is going on when prompts are sent back to the models, and I wanted to add this perspective to the chat. I write this with trepidation, X.P., because you're so thorough a researcher I'm afraid you already know this, and I'm not adding anything new. Still, I'm hoping it adjusts people's thinking about LLMs and exactly what the limitation is.
To outrage a computer scientist with a simplified statement: LLMs are very very complicated probability machines. When a prompt is sent to the model it undergoes a disintegration into what the _probability_ is for what the prompt is asking, word for word. This is placed into vector coordinates that graph out in mathematics what is the best guess for what you are asking, and once the model has that best guess (a range from 0 to 1), it goes into processing a best guess response, also a probability.
So there's a high probability the model can answer "what is 1+2" correctly. This also shows why Q's 2-6 were answered correctly. But once Q7 is asked, the model is only evaluating 0-1 what is a "correct" answer and will fall back to cliches, not showing intelligent thinking but rather rote programmatic responses. My point being, they can't think intelligently anyway, only on what is the best _probable_ response.
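For anyone who wants to see the "probability machine" idea in miniature, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the candidate answers and their scores are invented, and `softmax` and `toy_scores` are just names I picked. A real model computes its scores with a neural network over a very large vocabulary. The sketch only shows the final step, where raw scores become probabilities between 0 and 1 and the most probable continuation is chosen.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that lie in [0, 1] and sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score first keeps exp() numerically stable.
    exps = {token: math.exp(s - highest) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}

# Invented scores for possible continuations of the prompt "what is 1+2".
# These numbers are made up; they do not come from any real model.
toy_scores = {"3": 6.0, "three": 3.0, "4": 2.0, "12": 1.0}

probs = softmax(toy_scores)

# "Best guess" here means the single most probable continuation.
best = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token!r}: {p:.3f}")
print("most probable continuation:", best)
```

When one candidate scores far above the rest, as with simple arithmetic, the choice is easy. When many candidates score about the same, the most probable one tends to be the most familiar phrasing, which is one way to picture the fallback to cliches.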
Thanks, James. I appreciate your perspective. Would it be fair to say that this is a highly technical elucidation of GIGO?
I think that's fair. Except your prompt wasn't garbage. The responses show we should leave critical thinking to the meat sacks.
Thanks. But by GIGO, I meant training bias.
Whoops. I think bad prompts often lead to bad responses. Idk enough about training the models, but GIGO is probable given these companies are training LARGE LMs. A small language model may be able to do it better.