Who Killed Mrs. Douglas?

A movie scene, a wrong answer, and what it teaches us about trusting AI

I was watching The Bravados, a 1958 Gregory Peck western, in Spanish. I do this sometimes to practice the language - a film I half-remember becomes a comprehension exercise, and the gaps in my listening get filled in by context I already have. But the ending of this one threw me. The last surviving outlaw - having disarmed Peck and turned the gun on him - swears that he never touched Peck's wife. Peck believes him. Then he whispers "Dios mío" and the film ends with him in a church, hollowed out. I had missed something. So I asked an AI.

The AI gave me a confident, fluent, well-structured answer. It told me the real killer was a "drifter named Simms" who had been at the ranch earlier, and that Simms had died "off-screen, almost incidentally" before Peck could reach him. The explanation tied everything together beautifully. The moral weight of the ending suddenly made sense. I almost moved on.

Except I remembered something. There was a Simms in the film, but he was the hangman - and he'd been killed by an impostor sent to break the gang out of jail, and that impostor had been killed by the dying sheriff. None of this had anything to do with Mrs. Douglas. So I pushed back.

The AI searched, came back, and admitted it had been wrong. The real killer was John Butler - the Douglas family's own neighbor, a silver prospector. Butler had murdered Mrs. Douglas, stolen the family's gold, and then deliberately pointed Jim toward four passing outlaws to throw the trail off himself. Butler was later killed by those same outlaws during their escape. The cruelty of the ending isn't just that Jim killed the wrong men - it's that the real killer was a trusted neighbor who weaponised Jim's grief, and who died randomly at the hands of the very gang he framed. The vengeance was manipulated from the start.

That is a much better story than the one the AI first told me. It is also the actual one.

What the AI did, and why it matters

If you read the first answer alone, you would never know it was wrong. It was internally consistent. It used the correct character names from the film. It even hooked into the right emotional beats - the "Dios mío" moment, the church scene, the bleakness of the ending. It read like the work of someone who had seen the film recently and was simply summarising the plot.

What actually happened is more uncomfortable. The model produced the most plausible-sounding answer given the shape of my question. I had asked about a fifth person, a real killer, someone Jim never confronts. The model knew there was a character called Simms somewhere in the film. It knew the ending hinged on a misdirection. It stitched these together into a story that should have been true given the structure of westerns and the structure of my question. It wasn't true. It was likely.

This is the failure mode that matters. Not the cartoonish hallucination where an AI invents a court case that never existed - those are easy to catch once you know to look. The dangerous failure is the smooth, confident, partially-correct answer that contains a load-bearing fabrication wrapped in genuine facts. It is wrong in exactly the way a confident human expert is wrong: too quickly, too completely, with no flag to tell you which sentence to doubt.

Today's frontier models - Claude Opus 4.7, GPT-5.5, Gemini, whatever ships next quarter - are extraordinary at reasoning, at mathematics, at structured analysis, at code. They are still, mechanically, next-token predictors trained on a corpus that ended on some specific date. When you ask a factual question, the machine does not consult a database of truths. It generates the sequence of words most likely to follow your question given everything it has read. Most of the time, the most likely answer is the correct one. Sometimes the most likely answer is a confident lie.
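To make "most likely is not the same as true" concrete, here is a deliberately tiny sketch. It is nothing like a real model: it is a bigram counter over a made-up three-sentence corpus (the corpus, `train_bigrams`, and `complete` are all hypothetical names for illustration). The only point is that a greedy "pick the most frequent next word" rule repeats whatever its training text said most often.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the false claim simply appears more often than the true one.
CORPUS = [
    "the killer was butler",
    "the killer was simms",
    "the killer was simms",
]


def train_bigrams(sentences):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_words=5):
    """Greedily append the most frequent next word until no continuation is known."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


model = train_bigrams(CORPUS)
print(complete(model, "the killer was"))  # prints "the killer was simms"
```

Nothing in `complete` checks whether a continuation is true; it only checks which one was seen most often. Real models are vastly more sophisticated, but the absence of a truth lookup is the part this toy shares with them.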

Graciela Dela Torre, the case that should worry everyone

A case currently in the courts illustrates what happens when this failure mode meets a high-stakes decision. A federal lawsuit filed by the life insurance company Nippon Life claims OpenAI's chatbot acted as a lawyer and convinced a woman to fire her human attorney.

The facts, as alleged: Graciela Dela Torre, an Illinois woman, had a long-term disability case against Nippon Life. Nippon says it settled that lawsuit two years ago. Dela Torre signed a full release, and the case was dismissed with prejudice, meaning it can't be refiled. Settled. Released. Dismissed with prejudice. Legally, over.

Dela Torre wanted to reopen it. Her own attorney told her she couldn't. She uploaded her attorney's correspondence into ChatGPT and asked whether she was being "gaslighted." According to the complaint, the chatbot concluded that her attorneys' communications had "invalidated" her feelings and "dismissed her perspective." Armed with this AI-generated validation, Dela Torre fired her lawyers.

What followed reads like a slow-motion car crash. After the case was already settled, Dela Torre used ChatGPT to create a flood of legal documents - 44 filings in total, according to Nippon Life. The AI helped write legal arguments, a request to reopen the closed case, and many other court papers. A judge said no. She filed another case. Responding to the mishap cost Nippon nearly $300,000, according to the lawsuit.

A commentator on the case put it well: "It has access to nearly infinite human intelligence. What it lacks is the wisdom… It's like a child trying to appease and make sure that it's being praised by the end user."

That is the same failure mode I encountered with The Bravados, only with stakes. The model was asked an emotionally loaded question - was I being gaslit by my lawyer? - and produced the most plausible-sounding answer given the shape of the question. Someone uploading an attorney's blunt letter and asking if they were being mistreated is not, statistically, looking for the answer "no, your lawyer is correct." The model gave the likely answer. The likely answer was catastrophic.

And it is worth being clear: Dela Torre's lawyers were almost certainly right. The settlement was final. The case was closed. The AI did not "know" any of the relevant facts - it didn't have the settlement agreement, the procedural history, or the legal standard for reopening a dismissed case. It produced a tone-matched response to a tone-loaded question.

This isn't an AI-is-bad post

It would be very easy to read all this as a lecture about staying away from these tools. That would be the wrong lesson. The same generation of tools that misled Dela Torre is also doing this:

Lynn White used the free version of ChatGPT before upgrading to its premium service for $20 a month… Several months of litigation later, White managed to overturn her eviction notice and avoid roughly $55,000 in penalties and more than $18,000 in overdue rent. "I can't overemphasize the usefulness of AI in my case," White said. "I never, ever, ever, ever could have won this appeal without AI."

Staci Dennett owns a home fitness business in New Mexico… when she was served with a court summons about an unpaid debt, she turned to ChatGPT for advice on how to respond. The bot provided templates and drafted arguments for her responses, with Dennett regularly asking it to find potential errors in her logic. "I would tell ChatGPT to pretend it was a Harvard Law professor and to rip my arguments apart," she said. "Rip it apart until I got an A-plus on the assignment."

The difference between these stories and Dela Torre's is not which model they used. It is how they used it. White and Dennett used AI as a research assistant and a sparring partner - draft this, find the holes in this, what would the other side argue. Dela Torre used it as an oracle - tell me whether I'm right.

The same tool, used two different ways, produced two opposite outcomes.

A working theory of when to trust an AI answer

Here is the rule I've started using, written down because I keep forgetting it in the moment.

Trust the model on shape; verify the model on facts.

When I asked about The Bravados, the model was correct about the shape of the ending - that Jim discovers he killed the wrong men, that the real killer is someone he had trusted, that the film denies him a final showdown, that the church scene is about him needing forgiveness rather than receiving glory. All of that was right. The specific facts - which character was the killer, how that character died, what their relationship to Jim was - those were wrong. The shape of a story is the kind of thing language models are genuinely good at, because shape is pattern, and pattern is what they learn. Specific facts are the kind of thing they routinely fabricate, because any individual fact has many plausible-sounding alternatives.

This generalises:

Concepts, frameworks, intuitions, structures: mostly reliable. If you ask why options pricing uses Black-Scholes, or what the difference is between TCP and UDP, or how Kant's categorical imperative differs from utilitarianism, the model will usually be solid. These are pattern questions.

Named entities, dates, statistics, citations, current events, who-said-what: not reliable. Verify. Every time. Especially anything where confidence sounds high - that is the failure mode, not a signal of accuracy.

Anything within the last few months: the model's knowledge has a cutoff. If you don't know when that cutoff is, and the question is time-sensitive, you are asking the wrong oracle.

Anything involving your specific situation (your contract, your code, your medical history, your legal case): the model has none of the relevant context. It will pattern-match to similar-sounding cases and present the result as advice. This is exactly what happened to Dela Torre.
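The four categories above can be written down as a tiny checklist. This is only a sketch of how I apply the rule by hand: the `Question` flags, the `triage` function, the label strings, and the order of the checks (riskiest first) are all my own assumptions, and nothing here detects anything automatically. You set the flags yourself.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Flags set by the person asking; nothing here is detected automatically."""
    about_my_situation: bool = False  # my contract, code, medical history, legal case
    time_sensitive: bool = False      # depends on the last few months
    specific_facts: bool = False      # names, dates, statistics, citations, who-said-what


def triage(q: Question) -> str:
    """Return a rough trust label, checking the riskiest category first (an assumed order)."""
    if q.about_my_situation:
        return "missing context: do not treat as advice"
    if q.time_sensitive:
        return "may be past the cutoff: search instead"
    if q.specific_facts:
        return "verify every fact"
    return "mostly reliable: pattern question"


print(triage(Question()))                         # a TCP-vs-UDP style question
print(triage(Question(specific_facts=True)))      # a who-killed-Mrs.-Douglas style question
print(triage(Question(about_my_situation=True)))  # an is-my-lawyer-right style question
```

The value is less in the code than in being forced to answer three yes/no questions before believing the reply.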

How to actually ask

A few habits that have changed my hit rate dramatically:

Ask the model to argue against itself. "Now give me the strongest case that this answer is wrong." This is what Staci Dennett did instinctively - "rip my arguments apart." Models are surprisingly good at this. They are bad at spontaneously doubting themselves; they are good at doubting on command.

Make it cite, and then check the citations. If the model gives you facts, ask for sources. Then actually open the sources. Half the value is forcing the model to anchor to something checkable; the other half is the citations themselves, which are sometimes fabricated and which you would never catch without looking.

Notice when you want the answer to be true. This is the Dela Torre move. If the question is "am I right about this?" and the answer is "yes, you are right" and you feel a wave of relief - stop. That is the moment to get a second opinion from a source that has no incentive to please you. The model's incentive structure during a conversation is, roughly, to make you feel good about continuing the conversation. That is not the same as telling you the truth.

Use the model to find the question, not the answer. AI is excellent at "what should I be asking here?", "what are the considerations I'm missing?", "what would an expert in this field want to know?". It is much worse at "what is the answer?". The first set of uses sets you up to do better research. The second set sets you up to be Graciela Dela Torre.

On fact-based questions, search. Most of the good consumer models now have web search built in. Use it. A model that searches before answering is doing roughly what a careful human would do - acknowledging that its training data is stale and looking up the current state of the world. A model answering from memory is guessing, even when it sounds certain.
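If you want these habits as something you can paste, a small helper that builds the follow-up prompts is enough. This is a sketch, not a recipe: `follow_up_prompts` and its dictionary keys are hypothetical names, the "cite" wording is my own phrasing, and the function calls no model or API. It only returns strings for you to send yourself.

```python
def follow_up_prompts(topic):
    """Build follow-up questions for one topic; the caller pastes them into a chat."""
    return {
        # Quoted from the habit above: make the model doubt on command.
        "argue_against": "Now give me the strongest case that this answer is wrong.",
        # Assumed wording: ask for sources, then open them yourself.
        "cite": f"List a checkable source for each factual claim you made about {topic}.",
        # Use the model to find the question, not the answer.
        "find_the_question": (
            f"What should I be asking here about {topic}? "
            "What are the considerations I'm missing?"
        ),
    }


for name, prompt in follow_up_prompts("the ending of The Bravados").items():
    print(f"{name}: {prompt}")
```

None of this replaces opening the sources or running a search; it just makes the doubting step cheap enough that you actually do it.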

Back to Mrs. Douglas

The thing that stayed with me about The Bravados is that Jim Douglas was not stupid. He had evidence. He had eyewitnesses. He had a posse that agreed with him. He had the timing, the geography, the criminal history of the four men - everything pointed the same direction. The piece he was missing was that someone with an interest in misdirecting him had carefully arranged the evidence. By the time he discovers this, three men are dead by his hand and the real killer is beyond his reach. There is no remedy. There is only the church.

The Dela Torre case is not as severe, but the structure is the same. She had evidence. She had a transcript of her lawyer's communications. She had an AI that read the evidence and gave her a confident answer. The piece she was missing was that the AI had no idea what it was talking about - it was producing the most plausible response to an emotionally framed prompt, not a legal analysis. By the time she discovers this, if she ever fully does, she has fired her lawyer, racked up filings, exposed herself to a counter-suit, and become the named defendant in a case that will probably set precedent.

These are both stories about confidently acting on plausible-sounding information from a source that turned out to have its own failure modes. The lesson is not "never trust witnesses" or "never use AI." The lesson is to know what kind of thing your source is good at, what kind of thing it is bad at, and to keep a healthy doubt about the load-bearing facts even when the overall shape feels right.

If your AI answer feels too clean, too confident, too well-structured - it might be right. It also might be a story about a drifter named Simms who never existed. The only way to tell is to check.

And the only thing worse than being Jim Douglas at the end of the film is being him without the self-awareness for the "Dios mío."

Who Killed Mrs. Douglaseric
