Using AI: Double Check!

A while back I proposed six rules for using AI safely. They were:

  1. Use AI on a computer, not your cellphone
  2. Double check AI work
  3. Don’t use AI for rough drafts
  4. Don’t use AI for interpersonal communication
  5. Learn how to prompt well
  6. Don’t use AI for something you’d like to be good at

Over time I hope to refine this list and maybe expand it (to ten, obviously) or contract it (to five, obviously). But while I’m thinking about it, I’d like to dig into each rule one by one, starting with DOUBLE CHECK AI WORK. So why is this important?

AI doesn’t understand anything

The way we anthropomorphize AI is pretty destructive, because it makes people believe that AI is, well, a person who is capable of all the normal stuff people are capable of.

AI is not (a person, nor capable of regular person stuff). AI is a very complicated algorithm that produces the most likely response given a text input (prompt).

Even if you are practicing rule 5 (prompt well) and tell AI “Behave as if you are an expert in X,” it is NOT an expert in X, because it’s not an expert in anything, because it’s not capable of gaining expertise.

Here’s an example. I have a medical research agent I created that has some information about me, is instructed to behave as a medical researcher, and is told to ONLY refer to facts that have been substantiated in multiple high-quality, peer-reviewed academic papers (and the search lens it’s connected to covers only academic papers anyway, thanks, Kagi!).
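If you’re wondering what “instructed to behave as a medical researcher” actually looks like, it’s roughly a system prompt along the lines of the sketch below. To be clear, this is not my actual agent and it’s not anything Kagi-specific; it’s a generic stand-in using the OpenAI Python SDK, with placeholder wording and model name, just to show the shape of that kind of instruction.

```python
# A generic stand-in, NOT my actual agent or anything Kagi-specific: roughly
# the shape of a "behave as a medical researcher" instruction, sketched with
# the OpenAI Python SDK. The wording and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

SYSTEM_INSTRUCTION = (
    "You are a careful medical researcher. Only state claims substantiated by "
    "multiple high-quality, peer-reviewed academic papers, cite each source, "
    "and say so plainly when evidence is weak, unreplicated, or unpublished."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": "Put together a report on a health "
                                    "intervention my doctor suggested, including "
                                    "safety concerns given my medical history."},
    ],
)
print(response.choices[0].message.content)
```

As the example that follows shows, an instruction like this doesn’t make the model an expert; it just changes what the most likely response looks like.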

A doctor suggested I look into intermittent fasting (IF), so I asked my agent to create a report on how I could best implement IF, as well as to provide general safety information including what I should look out for given some medical concerns I have.

It came back with a strongly worded warning for me:

“… a 2024 observational study of 20,000+ adults found that 8-hour eating windows were associated with a 91% higher risk of cardiovascular death, raising serious safety concerns about extreme time restriction.”

That’s terrifying! But if you know anything about IF you’re probably going “That can’t be right …”

And you would be correct.

I went to the cited paper, which was actually a press release in a journal about a pre-print: a paper that hadn’t actually been published, and that showed some pretty glaring methodological flaws even to my untrained eye. So I asked my agent to dig into it.

A controversial 2024 American Heart Association study claiming that 8-hour time-restricted eating increases cardiovascular death risk by 91% has been thoroughly debunked by the scientific community, with no peer-reviewed replications confirming the findings and multiple studies directly contradicting the claim. The study, presented as an abstract at the AHA’s March 2024 conference by Victor Zhong from Shanghai Jiao Tong University, analyzed 20,078 adults from NHANES data (2003-2018) but remains unpublished in any peer-reviewed journal 10 months later. The research sparked immediate backlash from 34 time-restricted eating researchers who formally protested to the AHA president about releasing “unpublished and unvetted data” to alarm the public, while even the AHA’s own past president Donald Lloyd-Jones called it “not ready for prime-time consumption.”

Yeah, the study was bunk. And if I had spoken to an actual expert, they would’ve known that from the start, and they wouldn’t have opened their rundown by citing a nonexistent risk from a terrible, terrible paper that was never actually published.

But, again, AI doesn’t understand and isn’t an expert. It did some searches, that paper came up, and it regurgitated it without realizing that it didn’t pass the smell test, because it can’t smell.

AI frequently makes this kind of mistake: surfacing relevant-sounding information that isn’t actually relevant (or just making stuff up that sounds good). We need to remember that it doesn’t actually understand what it’s saying; we need to perform the smell test ourselves, in other words. Which leads to our second main point.

To Double Check AI, You Must Be Capable of Double Checking

That study jumped out at me as not jibing with what I knew, so I dug into it, looked at the paper, and asked follow-up questions. But what if I didn’t know how to read an academic paper? What if I’d never had a doctor make a mistake, and assumed that medical advice was always 100% right?

Well, then I wouldn’t be trying IF, that’s for sure.

If you aren’t capable of double checking AI in a given subject, you shouldn’t use it for information about that subject.

I know this is going to fly in the face of vibe coders everywhere, but if you don’t understand coding, you shouldn’t use it to code, because you can’t be sure what it’s sticking in your program. Now, the stakes are lower for someone who is vibe coding some kind of Flappy Bird clone, hoping for that viral payday, but if you’re using it for something that people might count on, or something that could store or connect to sensitive information … you could cause serious problems.

If you don’t know how to read medical papers, don’t ask it for medical advice, as demonstrated above.

If you don’t know how to cook, don’t ask it for recipes. It’ll end up recommending you put Elmer’s glue on pizza.

You get the idea.

I use LLMs on a pretty regular basis, and almost every long report I’ve asked for has had some kind of error like the one mentioned above. And the worst part is, they can be very convincing:

  • While asking about headache medicines that are okay for use with hypertension, it warned me that I shouldn’t ever use Tylenol (the one most recommended by my doctor). This recommendation wound up coming from a study where they gave people 4,000 mg of Tylenol a day for two weeks straight (the recommended maximum in the US is 3,000 mg, and you shouldn’t be using it daily).
  • While doing research on dating apps it confidently informed me that “dating apps rate below cable internet providers in terms of user satisfaction” — that was catchy, but not true. Nothing is as despised, IN THE WORLD, as your local ISP.
  • When doing a cost-benefit analysis of one laptop vendor over another, it made wild assumptions about how often laptops would get serviced and how much that service would cost, indicating that we would save tens of thousands of dollars a month by switching providers. When we ran the numbers correctly, we found we would’ve broken even at best (a toy version of that math is sketched below).
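To make that last one concrete, here’s a toy sketch of the math. Every number in it is made up for illustration (the fleet size, repair cost, and failure rates are not our real figures); the point is just how completely the conclusion hinges on one unverified assumption about how often laptops actually need service.

```python
# A toy sketch with made-up numbers (not the real figures from our analysis),
# showing how one bad assumption about service frequency dominates the result.

FLEET_SIZE = 500       # laptops in the fleet (hypothetical)
SERVICE_COST = 300     # dollars per repair incident (hypothetical)


def monthly_service_spend(repairs_per_laptop_per_year: float) -> float:
    """Fleet-wide repair spend per month at a given repair rate."""
    return FLEET_SIZE * repairs_per_laptop_per_year * SERVICE_COST / 12


# The report's implicit assumption: the current vendor's laptops fail
# constantly, the alternative vendor's almost never.
claimed_savings = monthly_service_spend(3.0) - monthly_service_spend(0.2)

# What service records might actually show: both vendors fail at similar,
# much lower rates.
checked_savings = monthly_service_spend(0.5) - monthly_service_spend(0.4)

print(f"Savings under the report's assumptions: ${claimed_savings:,.0f}/month")  # ~$35,000
print(f"Savings under checked assumptions:      ${checked_savings:,.0f}/month")  # ~$1,250
```

Garbage assumptions in, confident garbage out; and unless you re-run the numbers yourself, the report looks perfectly plausible.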

Only using AI for things you can double check will definitely limit its usefulness, but it will also limit the risk.

Remember, AI lies about ten percent of the time. If someone recommended a doctor but said, “Oh yeah, one thing you should know: they will just … completely make at least one thing up every visit. Sometimes it’s harmless, but sometimes they’ll just prescribe a new medication that you don’t need,” would you go to that doctor?

If you had a research assistant who confidently handed in ten-page reports where one page was always a complete fabrication (but they never told you which), would you keep them around?

Obviously not. And why? Because it would be dangerous. You could be making important decisions with terrible information.

The trade-off is worth it. Limit the risk. Only use AI when you can double check the work.

