Data with Duke

Episode 6 – Health data isn’t THAT important…right??

Transcript

Intro

Hello and welcome back to Data with Duke — the show where we take all things data, boil them gently for 20 minutes, season with sarcasm, and serve piping hot insights that even your nan could understand.

And to those who noticed… yep. I’ve been away for a few weeks. You’re absolutely right. I’ve been ill — proper ill. The sort of ill where even your dog looks at you like, “Mate… pull yourself together.”

I spent about a fortnight (maybe three weeks) in the biological equivalent of a Windows 95 error screen, and it got me thinking: let’s do an episode about health data — the good, the bad, and the embarrassingly optimistic.

I actually prepared this episode weeks ago… and then Christmas happened. Which, as we know, requires planning, logistics, and alcohol. Now it’s early January, I’m still not 100%, but I’m upright and vaguely coherent — so let’s do this.

Quick note: I’ll probably do another episode next week (or the week after) about New Year’s resolutions, so send yours in. Drop a comment, contact me through the website… whatever works.

Right. Back to the one I actually prepared.

Today we’re talking about how data is used in health in the UK and around the world. We’ll cover heroic success stories, catastrophic failures, and Google’s brave-but-dodgy attempt at becoming Dr House.

That’s right: Google Flu Trends is back, baby.

So grab your Lemsip, buckle up, and let’s talk about the times data has saved lives… and the times it quietly ruins somebody’s afternoon.

Health data: what it is and why it matters

Let’s start with the basics — or, in other words, why doctors love spreadsheets as much as… systems. Symptoms. I meant symptoms. But honestly, both.

Health data is everywhere. If data had a Tinder profile, the NHS alone would be swiping right so fast it would get carpal tunnel.

We’re talking:

Electronic health records
Hospital Episode Statistics
Public health surveillance
Genomic sequencing
Wearables… including the Fitbit you bought in 2019 and haven’t charged since

In the UK, the NHS is one of the world’s largest, slightly creaky, semi-magical data systems. Millions of interactions create data: appointments, prescriptions, X-rays, A&E visits, and the occasional “I accidentally swallowed a keyring” story.

Globally, health data helps governments identify outbreaks, plan vaccination programmes, predict winter pressures, and work out whether Karen’s symptoms mean she’s allergic to gluten… or whether she just needs a glass of water and to stop Googling things at 3am.

But the big story — the one that really defines what happens when data gets above its station — is Google Flu Trends.

Google Flu Trends: Big Data meets Big Oopsie

I’ll link some good blogs and research papers in the show notes on datawithduke.com, because if you’re into big data case studies, this is one of the best you’ll ever find.

We need to go back to 2008. Barack Obama has just been elected, everyone’s wearing oversized belts, and Google decides it wants to predict flu outbreaks using search data.

This is before ChatGPT, before Gemini, and even before Google autocomplete became the mind-reading menace we all know today.

The idea was simple: if loads of people in a region start searching things like:

“flu symptoms”
“why do I feel like death”
“how long can a human survive on Lemsip alone”

…then maybe there’s a flu outbreak happening right now. And crucially, maybe we can detect it faster than traditional systems.

Back then, outbreak detection relied heavily on the slow stuff: GPs, pharmacies, and public health reporting. By the time a signal got escalated, half the country had already sneezed their way through it.

So Google built Google Flu Trends (GFT) — a model that looked at search patterns and estimated flu prevalence. It was supposed to be faster than traditional surveillance, and for a while…

…it actually worked.

Between 2008 and 2011, GFT tracked seasonal flu surprisingly well, and often quicker than the CDC, although it did largely miss the first wave of the 2009 swine flu pandemic. Everyone clapped. Headlines screamed “Big Data will replace epidemiologists!”

Somewhere, a thousand statistics professors sighed into their coffee.

GFT became the shiny new toy of digital epidemiology.

And then, like most shiny toys, it got dropped down the stairs.

So what went wrong?

Around 2012–2013, GFT started massively overestimating flu cases, at times reporting double the real figure or more.

Why?

Because search behaviour does not equal illness.

People Google weird things:

out of concern
because someone coughed on the bus
because WebMD has convinced them they’ve got 15 minutes to live

This is a perfect “correlation vs causation” case study. People searching “flu” doesn’t mean people have flu.

But the funniest part is this:

Google broke Google Flu Trends by updating Google.

In 2012, Google expanded autocomplete and personalised suggestions. Suddenly, typing “why is my nose…” could autocomplete into “running flu symptoms death imminent click here”.

And if you’ve never played the Google Autocomplete game, I recommend it. One rainy Sunday afternoon, try typing:

“why is my girlfriend…”
“why is my boyfriend…”

It’s either comedy gold or a reason to never go online again.

But here’s the point: autocomplete caused searches for flu-related terms to spike artificially. GFT didn’t know Google had changed; it just assumed flu had gone nuclear.
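
For the code-minded, here’s a toy sketch of that failure mode in Python. Everything in it is made up for illustration: the numbers, the 60% inflation factor, and the single straight-line model (the real GFT was considerably more involved). The only point is that a model fitted on the old search behaviour has no way of knowing the input has shifted.

```python
# Toy illustration only: every number here is invented, not real GFT or CDC data.

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, written out by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# "Training" weeks: flu-related share of searches (%) against the true flu rate (%).
search_share = [1.0, 2.0, 3.0, 4.0, 5.0]
flu_rate = [0.5, 1.0, 1.5, 2.0, 2.5]  # in this toy world, exactly half the search share

intercept, slope = fit_line(search_share, flu_rate)

def predict(share):
    """Estimate the flu rate from the search share using the fitted line."""
    return intercept + slope * share

# A new week: the true flu rate is 1.5%, so the old search behaviour would give 3.0%.
true_rate = 1.5
organic_share = 3.0

# Hypothetical platform change: suggestions nudge people into 60% more flu searches.
inflated_share = organic_share * 1.6

before = predict(organic_share)
after = predict(inflated_share)

print(f"prediction with old search behaviour: {before:.2f}%")
print(f"prediction after the platform change: {after:.2f}%")
print(f"true flu rate:                        {true_rate:.2f}%")
```

Nobody got any sicker between those two predictions. The only thing that changed was how people were being prompted to search, and the model faithfully turned that into extra flu.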

And this is where big data often goes wrong: correlations can appear by accident.

One famous example was that search terms like “high school basketball” ended up highly correlated with flu. Not because basketball gives you flu — but because basketball season overlaps with winter.

GFT didn’t understand context. It just saw patterns and made confident guesses.
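
If you want to see how easily that happens, here’s a small Python sketch. The three series are entirely synthetic (I’ve just drawn smooth yearly and quarterly waves), and the names and numbers are assumptions for illustration, not real search or flu data. Two things that both peak in winter end up strongly correlated even though neither has anything to do with the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def winter_peak(week, peak_week):
    """Smooth yearly cycle between 0 and 1 that tops out at peak_week."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * (week - peak_week) / 52)

weeks = range(104)  # two invented years of weekly data

# Synthetic flu cases, peaking in week 2 of each year.
flu_cases = [100 + 900 * winter_peak(w, 2) for w in weeks]

# Synthetic "high school basketball" searches, peaking three weeks later.
basketball_searches = [50 + 400 * winter_peak(w, 5) for w in weeks]

# Synthetic control: a topic that cycles every 13 weeks, with no winter season.
quarterly_searches = [200 + 100 * math.cos(2 * math.pi * w / 13) for w in weeks]

r_basketball = pearson(flu_cases, basketball_searches)
r_quarterly = pearson(flu_cases, quarterly_searches)

print(f"flu vs basketball searches: r = {r_basketball:.2f}")
print(f"flu vs quarterly topic:     r = {r_quarterly:.2f}")
```

The basketball series scores a very high correlation purely because it shares a calendar with flu, while the quarterly one scores roughly zero. A model that picks features by correlation alone can’t tell “shares a season” apart from “tells you something about illness”.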

Another problem: the model wasn’t being updated often enough. Google invested a lot, but not enough to keep it properly maintained.

Big data is like milk: great when it’s fresh, tragic when it’s not.

In the 2013 flu season, GFT overestimated peak flu cases by over 140%. It went from poster child of “big data will save the world” to poster child of data hubris.
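
Quick bit of arithmetic, because percentages like that are easy to misread. Overestimating by 100% means your figure is double the real one; overestimating by 140% means it is 2.4 times the real one. Here is that sum as a tiny Python sketch, using made-up peak values rather than the real GFT or CDC figures.

```python
def percent_overestimate(predicted, actual):
    """How far the prediction sits above the actual value, as a percentage of actual."""
    return 100 * (predicted - actual) / actual

# Hypothetical peak values, chosen only to make the arithmetic easy to follow.
actual_peak = 5.0      # e.g. 5% of doctor visits really were flu-like illness
predicted_peak = 12.0  # what an overexcited model might have claimed

overshoot = percent_overestimate(predicted_peak, actual_peak)
ratio = predicted_peak / actual_peak

print(f"overestimate: {overshoot:.0f}%")
print(f"prediction is {ratio:.1f} times the actual value")
```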

And then, in 2015, Google Flu Trends was quietly discontinued. No dramatic apology blog post. It just… stopped.

Like a sad goldfish.

Why does Google Flu Trends still matter?

Because it taught us some brutal lessons:

Big data can’t replace traditional public health systems
Search behaviour is influenced by platform design, media, and trends
Models break when the world — or the input system — changes
If your model relies on autocomplete, maybe don’t stake epidemiology on it

Still, as failures go, it’s educational… and strangely lovable. It’s one of those case studies that data scientists will be pointing at forever.

UK health data wins: COVID-19 (yes, but the data bit)

Right. I want to talk about something closer to home, because not everything is on fire.

And yes, I’m going to mention COVID — but don’t turn me off yet. We’re focusing on the data, not the politics, and definitely not the “sing Happy Birthday while washing your hands” trauma.

During the COVID pandemic, the UK became a global leader in a few major data areas.

Genomic sequencing

The UK sequenced a huge number of SARS-CoV-2 samples early on, helping identify variants earlier than many nations. That meant faster insight into what was spreading, how it was changing, and how symptoms were shifting.

The COVID dashboard

The colour-coded national obsession.

It became one of the most comprehensive public pandemic data portals in the world: daily updates, regional breakdowns, cases, hospitalisations, vaccinations, deaths. Not perfect — but building something that fast, that visible, updated daily… that’s not a small feat.

The RECOVERY trial

Run by the University of Oxford, this was one of the fastest and largest clinical trials in history, using an adaptive trial design.

It identified treatments that genuinely improved outcomes — and ruled out others that were being loudly promoted by people who were, let’s say, not especially qualified.

Whatever your overall view of the response, from a data perspective the UK contributed massively — and that insight saved lives worldwide.

Wearables and remote monitoring

Now let’s talk about wearables.

How many of you have got an Apple Watch, Fitbit, Samsung Galaxy Watch, Google Pixel Watch… something along those lines?

These devices produce real-time data:

Heart rhythm notifications
Glucose monitoring for diabetes
Blood oxygen sensors for respiratory issues
Sleep tracking (which I personally use)

Someone close to me first discovered they were experiencing arrhythmia because their Apple Watch told them. That’s not a gimmick — that’s early warning.

Do wearables know your heart rate better than you do? Absolutely. Do they also occasionally panic because you walked up the stairs too fast? Also yes.

They won’t replace doctors, but they can support early detection — and we’ll see more of this in the coming years.

Genomics and personalised medicine

Think of things like:

ancestry DNA tests
23andMe
population sequencing projects

Sequencing can help identify hereditary risk and diagnose rare diseases earlier — sometimes years earlier than symptoms appear.

Outside the UK, Iceland’s deCODE genetics is a well-known example of long-term population genomics research. Iceland also tends to show strong health outcomes — not purely because of data, obviously, but it certainly helps.

Wastewater surveillance: yes, it’s gross, but it’s genius

Stick with me.

Some countries analyse wastewater to track how much COVID, flu, RSV, norovirus, and even polio is circulating.

That’s right: your toilet flush is now a public health instrument.

No, you cannot opt out.

When health data goes wrong

Of course, it’s not all success stories.

Around 2013, NHS England announced care.data, a plan to link GP records with hospital data to support research and reduce fragmentation.

Sounds great… except the public felt blindsided. People panicked about privacy, and the project was eventually pulled.

Moral of the story: if you surprise the British public with anything — even improved healthcare — we will revolt.

IBM Watson for Oncology (the cancer treatment AI)

Watson was designed to support oncology decisions, but in testing it sometimes produced unsafe or incorrect recommendations. Eventually, doctors quietly asked it to leave the room.

Algorithmic bias

This one matters.

Healthcare algorithms have been found to underestimate disease risk in minority groups, often because training data doesn’t represent the population properly.

Bias isn’t a “whoops, try again” situation in healthcare. It can lead to underdiagnosis, wrong prioritisation, and systemic inequality — with deadly consequences.

Ethics in health data isn’t optional. It’s the whole game.
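
One simple first check for the kind of bias described above is to compare how often a model misses genuinely high-risk patients in each group. Here’s a minimal Python sketch of that idea. The records, the group labels “A” and “B”, and the function name are all invented for illustration, and this is not the method of any particular study, just one way of looking.

```python
from collections import defaultdict

# Invented records: (group, truly_high_risk, flagged_by_model).
records = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", True, False),
    ("B", False, False), ("B", False, False),
]

def miss_rate_by_group(rows):
    """Share of truly high-risk patients the model failed to flag, per group."""
    high_risk = defaultdict(int)
    missed = defaultdict(int)
    for group, truly_high_risk, flagged in rows:
        if truly_high_risk:
            high_risk[group] += 1
            if not flagged:
                missed[group] += 1
    return {group: missed[group] / high_risk[group] for group in high_risk}

rates = miss_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"group {group}: misses {rate:.0%} of high-risk patients")
```

In this made-up data the model misses a quarter of high-risk patients in group A and three quarters in group B. A single overall accuracy figure would blur those two numbers into one and hide the gap.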

Why health data is so valuable… and so messy

Health data is gold dust because it’s:

personal
predictive
useful for planning and cost reduction
powerful for research and treatment discovery

But it’s also messy because it’s:

fragmented across systems
inconsistent
ethically sensitive
full of missing values
entered by tired humans at 4am after a double shift

Health data can be the dream dataset.

It is often the nightmare dataset.

Lessons we’ve learned

From Google Flu Trends to COVID dashboards:

Big data is powerful, but messy
Input quality matters (search behaviour ≠ clinical data)
Transparency matters — the public want to know how data is used
Models should support public health, not replace it
Keep models up to date, or you’re predicting the future using last year’s horoscope
Ethics is not optional

Outro

So no, data isn’t medicine… but it can save lives, and it already has.

Health is one area where data genuinely makes the world better: detecting outbreaks, discovering treatments, understanding diseases we didn’t even have names for a few years ago.

But we’ve also learned that technology companies aren’t public health agencies — and counting search queries doesn’t make you Dr Gregory House.

As for me, I’m feeling pretty much better. Mostly what I needed was rest, fluids, and avoiding WebMD at all costs.

Thanks for tuning in to Data with Duke. Stay safe, stay curious — and if you feel ill, don’t Google it. Call somebody with actual medical training.

This is Duke, signing off.