Lies, Damned Lies and Data
Important lessons from the COVID Tracking Project and other reasons why "data" alone isn't the answer to better government.
Back when billionaire Mike Bloomberg was running for president, he liked to remind people of a saying that he brought with him from business to City Hall and his foundation: “In God we trust. Everyone else: bring data.”
The notion that data-driven government is better government, or data-driven campaigns are better campaigns, has become deeply rooted in recent years, at least on the center-left side of the political spectrum. Bloomberg himself, through his philanthropy, has spent tens of millions pushing and prodding cities to be more “data-driven.” Unfortunately, relying on “data” to make decisions is a lot like the person who lost their keys in a dark alley searching under a lamppost because the light is brightest there.
(Yes, I used a big data tool to make an illustration about “data-driven” everything.)
I was reminded of this as I read Robinson Meyer and Alexis Madrigal’s new essay in The Atlantic called “Why the Pandemic Experts Failed.” A year ago, along with Erin Kissane and Jeff Hammerbacher, they co-founded the COVID Tracking Project to try to monitor the spread of the virus and the government’s response. And what they learned from that effort is critical, and ought to be a lesson in humility for everyone who uses “data” in the political arena. Here are their key take-aways:
“For months, the American government had no idea how many people were sick with COVID-19, how many were lying in hospitals, or how many had died.”
Test positivity data aren’t standardized by state, and because negative test results lag behind positive case numbers, the rates tend to look higher than they actually are.
At least five states have “disturbingly incomplete” testing data.
Deaths aren’t reported when they happen; “about a quarter of deaths are reported less than six days after they have occurred; another 25 percent are reported more than 45 days after.”
“Tens of millions” of the rapid antigen tests are going unreported.
The only really solid data is hospitalization numbers reported to the Department of Health and Human Services.
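The positivity-lag problem above is simple arithmetic, but it's worth seeing concretely. Here is a minimal sketch; the numbers are hypothetical, chosen only to illustrate how a shrunken denominator inflates the rate:

```python
# A minimal sketch of why lagging negative results inflate test positivity.
# All numbers here are hypothetical, chosen only to illustrate the arithmetic.

def positivity_rate(positives: int, negatives: int) -> float:
    """Share of reported tests that came back positive."""
    return positives / (positives + negatives)

positives = 100          # assume positive results are reported promptly
all_negatives = 900      # negatives that will eventually be reported
early_negatives = 450    # only half the negatives have arrived so far

true_rate = positivity_rate(positives, all_negatives)     # 100/1000 = 10.0%
early_rate = positivity_rate(positives, early_negatives)  # 100/550 is about 18.2%

print(f"final rate: {true_rate:.1%}, early reported rate: {early_rate:.1%}")
```

The same number of infections, reported twice: once with all the negatives in hand, once without. Until the slow negatives arrive, the state looks nearly twice as hard-hit as it actually is.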
Despite these problems, the Centers for Disease Control tells state leaders to use its own test-positivity-rate data, which Meyer and Madrigal note is based on inaccurate state data, to make decisions about reopening schools. They conclude: “Data are really nothing special. Data are just a bunch of qualitative conclusions arranged in a countable way. Data-driven thinking isn’t necessarily more accurate than other forms of reasoning, and if you do not understand how data are made, their seams and scars, they might even be more likely to mislead you.”
Last year, as I worked with my colleagues Diana Nucera, Berhan Taye, Sasha Costanza-Chock and Matt Stempeck on our field scan of emerging technologies in the public interest, “Pathways Through the Portal,” we encountered many serious issues with relying on “big data” to power shiny new efforts to apply artificial intelligence tools like machine learning to public issues. Data is messy, it’s expensive to collect and maintain, and it’s often missing. We noted that “Data sets that are crucial to concerns of racial and gender equity, accessibility, and social justice are often difficult, expensive, or impossible to obtain, such as data about police violence, gun deaths, worker fatalities, or comparable educational outcomes across locations.”
That’s just part of the problem.
We use flawed data to make political arguments. How often have you encountered someone citing exit poll data to make a claim about how X group voted without realizing that exit polls are broken? To give just one simple example, the Pew 2016 exit poll estimated that the electorate was 55% women and 45% men; the YouGov Cooperative Election Study had it at 50-50. And regular polling data is also filled with problems, from rising refusal rates among right-leaning voters to mistaken assumptions about the underlying electorate. Using past election data to make judgments about the current voter pool is like assuming you can step in the same river twice.
We use flawed data to make critical decisions about end-of-life care, such as which nursing home to choose. That’s the upshot of a valuable new exposé by Jessica Silver-Greenberg and Robert Gebeloff in The New York Times. (Before I explain further, let me add that this article is a great example of why The New York Times remains one of our most vital civic treasures. The reporting involved a massive amount of information gathering: the analysis of millions of payroll records to determine how much hands-on care nursing-home residents get, the examination of 373,000 reports by state inspectors, and a review of more than 10,000 financial statements submitted by nursing homes to the government. The next time you think about cancelling your subscription because of some dumb editorial decision or headline, think again.)
Twelve years after a new star-rating system called Care Compare was instituted to simplify comparisons between facilities, the Times’ reporters found that nursing homes have totally gamed the ratings. The facilities inflate their staffing levels, they understate how many patients are on dangerous medications, and much of the information they submit to the government is wrong. Silver-Greenberg and Gebeloff write, “In one sign of the problems with the self-reported data, nursing homes that earn five stars for their quality of care are nearly as likely to flunk in-person inspections as to ace them. But the government rarely audits the nursing homes’ data.”
Even worse, the new ratings system has only helped the private equity industry, which started buying up nursing home systems around the same time, maximize profits. “Five-star facilities earned about $2,000 in profits per bed in 2019,” the Times analysis of their financial statements found. “Those with three or four stars earned about $1,000 per bed. Poorly rated homes were typically not profitable.” Patient outcomes in five-star facilities are a different matter: “people at five-star facilities were roughly as likely to die of the disease as those at one-star homes,” the Times found. This tracks with a recent working paper from the National Bureau of Economic Research, which studied the effects of private equity (PE) ownership on patient welfare at nursing homes. It found “that PE ownership increases the short-term mortality of Medicare patients by 10%, implying 20,150 lives lost due to PE ownership over our twelve-year sample period. This is accompanied by declines in other measures of patient well-being, such as lower mobility, while taxpayer spending per patient episode increases by 11%. We observe operational changes that help to explain these effects, including declines in nursing staff and compliance with standards.”
There’s also often an observer effect, the dynamic popularly (if loosely) attributed to Heisenberg, surrounding efforts to get accurate data. A few years ago, a friend who was involved in expanding access to computer education in New York City’s public schools told me that the Department of Education had no idea how many computers it had in its nearly 2,000 schools. (Indeed, if you dig into individual school reports on the DoE dashboard you won’t find a clue.) Not only is that data missing, but efforts to get it were stymied by a bureaucratic paradox. Surveying school principals for a current tally wouldn’t produce one. Principals who had managed to stock their schools with a sufficient supply of computers might fear being investigated, or being undersupplied in future procurement rounds. Meanwhile, principals whose schools truly lacked computers might be afraid to reveal that fact, even if doing so might bring them more computers, because it could hurt their school’s image.
The truth is that data is inherently political and subject to manipulation. How we interpret it, when we choose to acknowledge its flaws, and when we act as if we’re omniscient: these are political choices embedded in power imbalances. When Bloomberg was riding high as New York City’s mayor, he repeatedly insisted that his “stop and frisk” policy, which subjected millions of primarily Black and Brown young people to harassment by the NYPD, had made the city demonstrably safer. He chose to emphasize the number of guns and other weapons confiscated in the process, and even urged that more young Black men be stopped. When his successor ended the policy, which had been ruled unconstitutional, the violent crime rate in New York continued to drop, suggesting Bloomberg had been wrong all along. But only when he entered the 2020 presidential campaign and needed to appeal to Black and Brown Democratic primary voters outside New York did he renounce the policy.
We live in an age of radical uncertainty, as John Kay and Mervyn King titled their 2020 book on decision-making beyond the numbers. That’s the hard truth, and our difficulties with data only compound the challenge of dealing with disinformation. Every time a government figure projects authority instead of admitting what they don’t know, they fuel public skepticism. It’s even worse when government figures lie or cover up contradictory information in order to sell a policy (see the run-up to the 2003 invasion of Iraq) or themselves (see Governor Andrew Cuomo’s self-congratulatory book on how he conquered COVID). I have to admit, I don’t feel like I’ve really landed in a stable spot in this debate. While the breakdown of a shared political reality among Americans is a severe problem, with one-third of Congress not really accepting the results of the 2020 election, I’m not sure I want “government-certified” data to be the standard, given all the problems we have with everything from COVID information and nursing home ratings to claims about who has weapons of mass destruction.
Either way, it’s not enough to “bring data.” The data has to be shared in transparent ways, so it can be challenged. And then, hardest of all, attention to the problems with existing data has to be sustained. We need to have a better collective memory of what’s solid and what’s shaky. Otherwise, you’ll be sending your kids to a school that’s been reopened too soon, while putting your parents in a dangerous nursing home and voting for leaders who don’t deserve your support.
Odds and Ends
-Moe Tkacik’s piece on the quiet but growing revolt against Big Tech’s 30% commission charges offers some hope. She points to the backlash against delivery app fees, which have been capped in some 73 municipalities in the last year, as evidence that Silicon Valley’s favorite way of making money, skimming 30% off the top of any sale merely for the privilege of being on some platform, is under assault.
-Facebook is back to providing news links to its users in Australia after it reached a multi-year agreement with News Corp similar to an earlier one that Google struck to pay the media giant for access to its content. Looks like the old news baron (Murdoch) got the new one (Zuckerberg) to blink.
-My hat is off to Daniel Schuman and Marci Harris, two long-time civic tech leaders who led the fight last year to get Congress to shift to remote hearings as COVID hit. The two write in The Washington Monthly that last spring, “We were astonished as at least a decade’s worth of overdue modernization were implemented in 48 hours,” noting that “no objections were raised when, on April 6, Speaker Nancy Pelosi announced a new system for the digital submission of bills, co-sponsorships, and lawmakers’ statements.” They add, “This unprecedented agility in a traditionally inflexible institution that has changed little in over 230 years should not disappear when the pandemic ends. Rather, it should mark the start of a new way of doing business.” Amen to that.
-“Investigating how media and technology have evolved to create the America in which we find ourselves, without our permission and in ways that we understand poorly, is essential to finding a new, better path forward. Healthy, vibrant, pluralistic democracy is not incompatible with modern media, but our failure to understand modern media weakens the foundations of American democracy. It makes us vulnerable to commercial and political actors whose incentives do not necessarily align with our public good. By allowing faithless actors to define the core principles of this landscape, we have abdicated our own authority when it comes to laying the foundation for a new, modern public sphere that can deliver on the promises of greater inter-connectivity rather than separate and exploit us for profit. But we can take it back.” That’s Michael Slaby, chief strategist for Harmony Labs and the CTO for the 2008 Obama presidential campaign, from his valuable new book, For ALL the People: Redeeming the Broken Promises of Modern Media and Reclaiming Our Civic Life.
-Today’s headline is a play on the phrase “lies, damned lies, and statistics,” which itself has an uncertain pedigree. One early version, which has its own charm: “Whereupon counsel on the other side was heard to explain to his client that there were three sorts of liars, the common or garden liar ... the damnable liar who is fortunately rather a rara avis in decent society, and lastly the expert.”