The “Directionally Correct” Fallacy, Part III: Its Origins in the 100% Data Sample Fallacy
The “directionally correct” fallacy originates in another fallacy that seems to plague the healthcare industry: the 100% data sample fallacy.
For the avoidance of doubt, a 100% data sample is a mythical creature, like the Loch Ness monster or a Super Bowl championship for the Tennessee Titans. Virtually every executive in any industry outside of healthcare understands that a 100% data sample does not exist. In the healthcare industry, however, the lack of something that does not exist is frequently – and dismissively – cited as a reason not to use data for analysis.
The 100% data sample fallacy is self-evident upon a moment’s reflection. Given the amount of healthcare that is “free” (charity care, drug samples, etc.) or paid for in cash (dermatology, plastic surgery), it is obvious that a 100% data sample of healthcare activity is unattainable. The fallacy is also revealed by the simple fact that no one knows exactly how many suppliers of healthcare services exist.
Executives in non-healthcare industries source as much data as possible and consistently explore new data sources, making the best decisions possible from the most relevant data available. Executives in non-healthcare industries understand the adage that “everything is information.” In contrast, healthcare executives are prone to decry the lack of 100% data and then either default to anecdotes and random observations in developing strategies or, alternatively, do nothing at all.
What is ludicrous about executives who are unwilling to use any data short of a 100% data sample is this: if a 100% data sample existed, none of the people clamoring for it could do anything with it. Even three years of a 100% data sample of healthcare activity would be several pebibytes in size (one pebibyte is 1,024 tebibytes), and the typical healthcare organization would not know how to store it, much less how to analyze it.
The Don Quixote-like quest for a 100% data sample also reveals a fundamental misunderstanding of how evidence-based strategies are developed. In developing strategy, benchmarking, which the industry has been trained to do for decades, is inferior to predictive analytics based on probability, which does not require anything close to a 100% data sample.
Generally, more data is better than less data. However, for probability, the longitudinal span of a data set (how many years it covers) matters more than its sheer volume. For example, 10 years of data representing 10% of actual activity is more relevant to probability than 1 year of a 100% data sample, because only the multi-year data shows how activity changes over time.
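The point about time span versus sample size can be illustrated with a small simulation. The sketch below is purely synthetic and rests on stated assumptions: annual activity is assumed to grow 5% per year, the 10% sample is assumed to be random, and all names (simulate_true_counts, sample_count, fit_log_linear) are hypothetical helpers invented for this example, not part of any real data set or library. It compares a forecast of next year's volume built from 10 years of a 10% sample against one built from a single year of complete data, where the only available option is to carry the last observed value forward.

```python
import math
import random


def simulate_true_counts(base, growth, years):
    """Synthetic 'true' annual activity, growing at a fixed rate (an assumption)."""
    return [round(base * growth ** t) for t in range(years)]


def sample_count(true_count, fraction, rng):
    """Count how many events a random sample of the given fraction captures."""
    return sum(1 for _ in range(true_count) if rng.random() < fraction)


def fit_log_linear(xs, ys):
    """Ordinary least squares fit of log(y) = intercept + slope * x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(logs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (l - y_mean) for x, l in zip(xs, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


rng = random.Random(42)  # fixed seed so the run is repeatable

# Years 0..9 are history; year 10 is the value to forecast.
true_counts = simulate_true_counts(base=50_000, growth=1.05, years=11)
history, target = true_counts[:10], true_counts[10]

# Option A: 10 years of a 10% sample, scaled up and fitted with a trend line.
scaled_samples = [sample_count(c, 0.10, rng) / 0.10 for c in history]
intercept, slope = fit_log_linear(list(range(10)), scaled_samples)
forecast_longitudinal = math.exp(intercept + slope * 10)

# Option B: 1 year of a 100% sample. A single point carries no trend,
# so the forecast simply repeats the last observed year.
forecast_single_year = history[-1]

err_longitudinal = abs(forecast_longitudinal - target) / target
err_single_year = abs(forecast_single_year - target) / target

print(f"10 years at 10%: relative error {err_longitudinal:.2%}")
print(f"1 year at 100%:  relative error {err_single_year:.2%}")
```

Under these assumptions, the single complete year is exact for the year it covers but says nothing about the rate of change, while the partial multi-year sample is noisier per year yet recovers the growth trend. Real healthcare data is messier (non-random samples, shifting coverage), so this is an illustration of the principle rather than a method.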
Predictive analytics are never 100% accurate, but they are far more accurate than the “directionally correct” decisions made by people who believe that only a 100% data sample is sufficient.