
What is a Visit? Why We Don't Use Raw Claims Data

August 26, 2025

"Not a game; not the game that I go out there and die for, we talking about practice?!"

- Allen Iverson

At Trilliant Health, we talk with a lot of vendors, clients, prospects and other people in the healthcare industry about claims data. This is because, for at least a decade, raw claims have underpinned most market and strategic analyses. Frankly, we think everyone is talking about the wrong thing.

The way most people (patients, providers, analysts) think about and consume healthcare is at the "event" level; e.g., a joint procedure, an annual checkup or a colonoscopy. Claims do not provide this view. Claims are created to receive payment for services rendered; they are receipts of what goods and services were sold, not what was done with those goods and services (covered in a previous article, if you want to read more).

Imagine trying to figure out exactly what a family ate for dinner in a week by analyzing their grocery store receipt. You see they’ve purchased chicken, beef, tortillas and bread. Chicken fajitas one night then burgers the next? Chicken sandwiches and ground beef tacos? Chicken Caesar wraps and sloppy joes? You can’t say for certain what meals they had, and you certainly can’t aggregate that data across families to make predictions about what meals are being made across the country.

We call this "event-level" view a Visit: a patient saw a provider at a specific place, for a period of time, for one or more procedures. That is what you, the person using the data, actually care about. A Visit is the most effective way to represent healthcare activity, because it mirrors how people naturally think about healthcare and how we consume it. It provides an intuitive building block for analyses, rather than leaving you to sift through a pile of claim "receipts" in order to guess at the meal.

But why? Why does it matter that the structure of the data aligns with how we think about it? For surface-level analyses, it doesn’t matter that much. If all you want is a count of how many times Provider X performed procedure Z in year YYYY, it can be pretty straightforward to get that answer out of claims data. Once the question gets even slightly complex, however, raw claims really show their flaws:

What if you want to know how many times Provider X performed procedure Z, but at a specific location? You can’t rely on professional claims to accurately reflect the site of service; they largely provide billing addresses.
Are you leveraging Type 2 NPIs on a claim to cross-reference NPPES data? Not only do Type 2 NPIs usually reflect a corporate or billing structure rather than physical locations, but CMS has been pointing out since at least 2013 that NPPES isn’t all that accurate.
What if you want to look at inpatient procedures specifically? Good luck: there is no reliable indicator on a claim of whether an inpatient stay occurred, and you can’t simply look for the presence of a DRG or ICD-10-PCS code. Even if a place of service code is present on the claim, using it to confirm whether the service took place on campus at a hospital is notoriously unreliable.
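
As a minimal sketch of the "surface-level" case, the snippet below counts how often one provider billed one procedure code in a year. The record layout, field names, placeholder identifiers (`NPI_X`, `PROC_Z`) and sample rows are all hypothetical, invented purely for illustration; they do not reflect any real claims schema. It also hints at the location problem: grouping by the only address available tells you where the bill came from, not where the care happened.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class ClaimLine(NamedTuple):
    # Hypothetical, simplified claim line; real claims carry many more fields.
    rendering_npi: str
    procedure_code: str
    service_date: date
    billing_address: str  # where payment goes, not necessarily where care happened


def count_procedures(lines, npi, code, year):
    """Count claim lines for one provider, one procedure code, one year."""
    return sum(
        1
        for line in lines
        if line.rendering_npi == npi
        and line.procedure_code == code
        and line.service_date.year == year
    )


# Made-up sample data.
claims = [
    ClaimLine("NPI_X", "PROC_Z", date(2024, 3, 4), "PO Box 100"),
    ClaimLine("NPI_X", "PROC_Z", date(2024, 6, 9), "PO Box 100"),
    ClaimLine("NPI_X", "PROC_Z", date(2023, 1, 15), "PO Box 100"),
    ClaimLine("NPI_Y", "PROC_Z", date(2024, 3, 4), "PO Box 200"),
]

total = count_procedures(claims, "NPI_X", "PROC_Z", 2024)

# Splitting the same count "by location" only recovers the billing address.
by_address = Counter(
    c.billing_address
    for c in claims
    if c.rendering_npi == "NPI_X"
    and c.procedure_code == "PROC_Z"
    and c.service_date.year == 2024
)
```

In this toy data the yearly count is easy to get, but every one of Provider X's procedures collapses onto a single billing address, even if the procedures were performed at different sites.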

To further elucidate the matter, let’s look at a colonoscopy, of which there are nearly 2 million performed annually in the U.S. How would a colonoscopy manifest in claims data?

You will almost certainly get a claim for the professional services of the provider performing the procedure.
You might have a separate institutional claim for the hospital or outpatient surgery center where the procedure occurred, or the facility charges might be bundled into the professional claim.
You might have a separate claim for the anesthesia administered during the procedure. It might come from a third-party anesthesia group, or from the hospital or surgery center where the procedure occurred. It also might be bundled into the institutional claim or the professional claim above. And the nature of the anesthesia provided can vary greatly depending on the patient’s risk factors.
You might, depending on how the procedure goes, have pathology or lab services related to the colonoscopy. Those could be performed at the site of the procedure or at a separate third-party lab. They could show up in the professional claim for the procedure, in a separate professional claim from a third party or as part of the institutional claim. They also could show a date of service or site of service that is completely different from that of the procedure itself.

In addition to paying attention to all of the above, you can’t forget about the general noise you will always encounter in any claims analysis:

What if the patient had another procedure on the same day, like an allergy shot or blood draw for a reason completely unrelated to the colonoscopy? How would you know if it’s related or not? And, if not, how would you filter it out?
What if some of the claims for the procedure are missing from your dataset? How do you get the complete picture if parts of the dataset are missing?
What if two of the claims for the procedure have the same procedure code? How do you ensure you’re not double-counting?
What do you do when you see conflicting information in the claims, like a different date or site of service?
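
To make these pitfalls concrete, here is a minimal sketch of a naive grouping rule: treat every claim for the same patient on the same date as one event. All field names, placeholder codes and rows are hypothetical. This is not how Visits are built; it only shows how a simple rule can fix one problem while creating two others.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Claim(NamedTuple):
    # Hypothetical, simplified claim record for illustration only.
    claim_id: str
    patient_id: str
    service_date: date
    claim_type: str      # "professional" or "institutional"
    procedure_code: str  # placeholder labels, not real billing codes


claims = [
    Claim("c1", "p1", date(2024, 5, 1), "professional", "COLONOSCOPY"),
    Claim("c2", "p1", date(2024, 5, 1), "institutional", "COLONOSCOPY"),  # same code, facility side
    Claim("c3", "p1", date(2024, 5, 1), "professional", "ANESTHESIA"),
    Claim("c4", "p1", date(2024, 5, 1), "professional", "ALLERGY_SHOT"),  # unrelated, same day
    Claim("c5", "p1", date(2024, 5, 3), "professional", "PATHOLOGY"),     # related, later date
]

# Counting claims by code double-counts the single colonoscopy.
naive_count = sum(1 for c in claims if c.procedure_code == "COLONOSCOPY")

# Naive rule: one event per (patient, service date).
events = defaultdict(list)
for c in claims:
    events[(c.patient_id, c.service_date)].append(c.claim_id)

same_day = events[("p1", date(2024, 5, 1))]
later = events[("p1", date(2024, 5, 3))]
```

Here the per-claim count reports two colonoscopies where one occurred. The same-day rule collapses them into one event, but it also sweeps the unrelated allergy shot (`c4`) into that event and strands the related pathology claim (`c5`) in an event of its own.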

Even with such a routine and common procedure, there are dozens of different ways the claims might manifest for any particular patient. You either spend a lot of time trying to account for every single edge case, or you simply accept the fact that there is a degree of error in your answer.
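
A rough way to see where "dozens" comes from is to enumerate the billing arrangements described in the colonoscopy walkthrough. The option lists below paraphrase that walkthrough, and the consistency rule is a simplifying assumption made for this sketch; neither is an exhaustive model of how claims are actually billed.

```python
from itertools import product

# Options paraphrased from the colonoscopy walkthrough; a simplification.
FACILITY = ["separate_institutional_claim", "bundled_in_professional_claim"]
ANESTHESIA = [
    "separate_claim_third_party_group",
    "separate_claim_facility",
    "bundled_in_institutional_claim",
    "bundled_in_professional_claim",
]
PATHOLOGY = [
    "none",
    "on_procedure_professional_claim",
    "separate_third_party_professional_claim",
    "on_institutional_claim",
]


def is_consistent(facility, anesthesia, pathology):
    """Assumed rule: nothing can ride on an institutional claim that does not exist."""
    if facility == "bundled_in_professional_claim":
        return "institutional" not in anesthesia and "institutional" not in pathology
    return True


shapes = [
    combo
    for combo in product(FACILITY, ANESTHESIA, PATHOLOGY)
    if is_consistent(*combo)
]
print(len(shapes))  # prints 25
```

Even this simplified enumeration yields 25 distinct claim shapes for a single procedure, before accounting for missing claims, unrelated same-day services or conflicting dates and sites of service.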

Visits makes these analyses easier because we look at each claim within the context of its surrounding claims. Visits takes all these edge cases into account, at scale, so that the user can stop thinking about the inherent complexity of claims and instead focus on the analysis at hand. With Visits, you can look at events, which is how humans think about healthcare activity, as opposed to struggling with the difficulty of trying to group related claims that do not have an explicit connection to one another.

If Visits are the end product, but there is no explicit connection between claims, you might wonder how Trilliant Health creates them. We leverage machine learning combined with a lot of domain expertise.

We stated above that looking at claims on a small scale, a single patient at a time, is relatively straightforward to reason over. The real difficulty is doing that at scale (billions of claims per year) across the breadth of the healthcare domain (many tens of thousands of procedure and diagnosis codes) while dealing with the long tail of edge cases. Obviously, having a team of humans do all that work isn't feasible, so we turn to machine learning (ML). The essence of ML is giving a machine a set of inputs and a corresponding set of outputs, then asking the machine to figure out “why.” In other words, if we know x and we know y, we let the machine figure out the function f such that f(x) = y. In our case, we have manually connected thousands of claims (inputs) into visits (outputs), and we then leverage ML to build models and functions that scale that approach to many billions of claims and visits.
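
The "known x, known y, learn f" idea can be sketched with a toy example. The post does not describe the actual models or features used, so everything below is an assumption made for illustration: the three binary pair features, the six labeled pairs and the choice of a perceptron as a stand-in learner are all hypothetical.

```python
from typing import List, Tuple

# Each input x describes a PAIR of claims with made-up binary features:
# (same_day, same_facility, codes_related). Label y is 1 when a human
# annotator placed both claims in the same visit, else 0.
LABELED_PAIRS: List[Tuple[Tuple[int, int, int], int]] = [
    ((1, 1, 1), 1),
    ((1, 1, 0), 0),
    ((0, 0, 1), 1),
    ((0, 0, 0), 0),
    ((1, 0, 1), 1),
    ((0, 1, 0), 0),
]


def predict(weights, bias, x):
    """Return 1 (same visit) when the weighted score is positive, else 0."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else 0


def train_perceptron(examples, max_epochs=100):
    """Classic perceptron rule: nudge the weights after every mistake."""
    weights = [0, 0, 0]
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            error = y - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # every labeled pair is classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(LABELED_PAIRS)
```

On this toy data the learner ends with weights `[0, 0, 1]` and bias `0`: without being told, it discovers that `codes_related` is the signal that separates the labeled pairs. It then links an unseen pair such as `(0, 1, 1)` and rejects `(1, 0, 0)`. Real claims are far messier, which is why the labeled examples and the domain expertise behind them matter so much.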

Figure: Screen capture of the Visit Annotation Tool that the Curation and Annotation team uses to train ML models.

We believe that the transformation of claims into Visits is the most valuable thing we do at Trilliant Health. It is certainly the most difficult. We are translating a raw claims dataset that is hard to use into one that more closely models our intuitive understanding of healthcare activity. How do we know if we are successful? Our measuring stick is whether you can effectively perform the analysis you want, and whether you can trust that it is accurate.

With Visits, the answer is yes.

- Matt

About the Authors

Matt O'Neill, Chief Data Officer and EVP Engineering, Machine Learning, Product

Ford Altenbern, Technical Project Manager
