Increased Outpatient Coding Intensity Following Hospital Adoption of AI-Enabled Scribing Warrants Examination but Likely Reflects Enhanced Rules-Based Documentation

March 12, 2026

Written by

Katie Patton • Austin Miller • Allison Oakes, Ph.D. • Matt O'Neill

Study Takeaways

Across six large health systems that have implemented AI-enabled scribing, both new and established outpatient visits shifted toward higher-intensity evaluation and management (E/M) codes between 2018 and 2024.
Coding intensity increased across the systems, though the magnitude varied, with high-acuity visit share increasing by approximately 14 to 20 percentage points for new patient visits and seven to 12 percentage points for established visits over the study period.
Between 2021 and 2024, the share of E/M visits billed as high intensity increased across all diagnosis chapters for established patients and across most diagnosis chapters for new patients.
AI-enabled scribing tools allow clinical documentation to be captured more thoroughly and accurately. The promise of automating processes with AI is that once a model “learns” the rules, it is less error-prone than humans in performing that task but continues to operate within the bounds of a deterministic coding system.

Artificial intelligence (AI)-enabled ambient scribing tools are being integrated across care delivery settings, such as hospital outpatient departments and their affiliated physician practices. Purpose-built to alleviate provider burnout and clinical documentation burden, these systems use natural language processing (NLP) and large language models (LLMs) to generate clinical notes during real-time patient encounters and integrate them directly into electronic health record (EHR) systems. As adoption of AI-enabled tools becomes more common, questions are emerging about how AI-assisted documentation may influence coding behavior, visit intensity and reimbursement.

Background

Ambient AI scribing refers to software that passively listens to clinician–patient encounters and automatically generates clinical documentation. Unlike traditional dictation tools, which transcribe spoken notes after a visit, or human scribes who take notes during a physician-patient encounter, ambient systems operate continuously during the encounter and use NLP to identify clinically relevant dialogue, structure clinical documentation into patient health records within EHR systems and extract data such as symptoms, diagnoses and medications. Additionally, in some platforms, ambient AI is utilized to suggest billing codes based on documented content.¹ Although AI-enabled clinical documentation tooling emerged in the late 2010s, deployment accelerated in the early 2020s. Similarly, the adoption of ambient AI scribing has accelerated in the last 24 months, as LLMs have improved contextual understanding and summarization accuracy.

In recent years, health systems across the U.S. have scaled the use of ambient AI tools across outpatient settings. A study of 263 providers across six health systems found that after 30 days with an ambient AI scribe, provider burnout decreased from 51.9% to 38.8%, with providers citing improvements in cognitive task load, time spent documenting after hours and focused attention on patients.²

The convergence of AI-enabled clinical documentation and ambient AI scribing has implications beyond reducing provider burnout, as reflected in recent reports of increased “coding intensity” from CMS and numerous national payers.³ Payers have also voiced concerns about the accuracy and quality of ambient AI, including risks of misinterpretation and overdocumentation that may alter clinical workflows if clinicians become overly reliant on AI-generated outputs. Whether the introduction of AI-assisted scribing is unduly influencing billing practices and systemwide spending is an important question for policymakers. It is incontrovertible that a systemic increase in higher-acuity codes and a corresponding decrease in lower-acuity codes would have significant cost implications in the health economy.

Healthcare fraud is endemic, although historically the types of fraud have been routine. Provider fraud usually takes the form of either ordering or performing an unnecessary procedure or billing for a service that was never performed. Payer fraud usually takes the form of “over-diagnosing” a member with co-morbidities for higher risk adjustment or receiving premiums from the government for fraudulent enrollees. The use of technology to “manufacture” clinical diagnoses is well-documented, particularly with risk-adjustment models employed by Medicare Advantage plans, most recently evidenced in Kaiser Permanente’s agreement to pay $556M to resolve the Department of Justice’s allegations of fraudulent coding.⁴ Similarly, a recent Senate Judiciary Committee majority staff report documented numerous examples of using software to “manufacture” higher-acuity clinical diagnoses to receive higher Medicare Advantage payments.⁵

Some, if not all, vendors of AI-enabled coding platforms have undoubtedly emphasized opportunities for providers to maximize reimbursement. Importantly, maximizing reimbursement through legitimate means is both economically rational and, from a legal standpoint, a fiduciary obligation for officers of both for-profit and not-for-profit health systems. Because AI scribing is inherently rules based, a systematic increase in intensity could reflect that individual providers have historically under-coded, whether because of incomplete clinical documentation, suboptimal coding practices, fear of violating the False Claims Act or some combination thereof.

In parallel with the proliferation of AI scribing technology, researchers have documented marked increases in coding intensity. Our 2025 analysis found that from 2018 to 2023, the share of visits coded at higher complexity levels increased across all outpatient settings, including emergency departments (e.g., CPT 99284 increased from 32.5% to 39.6%), urgent care centers (e.g., CPT 99204 rose from 34.0% to 40.6%) and physician offices (e.g., CPT 99214 grew from 38.5% to 45.0%).⁶ A multistate analysis estimated that inpatient upcoding contributed $14.6B in additional hospital payments in 2019 relative to earlier coding practices.⁷ Similarly, analyses from the Medicare Payment Advisory Commission have highlighted the fiscal implications of coding intensity growth in Medicare payment systems, including the role of documentation practices in shaping risk scores and reimbursement.⁸

Although certain research has examined ambient AI adoption, and other research has documented broader increases in outpatient coding intensity in recent years, evaluation of the relationship between AI-enabled ambient scribing and outpatient E/M coding patterns remains underexplored. This study assesses whether shifts in outpatient E/M code distribution occurred within health systems that adopted ambient AI tools and provides insight on how AI-enabled documentation may intersect with billing and coding behaviors.

Analytic Approach

National all-payer claims data were used to examine E/M billing patterns at six large hospitals and health systems from 2018 to 2024. These hospitals and health systems were selected because of publicly available announcements related to the adoption of ambient AI scribing systems. Three systems are multistate health systems, while three operate in a single state. The health systems are geographically diverse, serving a combined 13 states. The analysis includes outpatient E/M services billed under CPT codes 99201-99205 for new patients and 99211-99215 for established patients, which correspond to increasing levels of medical decision-making complexity or total time spent during the encounter. CPT codes 99201-99203 and 99211-99213 are defined as low intensity, and CPT codes 99204-99205 and 99214-99215 are defined as high intensity. CPT 99201, which was removed effective January 1, 2021, following revisions by the American Medical Association and CMS, was accounted for in all longitudinal comparisons to ensure consistency across years. Primary diagnosis chapters associated with these E/M visits were examined in 2021 and 2024 to contextualize coding changes.
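The intensity grouping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual pipeline: the function name `high_intensity_share` and the visit counts in `example_counts` are hypothetical, and only the CPT groupings come from the text.

```python
from collections import defaultdict

# CPT groupings as defined in the Analytic Approach.
NEW_PATIENT = {f"992{n:02d}" for n in range(1, 6)}     # 99201-99205
ESTABLISHED = {f"992{n:02d}" for n in range(11, 16)}   # 99211-99215
HIGH_INTENSITY = {"99204", "99205", "99214", "99215"}


def patient_type(cpt: str) -> str:
    """Classify an in-scope outpatient E/M code as new or established."""
    if cpt in NEW_PATIENT:
        return "new"
    if cpt in ESTABLISHED:
        return "established"
    raise ValueError(f"CPT code outside the study scope: {cpt}")


def high_intensity_share(visit_counts: dict[str, int]) -> dict[str, float]:
    """Percent of visits billed as high intensity, by patient type."""
    high = defaultdict(int)
    total = defaultdict(int)
    for cpt, count in visit_counts.items():
        kind = patient_type(cpt)
        total[kind] += count
        if cpt in HIGH_INTENSITY:
            high[kind] += count
    return {kind: round(100 * high[kind] / total[kind], 1) for kind in total}


# Hypothetical visit counts for one system-year (illustrative only,
# not taken from the study data).
example_counts = {
    "99202": 100, "99203": 300, "99204": 450, "99205": 150,
    "99212": 200, "99213": 1800, "99214": 2500, "99215": 500,
}
shares = high_intensity_share(example_counts)
print(shares)
```

Because 99201 stays in the low-intensity group, the same function can be applied to years before and after its 2021 removal without changing the denominator logic.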

Findings

From 2018 to 2024, established patient visits shifted toward higher-intensity E/M codes across the health systems analyzed (Figure 1). At Health System A, the combined share of 99214-99215 increased from 59.8% to 67.2%, while 99211-99213 declined from 40.2% to 32.8%. Health System B saw 99214-99215 increase from 40.9% to 52.7%, with a corresponding decrease in 99211-99213 from 59.2% to 47.3%. Health System C had a smaller increase from a higher baseline, with 99214-99215 increasing from 65.3% to 72.9%, as 99211-99213 fell from 34.7% to 27.1%. Health System E had a similar pattern, with higher-acuity codes increasing from 56.6% to 64.6% and lower-acuity codes declining from 43.4% to 35.4%. Health System F saw 99214-99215 grow from 47.8% to 57.7%. For Health System D, 99214-99215 increased from 50.4% to 60.2%.

From 2018 to 2024, each health system analyzed shifted toward higher-intensity new patient E/M codes (Figure 2). At Health System A, the combined share of 99204-99205 increased from 51.3% to 64.8%, while 99201-99203 declined from 48.7% to 35.2%. Health System B saw 99204-99205 rise from 42.9% to 57.6%, with 99201-99203 decreasing from 57.1% to 42.4%. Health System C had the largest increase, with 99204-99205 growing from 60.5% to 80.0%, with a reduction in 99201-99203 from 39.5% to 19.9%. Health Systems E and F had similar patterns, with 99204-99205 increasing from 47.5% to 67.0% and 43.7% to 60.2%, respectively. Similarly, Health System D’s higher-acuity codes increased from 44.5% to 63.7% while lower-acuity codes declined from 55.4% to 36.3%.
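To make the size of each shift easier to compare, the percentage-point changes implied by the reported 2018 and 2024 shares can be computed directly. The sketch below uses only the figures stated in the two preceding paragraphs; the helper name `point_changes` is illustrative.

```python
# High-intensity share (%) in 2018 and 2024, as reported in the Findings.
established = {
    "A": (59.8, 67.2), "B": (40.9, 52.7), "C": (65.3, 72.9),
    "D": (50.4, 60.2), "E": (56.6, 64.6), "F": (47.8, 57.7),
}
new = {
    "A": (51.3, 64.8), "B": (42.9, 57.6), "C": (60.5, 80.0),
    "D": (44.5, 63.7), "E": (47.5, 67.0), "F": (43.7, 60.2),
}


def point_changes(shares: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Percentage-point change from the first year to the last year."""
    return {system: round(end - start, 1) for system, (start, end) in shares.items()}


established_changes = point_changes(established)
new_changes = point_changes(new)

for label, changes in (("established", established_changes), ("new", new_changes)):
    print(label, changes, "min:", min(changes.values()), "max:", max(changes.values()))
```

These are differences in percentage points, not relative percent changes, which is why a move from 60.5% to 80.0% counts as 19.5 points rather than roughly 32%.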

Between 2021 and 2024, the share of established patient E/M visits billed as high intensity (99214-99215) increased across all ICD-10 diagnosis chapters. The largest increase was observed in factors influencing health status (46.6% to 62.9%) (Figure 3). Other chapters with substantial increases were mental and behavioral disorders (63.1% to 72.6%), neoplasms (65.7% to 73.7%), blood and immune disorders (64.8% to 72.1%) and digestive diseases (59.4% to 66.2%). Smaller increases were observed in ear and mastoid diseases (33.1% to 35.0%) and skin and subcutaneous diseases (39.7% to 41.8%).

Between 2021 and 2024, the share of new patient E/M visits billed as high intensity (99204-99205) increased across nearly every diagnosis chapter. The largest increases were observed in factors influencing health status (51.0% to 66.3%), respiratory diseases (45.4% to 57.8%) and genitourinary diseases (56.1% to 65.5%) (Figure 4). Meaningful increases were also seen in digestive diseases (62.1% to 69.8%), symptoms and abnormal findings (61.8% to 69.3%) and circulatory diseases (73.1% to 79.5%). However, ear and mastoid diseases remained essentially flat (38.9% to 39.0%). Eye and adnexa diseases were the only chapter to decline, from 64.1% to 59.0%. Chapters that already accounted for the highest baseline shares of high-intensity coding also increased from 2021 to 2024, including blood and immune disorders (80.4% to 83.6%), nervous system diseases (74.2% to 76.0%), mental and behavioral disorders (73.2% to 78.1%), neoplasms (70.5% to 76.4%) and endocrine and metabolic diseases (72.6% to 76.8%).

Conclusion

In recent decades, payers have invested heavily in advanced analytics and risk-adjustment optimization to refine payment accuracy, shaped by incentives to optimize revenue. In commercial markets, medical loss ratio (MLR) requirements further influence these revenue optimization incentives, as health plans must spend a defined share of premium revenue on medical care or quality improvement activities. Providers, who historically have not been as technologically advanced, are increasingly adopting technologies that similarly optimize revenue capture and reduce administrative burden within the bounds of established regulatory parameters. As both sides leverage AI-enabled tools across business units, reimbursement systems will invariably face scrutiny, both in government and commercial health insurance markets. The foundational question about revenue maximization is always intent, and AI-enabled coding platforms are no exception.

The promise of automating processes with AI is that once a model “learns” the rules, it is less error-prone than humans in performing that task. AI in the form of LLMs can be used for probabilistic purposes to predict what a series of words or phrases in a physician-patient encounter are most likely to “mean.” In contrast, AI in the form of NLP can be used to automate an established set of rules, such as listing a group of HCPCS codes from a patient encounter in an order that will result in the highest reimbursement for that encounter.

Logic suggests that the adoption of ambient AI scribing, which records every word spoken in a physician-patient interaction, would increase the amount of information about patients in EHRs. Importantly, digital documentation of physician-patient interactions has also created a time-stamped record for the duration of each encounter. Logic would also suggest that a tool that materially enhances clinical documentation would be leveraged for coding medical claims based on that enhanced documentation. It follows that a tool designed for enterprise scale that records more complete clinical documentation would reveal new, and materially different, billing patterns.

This analysis confirms that logic. The consistent upward redistribution of both new and established outpatient E/M visits toward CPT codes 99204-99205 and 99214-99215 across the six analyzed health systems reveals increased coding intensity. The shift in coding for high-intensity new patient E/M visits (99204-99205) ranged from approximately 14 to 20 percentage points, reaching as high as 80.0% of visits at one health system. For high-intensity established patient outpatient E/M visits (99214-99215), increases ranged from seven to 12 percentage points. These patterns were observed across geographically diverse and organizationally distinct health systems, all of which have adopted ambient AI scribing in recent years, suggesting external forces rather than isolated institutional behavior are catalyzing these changes. While the health of Americans has consistently declined for years, especially during the COVID-19 pandemic, the scale, uniformity and persistence of these changes suggest a cause that goes beyond outpatient case mix. These trends also coincide with the 2021 revisions to E/M coding guidelines in which 99201 was removed. The revised guidelines reduced emphasis on history and physical exam documentation and shifted coding determination toward medical decision-making or total time.

While the direction of the changes in billing patterns is clear, the underlying causes of the changes are not clearly understood. Whether such changes represent improved accuracy or fraudulent billing, as several payers have implied in earnings calls, depends on the relationship between the accuracy and completeness of documentation and the true clinical complexity of the patient. Because payers have established longstanding clinical documentation policies, it is elementary from a technological standpoint to compare the provider’s ambient AI documentation inputs with the payer’s documentation requirements for a “match.” Importantly, this “comparison” process could be transparent to all parties – provider, payer and patient.

If it were determined that coding intensity from AI-assisted scribing systems overstated actual clinical complexity, there would be financial implications for payers, especially CMS and self-funded employers. While an isolated visit that meets the definitional guidelines of CPT 99213 but is billed at 99214 may have minimal financial consequence, the implications would be more substantial if coding intensity shifts were driven by systemwide algorithmic decision making rather than actual clinical complexity. In such a scenario, even modest per-code rate differences (e.g., $10-$40) could translate into millions of dollars in wasteful expenditures when scaled across all outpatient sites of care. At the same time, the underlying cause of such AI-enabled “over-coding” would be easy to diagnose and resolve from a technological standpoint.
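The scaling argument above is simple arithmetic, sketched below under loudly stated assumptions. Only the $10-$40 per-code rate difference comes from the text; the annual visit volume and the share of visits assumed to be over-coded are hypothetical placeholders, not estimates from this study.

```python
def excess_spend(annual_visits: int, overcoded_points: int, rate_difference: int) -> float:
    """Dollars of excess spend if `overcoded_points` percent of visits are
    billed one level higher than the documentation supports."""
    return annual_visits * overcoded_points * rate_difference / 100


# HYPOTHETICAL inputs, chosen only to illustrate the order of magnitude.
ANNUAL_VISITS = 1_000_000   # assumed outpatient E/M visits per year
OVERCODED_POINTS = 10       # assumed percent of visits over-coded

# Per-code rate differences of $10 and $40, the range cited in the text.
low_estimate = excess_spend(ANNUAL_VISITS, OVERCODED_POINTS, 10)
high_estimate = excess_spend(ANNUAL_VISITS, OVERCODED_POINTS, 40)
print(f"${low_estimate:,.0f} to ${high_estimate:,.0f} per year")
```

The result scales linearly in each input, so doubling either the visit volume or the over-coded share doubles the estimate.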

Importantly, AI-enabled coding is less complicated than most other examples of generative AI. Though medicine is a mix of art and science, the inputs and outputs of coding and revenue are well known and often deterministic. Many clinical diagnoses result from a test or procedure (i.e., whether a patient has blood cancer or a tumor or a low ejection fraction or high cholesterol or liver disease). And, with respect to time-bounded codes like E/M codes, ambient AI is almost certainly more accurate in confirming the length of the physician-patient encounter than past practice.

Because of the rapid advances in the technology underlying AI-assisted documentation and coding in the past 12 months, the underlying causes of increased coding intensity can be clearly understood, which raises the question of why they have not been. Said another way, ambient AI tools have increased the collection of evidence needed for accurate, defensible coding practices. If adoption of ambient AI is found to be a source of a “mismatch” between what was documented and what was coded – whether “under-coding” or “over-coding” – the relevant models could – and should – be “tuned” to reach the correct result.

If the mismatch were thought to be fraudulent, that also could be ascertained. Unlike other longstanding types of healthcare fraud, ambient AI scribing platforms create a digital record of every provider-patient encounter. If payers implying fraudulent provider billing from AI-enabled coding platforms wanted to verify the coding, the evidence exists to audit the accuracy of the claim. If such an audit found evidence of intentional coding malfeasance, then that would be a legal matter with implications understood by all parties.

Within this context, the increase in coding intensity found in this study is unlikely to be attributable to provider fraud. A computer system is almost always optimized for a goal, but it is still bound by a set of rules a human operator has defined and encoded – even an AI-based system. For the observed increase in coding to be fraudulent, some amount of conspiratorial activity would be required, either among vendors of AI-enabled tools or among physicians across unrelated health systems.

The transparency of AI-assisted processes is an important, if underexamined, societal question throughout the global economy. The underlying questions are numerous and merit thoughtful consideration: What did AI, in whatever form, do? Why did AI make the choice that it did? How did that choice manifest? What are the implications of that choice? Do humans agree that the choice was optimal? Ambient AI-assisted scribing is only one of many processes that should be analyzed, and policymakers and patients should evaluate providers and payers based on their willingness to resolve these issues in a transparent manner.
