To begin, I believe that the recent audit of Northwestern reflects the generally poor processes and statistical modeling that Centers for Medicare & Medicaid Services (CMS) auditors use when scrutinizing a healthcare provider.

In particular, auditors seem to assume some degree of homogeneity across the broad range of diagnostic and therapeutic treatments (as well as the delivery of medical supplies, drugs, and other services and procedures) routinely performed on patients. For example, durable medical equipment (DME)-type claims should never be subject to extrapolation, because there is such a large degree of variance surrounding nearly every aspect of how hardware, drugs, and supplies are prescribed for any given patient. It’s a bit absurd to group supplies that are required for cervical traction with codes that denote procedures such as neuromuscular stimulation. Even stratification can’t fix these types of problems, yet I see such claims extrapolated quite often. We’re talking about apples and oranges.

Because the purpose of this type of audit is to extrapolate findings to a larger universe of data, we work within the realm of inferential statistics, which is different from descriptive statistics. In inferential statistics, when we attempt to generalize the results of a sample to a larger universe, we have to be very careful about how the sample is selected and very certain about how our calculations are made. Because we are going outside of the sample itself, it is critically important that the sample is representative of the applicable universe, or in this case, the sampling frame. In pretty much every case I have worked on, government auditors worry about one thing and one thing only: is the sample random? Randomness is enough if all you are doing is trying to describe certain characteristics of the sample, such as some point estimate or measure of central tendency, but it is simply not enough when the goal is to infer the findings in the sample to some data set that is larger than the sample. A random sample is not automatically a representative one. Likewise, just because you stratify a sample doesn’t mean that it was stratified properly.
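One simple way to look at representativeness, as opposed to randomness, is to compare the sample against something already known for the whole sampling frame. The sketch below assumes that the paid amount is available for every claim in the frame, and it measures how far the sample's mean paid amount sits from the frame's mean, in standard-error units. The data, the function name, and the idea of using paid amounts as the comparison variable are all illustrative assumptions, not anything taken from the audit.

```python
import math
import statistics

def mean_paid_z_score(frame_paid, sample_paid):
    """Distance between the sample's mean paid amount and the frame's mean,
    in standard errors of a simple random sample drawn without replacement."""
    N, n = len(frame_paid), len(sample_paid)
    frame_mean = statistics.mean(frame_paid)
    frame_var = statistics.variance(frame_paid)  # variance with N-1 denominator
    # Standard error of the sample mean, with finite population correction
    se = math.sqrt((1 - n / N) * frame_var / n)
    return (statistics.mean(sample_paid) - frame_mean) / se

# Hypothetical frame of 200 claims with paid amounts from 100 to 1,095
frame = [100.0 + 5.0 * i for i in range(200)]

low_sample = frame[:20]      # 20 claims, all from the low-dollar end
spread_sample = frame[::10]  # 20 claims spread across the whole frame

z_low = mean_paid_z_score(frame, low_sample)
z_spread = mean_paid_z_score(frame, spread_sample)
print(f"low-dollar sample z = {z_low:.2f}")
print(f"spread sample z = {z_spread:.2f}")
```

A large absolute z-score suggests the sample does not look like the frame on that variable, regardless of how it was drawn. A small one is only weak evidence of representativeness, since it covers a single variable.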

In the Northwestern audit, short stays are identified as a single stratum, which, in my opinion, is egregiously incorrect. Even though the guidelines specify what constitutes a short stay, there are many reasons that a patient might fall into this category. Hundreds if not thousands of decisions are made along the way for each such patient, driven by administrative, policy, regulatory, time-of-day, and census considerations, in addition to those made by the medical provider in determining the likely admission status of the patient.

Describing what the sample looks like statistically is one thing, but attempting to extrapolate something that has that many moving parts and uncertainties is, again, in my opinion, irresponsible. And the fact that they selected only 40 claims makes it very clear to me that the government did not conduct any type of probe audit from which a proper sample-size calculation could be made.
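To illustrate what a probe-based sample-size calculation can look like, here is a minimal sketch using the common textbook formula for estimating a mean: the required size grows with the square of the coefficient of variation observed in the probe and shrinks with the square of the target precision, with a finite population correction for the frame size. The probe values, the frame size of 5,000, and the 90 percent confidence and 25 percent precision targets are hypothetical and chosen only for illustration; they are not taken from the audit or from any published guidance.

```python
import math
import statistics
from statistics import NormalDist

def required_sample_size(probe_overpayments, frame_size, confidence, precision):
    """Textbook sample size for estimating a mean overpayment to within
    +/- `precision` (as a fraction of the mean) at a two-sided `confidence`."""
    mean = statistics.mean(probe_overpayments)
    if mean <= 0:
        raise ValueError("probe must show a positive mean overpayment")
    cv = statistics.stdev(probe_overpayments) / mean  # coefficient of variation
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * cv / precision) ** 2        # size for an infinite population
    n = n0 / (1 + n0 / frame_size)        # finite population correction
    return math.ceil(n)

# Hypothetical probe of 30 claims: most have no overpayment, a few are large
probe = [0.0] * 18 + [250.0, 400.0, 1200.0, 90.0, 3100.0, 640.0,
                      75.0, 1800.0, 520.0, 960.0, 2300.0, 130.0]

n_needed = required_sample_size(probe, frame_size=5000,
                                confidence=0.90, precision=0.25)
n_tighter = required_sample_size(probe, frame_size=5000,
                                 confidence=0.90, precision=0.10)
print(n_needed, n_tighter)
```

The point of the sketch is that the required size depends on how variable the overpayments are. Without a probe there is no estimate of that variability, so any fixed number of claims is a guess.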

This same logic holds true for stratum 2: inpatient claims billed with high-severity-level DRG codes. And the third stratum, outpatient claims billed with Modifier -59, a low-dollar stratum, requires a totally different technique to determine whether the payment was appropriate. As such, it should not have been a stratum of the same audit at all; more likely, it belonged in a completely separate audit.

The fact is, if you are going to extrapolate an audit and create huge potential financial damage to an organization, then you should at least get the basics right. And the kicker here, the one single point that everyone seems to have missed and that supports all of my objections, is this: the lower boundary of the one-sided confidence interval that the government published in its report is 62.8 percent of the point estimate. In other words, the lower bound sits 37.2 percent below the point estimate (100 minus 62.8), so the precision level is 37.2 percent, a far wider margin than the Office of Inspector General (OIG) states is acceptable in order to conduct an extrapolation.
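The arithmetic behind that 37.2 percent figure is simple enough to sketch. Relative precision here is the distance from the point estimate down to the lower bound, expressed as a fraction of the point estimate. The dollar amount below is a hypothetical placeholder, since only the 62.8 percent ratio matters; the function name is likewise just for illustration.

```python
def relative_precision(point_estimate, lower_bound):
    """Gap between the point estimate and the lower confidence bound,
    as a fraction of the point estimate."""
    return (point_estimate - lower_bound) / point_estimate

point_estimate = 1_000_000.0            # hypothetical dollar figure
lower_bound = 0.628 * point_estimate    # lower bound is 62.8% of the estimate

precision = relative_precision(point_estimate, lower_bound)
print(f"{precision:.1%}")
```

Because the result depends only on the ratio of the two numbers, any point estimate paired with a lower bound at 62.8 percent of it gives the same precision.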

The moral of this story is this: while all universes can be studied, not all samples can be extrapolated.

And that’s the world according to Frank.

About the Author

Frank Cohen is the director of analytics and business intelligence for DoctorsManagement. He is a healthcare consultant who specializes in data mining, applied statistics, practice analytics, decision support, and process improvement. Mr. Cohen is also a member of the National Society of Certified Healthcare Business Consultants (NSCHBC.ORG).
