# Extrapolation: How \$1.4 Million Becomes \$42 Million

What’s the difference between \$1.4 million and \$42 million?

Well, before you get your calculators out, let me make this really easy. If you are like me, the difference is \$40.6 million.

If you base your calculation on Common Core, however, it’s, well, wherever it leads you, as long as you try. But for the government, the answer is \$0. That’s right – for the U.S. Department of Health and Human Services (HHS) Office of Inspector General (OIG), there is no difference between \$1.4 million and \$42 million when extrapolation is used. And in the case of Mount Sinai Health System in New York City, it wasn’t “used,” it was “abused.” Here is my reasoning behind that opinion.

I want to begin with a blanket statement: as a statistician, I am a proponent of extrapolation in general. I find it to be a very effective statistical technique for both predicting outcomes and projecting sample results onto a larger population or sampling frame. But because extrapolation can exaggerate errors, it is absolutely critical that the sample be as unbiased and representative of the universe (or sampling frame) as possible – and in this audit, as with most for which I am engaged, that does not appear to be the case.

Because of issues with the sampling methodology, even without all of the details, it was quite clear to me that this audit was not a viable candidate for extrapolation. Why is that so important? Well, because it turned a \$1.4 million overpayment estimate into a \$42 million demand – and by any standard, that should have required a much better approach than what OIG represented in its report. Here’s the backstory:

A few weeks ago, healthcare attorney David Glaser talked about the OIG extrapolation audit performed on Mount Sinai. The audit was conducted on certain claims filed with Medicare during 2012 and 2013. The report was issued in April 2017, and I have had the opportunity to review both the findings of the OIG and its reporting of the hospital’s response.

As you might guess, my interest was in the statistical component of the sampling and extrapolation, which is detailed (kind of) in Appendix A (Audit Scope and Methodology) and Appendix B (Statistical Sampling Methodology). The entire report can be found online at https://oig.hhs.gov/oas/reports/region2/21401019.pdf

At the top of page 14, under the heading “Office of Inspector General Response,” the following appears in the first paragraph:

“Under the 60-day rule, providers who identify overpayments are required to return them within 60 days (section 1128J(d) of the Act and 42 CFR § 401.305(b)(i)). In addition, providers must exercise reasonable diligence to determine whether they have received an overpayment and to quantify the amount of the overpayment (42 CFR § 401.305(a)(2)). In exercising reasonable diligence, providers are expected to determine whether or not overpayments of a similar type exist during a six-year lookback period (42 CFR § 401.305(f) and 81 Fed. Reg. 7654, 7663 (Feb. 12, 2016)). In addition, the provider is obligated to quantify the entire amount of the overpayment for this lookback period and may do so by using a statistically valid extrapolation methodology (42 CFR § 401.305(d)(1)).”

The above statement, while always alarming, should not be a surprise for any compliance manager or officer. It is, after all, the law of the land. Combine this with the following excerpt from the Program Integrity Manual, Chapter 8, Section 4.1.2, concerning the right of the government to audit:

“Statistical sampling is used to calculate and project (i.e., extrapolate) the amount of overpayment(s) made on claims. The Medicare Prescription Drug, Improvement, and Modernization Act of 2003 (MMA) mandates that before using extrapolation to determine overpayment amounts to be recovered by recoupment, offset or otherwise, there must be a determination of sustained or high level of payment error, or documentation that educational intervention has failed to correct the payment error. By law, the determination that a sustained or high level of payment error exists is not subject to administrative or judicial review.” (Emphasis added.)

It should be clear that a provider cannot contest the justification for the audit – only the quality and representativeness of the sample and the extrapolation methodology. It’s a bit of a narrow highway, if you ask me, and I defend some 50 or more of these cases a year.

According to Appendix B of the report, OIG selected a sampling frame of 6,369 claims in 12 different risk areas, which included such diverse and variable areas as outpatient claims with modifiers 25 and 59, inpatient rehabilitation facility (IRF) claims, inpatient manufacturer credits for replaced medical devices, and several other quite unique and disparate clinical claims areas. According to the OIG, each of these 12 risk areas represented a separate stratum. This in and of itself, in my opinion, represents an inappropriate use of stratification, not just because of issues with sample size, but because they are segregating and then aggregating such a disparate set of variable types.

In my opinion, the audit should have been broken up into different audits rather than just different strata. It is important to note that for some of the 12 strata, the number of claims audited was equal to the number of claims in the frame. In essence, they audited 100 percent of the frame for those strata, and as such, these strata should have been excluded from the extrapolation calculation and instead added back to the total at face value. This is a fatal flaw that I have seen in many government audits.
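To make the point concrete, here is a minimal Python sketch of a stratified projection. It assumes a simple mean-per-unit estimator with a finite population correction; the OIG report does not state its formula, so this is an illustration and not a reconstruction of what they did. The function name, the tuple layout, and every figure are hypothetical. A stratum in which every claim was audited is added at face value and contributes nothing to the sampling error.

```python
import math
from statistics import mean, variance


def stratified_estimate(strata):
    """Return (point_estimate, standard_error) for the total overpayment.

    strata: list of (frame_size, overpayments) tuples, where overpayments
    holds the audited overpayment for each reviewed claim (hypothetical layout).
    """
    point = 0.0
    var = 0.0
    for frame_size, overpayments in strata:
        n = len(overpayments)
        if n == frame_size:
            # Census stratum: every claim was reviewed, so the overpayment
            # is known exactly and is simply added at face value.
            point += sum(overpayments)
            continue
        # Sampled stratum: project the sample mean to the whole frame.
        point += frame_size * mean(overpayments)
        # Variance of the projected total, with finite population correction.
        fpc = 1 - n / frame_size
        var += frame_size ** 2 * fpc * variance(overpayments) / n
    return point, math.sqrt(var)


# Hypothetical data: one fully audited stratum and one sampled stratum.
strata = [
    (3, [100.0, 0.0, 250.0]),        # 3 of 3 claims reviewed
    (1000, [0.0, 50.0, 0.0, 150.0]),  # 4 of 1,000 claims reviewed
]
point, se = stratified_estimate(strata)
print(f"point estimate: {point:,.2f}, standard error: {se:,.2f}")
```

In this sketch all of the uncertainty comes from the sampled stratum, which is why a fully audited stratum has no business inside the extrapolation arithmetic.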

Appendix C reported the sample results and estimates. Table 3 illustrates the results of their analysis for each of the 12 risk areas, including the dollar value associated with the claims in the frame, sample, and estimated overpayment amounts. They did not, however, indicate anywhere the methodology used to conduct the extrapolation, which is not only required, but necessary in order for a third party to validate the appropriateness of findings. For example, we don’t know whether they chose to extrapolate based on a percentage of overpayment or the average overpayment per claim, multiplied by the number of claims in the frame. Not having access to the entire database, it is also impossible to determine the distribution of overpayments; however, I can tell you from experience that it is most likely highly left-skewed, and therefore using the average would not produce an accurate measurement for the point estimate.
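Why does the choice of method matter? The two approaches mentioned above can give very different answers from the same sample. The sketch below, in Python, compares them on made-up numbers; the function names and all of the data are hypothetical and are not taken from the OIG report.

```python
def mean_per_unit(sample_overpaid, frame_claims):
    """Average overpayment per sampled claim times the claims in the frame."""
    return sum(sample_overpaid) / len(sample_overpaid) * frame_claims


def ratio_estimate(sample_overpaid, sample_paid, frame_paid):
    """Overpaid share of sampled dollars times total dollars paid in the frame."""
    return sum(sample_overpaid) / sum(sample_paid) * frame_paid


# Hypothetical sample of four claims: one large claim carries all of the error.
sample_paid = [100.0, 200.0, 300.0, 4000.0]
sample_overpaid = [0.0, 0.0, 0.0, 2000.0]

# Hypothetical frame: 1,000 claims with 500,000 paid in total.
frame_claims = 1000
frame_paid = 500_000.0

mpu = mean_per_unit(sample_overpaid, frame_claims)
ratio = ratio_estimate(sample_overpaid, sample_paid, frame_paid)
print(f"mean-per-unit: {mpu:,.0f}")
print(f"ratio:         {ratio:,.0f}")
```

With a skewed sample like this one, a single large claim drives the mean-per-unit projection to more than double the ratio projection, so a reviewer who is not told which method was used cannot reproduce the point estimate.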

Finally, in Table 4, they show their results as the point estimate (for which we do not have any indication as to how it was calculated) along with the lower limit of a one-sided, 90-percent confidence interval. The latter is a common way to estimate overpayment in an extrapolation audit. It considers the likelihood of sampling error and therefore eliminates the argument regarding the absolute accuracy of the extrapolated results. Interestingly, the OIG’s calculation indicates another fatal flaw, and that is the precision of the results. According to the definition in Stat Trek (stattrek.com/statistics/dictionary.aspx?definition=Precision):

“Precision refers to how close estimates from different samples are to each other. For example, the standard error is a measure of precision. When the standard error is small, estimates from different samples will be close in value, and vice versa.”

In essence, the smaller the precision value, the smaller the sampling error and the more “precise” the estimate. On page 4 of the OIG report, we find the following:

“We conducted this performance audit in accordance with generally accepted government auditing standards.”

Yet it is clear that they did not follow these standards with regard to the precision value calculated in Table 4. The Office of Management and Budget (OMB) in its Circular A-123 states that “federal agencies must produce a statistically valid error estimate that meets precision levels of plus or minus 2.5 percentage points with a 90-percent confidence interval.” And in the Aug. 31, 2007 Federal Register, the Centers for Medicare & Medicaid Services (CMS) says basically the same thing: that the error estimate should meet precision levels of plus-or-minus 2.5 percentage points with a 90-percent confidence interval. That did not happen in this case. For this audit, the precision rate was over 17 percent, nearly seven times higher than the mandated figure quoted in both the OMB and CMS guidelines.
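For readers who want to see the arithmetic, here is a rough Python sketch of how a lower limit and a precision figure can be derived from a point estimate and its standard error. It assumes a normal approximation and defines precision as the half-width of the two-sided interval divided by the point estimate, which is one common convention; the OIG report does not spell out its own formula. The function name and the input figures are hypothetical and are not the values from Table 4.

```python
from statistics import NormalDist


def lower_limit_and_precision(point_estimate, standard_error, confidence=0.90):
    """Return (one-sided lower limit, relative precision of two-sided interval)."""
    # z-value for a one-sided bound at the given confidence level.
    z_one_sided = NormalDist().inv_cdf(confidence)
    # z-value for a two-sided interval at the same confidence level.
    z_two_sided = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = point_estimate - z_one_sided * standard_error
    # Half-width of the two-sided interval as a share of the point estimate.
    precision = z_two_sided * standard_error / point_estimate
    return lower, precision


# Hypothetical figures: a 1,000,000 point estimate with a 100,000 standard error.
lower, precision = lower_limit_and_precision(1_000_000.0, 100_000.0)
print(f"lower limit: {lower:,.0f}")
print(f"precision:   {precision:.1%}")
print("meets 2.5 percent target:", precision <= 0.025)
```

Under these assumptions, a standard error equal to 10 percent of the point estimate already puts the precision at roughly 16 percent, far outside a 2.5 percent target.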

In reading through the hospital’s 28-page response, I thought that they did a pretty good job of challenging the extrapolation based on some qualitative issues, but only Section V, which consisted of a single paragraph, was truly committed to the statistical methodology. It may be that in this case, challenging the audit methods, strata identification, and qualitative findings is more important than the raw statistical flaws. But it is my experience that it is important to hit them with the statistical flaws early and to hit them hard.

In the immortal words of Spartan King Leonidas I, molon labe! (Come and take them!)

And that’s the world according to Frank.