According to Chapter 8 of the Medicare Program Integrity Manual, “sampling units are the elements that are selected according to the design of the survey and the chosen method of statistical sampling. They may be an individual line(s) within claims, individual claims, or clusters of claims (e.g., a beneficiary). For example, possible sampling units may include specific beneficiaries seen by a physician during the time period under review; or, claims for a specific item or service. In certain circumstances, e.g., multi-stage sample designs, other types of clusters of payments may be used. In principle, any type of sampling unit is permissible as long as the total aggregate of such units covers the population of potential mis-paid amounts.”
In those 107 words, the Program Integrity Manual, which lays out the rules governing provider audits, tries to shed light on one of the murkiest areas of most audits: what is the appropriate unit to audit? I have always had a problem with claim-level auditing, because in many cases the claim itself is a composite of many different procedure codes. These can include codes for surgical, medical, E&M, radiological, supply, drug, and other types of services, procedures, and products.
In a recent audit in which I served as the statistical expert, the Zone Program Integrity Contractor (ZPIC) chose to audit at the claim level: it selected 47 claims at random and then audited the 227 individual lines within those claims. Some claims had only one procedure code, while others had as many as 11. This was true of the sample, but the distribution of lines per claim differed between the sample and the frame. In the sample, the mean number of lines per claim was 4.2; in the frame, it was 2.9. The problem may not seem obvious, but it indicates that, in general, the sampled claims are likely more complex than the claims in the frame.
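One way to check a lines-per-claim mismatch like this, rather than eyeballing two means, is to compare the whole distribution of lines per claim in the sample against the frame. The sketch below is a minimal illustration, not a procedure taken from the Program Integrity Manual: it computes a two-sample Kolmogorov-Smirnov statistic in plain Python and compares it with the usual large-sample critical value. The function names are hypothetical, and the inputs are assumed to be lists of line counts, one entry per claim.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(sample_lines: list[int], frame_lines: list[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of lines-per-claim in the sample and the frame."""
    a, b = sorted(sample_lines), sorted(frame_lines)
    d = 0.0
    # Both empirical CDFs are step functions that jump only at observed values,
    # so checking every observed line count is enough to find the largest gap.
    for x in set(a) | set(b):
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d


def ks_critical_value(n: int, m: int, c_alpha: float = 1.358) -> float:
    """Large-sample critical value for D; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * sqrt((n + m) / (n * m))
```

Two caveats apply. Line counts are discrete, which makes this test conservative, and the sampled claims are themselves part of the frame, so the result is best read as a screening signal rather than a formal verdict. For 47 sampled claims against a frame of 18,422, the critical value works out to roughly 0.198.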
So who cares? Why would something like this matter? In this case, I was working on an extrapolation audit: the auditor took the average overpayment per claim in the sample and used it to extrapolate to the entire population of claims in the frame. There were 18,422 claims in the frame, so even a small error in the sample would be multiplied into a huge potential error after extrapolation. This case was quite typical in that using the claim as the unit created a serious bias against the provider and, in any case, added to the uncertainty of the extrapolated overpayment estimate. In the sample, we saw surgical codes, which are coded almost exclusively on the basis of their descriptions. There were also medical diagnostic codes, such as nerve conduction studies; therapeutic codes, such as an injection or manipulation; radiology codes, which are broken into several categories; and E&M codes, which have very specific guidelines for selecting the proper procedure code. A given claim might include some or all of these categories, and lumping codes governed by such different rules into a single unit works against the purpose and objective of the audit.
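The arithmetic behind that amplification is simple to sketch. Assuming mean-per-unit estimation under simple random sampling (an assumption; the audit's actual estimator and confidence level are not stated here), the point estimate is the sample's mean overpayment per claim multiplied by the number of claims in the frame, and the standard error of the total is scaled by the same factor. The function name and the default z = 1.645 are illustrative placeholders only.

```python
from math import sqrt
from statistics import mean, stdev


def extrapolate_total(overpayments: list[float], frame_size: int,
                      z: float = 1.645) -> tuple[float, float]:
    """Mean-per-unit extrapolation under simple random sampling.

    Returns (point_estimate, lower_bound). z = 1.645 is a placeholder
    (lower limit of a two-sided 90% normal interval), not a claim about
    what any particular contractor uses.
    """
    n = len(overpayments)
    if n < 2 or n >= frame_size:
        raise ValueError("need at least 2 sampled units and fewer than the frame size")
    # Point estimate: average overpayment per unit times the number of units in the frame.
    point = mean(overpayments) * frame_size
    # Standard error of the estimated total, with the finite population correction.
    se_total = frame_size * stdev(overpayments) / sqrt(n) * sqrt(1 - n / frame_size)
    return point, point - z * se_total
```

With 18,422 claims in the frame, every dollar of error in the sample's mean overpayment per claim moves the point estimate by $18,422, and the standard error of the total is scaled up by the same multiplier.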
This issue of units is of particular interest when a sample is stratified. In its discussion of stratification, Chapter 8 of the Program Integrity Manual states: “the stratification scheme should try to ensure that a sampling unit from a particular stratum is more likely to be similar in overpayment amount to others in its stratum than to sampling units in other strata. Although the amount of an overpayment cannot be known prior to review, it may be possible to stratify on an observable variable that is correlated with the overpayment amount of the sampling unit.”
In every case in which I have served as an expert, the auditor relied on the paid amount to decide whether and how to stratify the sample, and that is exactly the wrong variable. In each of those cases, the payment amounts for a single code varied enormously, for reasons ranging from patient responsibilities such as copays and deductibles to rules and editing policies promulgated by the payor. That variation all but severs the relationship between paid amount and overpaid amount, and with it any statistical justification for using the paid amount as the point of demarcation between strata.
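The Manual's own criterion is that a stratification variable be correlated with the overpayment amount, and where reviewed results are available, that relationship can be checked directly. The sketch below, with a hypothetical function name, computes the Pearson correlation between paid and overpaid amounts. It measures only linear association, so it is a first look rather than a complete assessment of a stratification scheme.

```python
from math import sqrt


def pearson_r(paid: list[float], overpaid: list[float]) -> float:
    """Pearson correlation between a candidate stratification variable
    (here, paid amount) and the overpayment found on review."""
    if len(paid) != len(overpaid) or len(paid) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(paid)
    mx, my = sum(paid) / n, sum(overpaid) / n
    # Co-deviation and deviations from each mean.
    sxy = sum((x - mx) * (y - my) for x, y in zip(paid, overpaid))
    sxx = sum((x - mx) ** 2 for x in paid)
    syy = sum((y - my) ** 2 for y in overpaid)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)
```

Squaring r gives the share of the variance in overpayment that paid amount explains linearly; a value near zero suggests that strata built on paid amount will not be much more homogeneous in overpayment than the unstratified frame.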
Truly, the goal of an audit, and particularly an extrapolation audit, is to select a sample that is both random and statistically valid, and unfortunately, one does not guarantee the other. If you consider the overall diversity of codes, categories, procedures, and services within the universe of what a typical provider does, then in such a heterogeneous environment, the claim is the wrong unit to audit. It’s like trying to extrapolate the cost of a single bag of groceries to all bags of groceries purchased from a given store. Since each shopper’s bag holds a highly variable, heterogeneous set of items, the better way to conduct the study would be item by item, not bag by bag. This may not hold in a setting in which the physician does a limited set of the same things over and over for all patients, but in my experience that is rarely the case. It is clear to me, as an independent statistician, that the line item, not the claim, should be the unit of interest. If you are interested in encounters, look at E&M codes. If procedures are the concern, look at individual procedures. Since the rules and guidelines for each code category can be so different, it doesn’t make sense to take a composite of many (the claim) and extrapolate it to a universe of other disparate composites. And making sense is often the precursor to statistical significance.
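Because the Manual explicitly allows individual claim lines as sampling units, a line-level design is easy to sketch. The example below is a minimal illustration under stated assumptions: the `ClaimLine` record, its category labels, and the function name are all hypothetical, and allocation is a fixed number of lines per category. It treats each line as the unit, groups lines by code category, and draws a reproducible simple random sample without replacement within each category.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimLine:
    claim_id: str
    code: str
    category: str  # hypothetical labels such as "E&M", "surgery", "radiology"
    paid: float


def sample_lines_by_category(lines: list[ClaimLine], per_category: int,
                             seed: int) -> dict[str, list[ClaimLine]]:
    """Treat each claim line as the sampling unit, group lines into strata by
    code category, and draw a simple random sample without replacement within each."""
    strata: dict[str, list[ClaimLine]] = defaultdict(list)
    for line in lines:
        strata[line.category].append(line)
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    return {
        category: rng.sample(members, min(per_category, len(members)))
        for category, members in sorted(strata.items())
    }
```

Because equal allocation does not sample categories in proportion to their size, any extrapolation from a design like this would need to weight each category's mean overpayment by that category's line count in the frame, as in any stratified estimator.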
I am not saying that the line is the correct unit in every case. I have not been exposed to every case, so I am certain there are some for which an aggregate unit is justified. For example, I have seen audits in which dates of service are the units, and everything done on those dates is reviewed and extrapolated. As long as the sample passes the necessary statistical tests for validity, date of service can be a viable option. I have seen many audits in which the beneficiary was the unit of interest, and as long as the time frame allows for a cycle of visits or treatments, this can be a logical choice. The point is, the claim is selected most often because it is the easiest to select, but laziness rarely results in statistical validity. In fact, I have never worked on an audit in which the auditor conducted standard, accepted tests of statistical validity and representativeness on the sample. In one recent case, the auditor stated that the sample was statistically valid because the mean paid amount for the sample was “close” to the mean paid amount for the universe. “Close?” Really? Now that’s a statistical test I have never read about in the literature.
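For contrast with “close,” here is what even a basic check of the paid-amount claim might look like: a z statistic comparing the sample's mean paid amount with the known frame mean, with a finite population correction because claims are drawn without replacement. This is a sketch with a hypothetical function name that relies on a normal approximation, and it tests only one variable; passing it does not by itself establish that a sample is representative.

```python
from math import sqrt
from statistics import mean, pstdev


def mean_paid_z(sample_paid: list[float], frame_paid: list[float]) -> float:
    """z statistic comparing the sample's mean paid amount with the known
    frame mean, using the finite population correction for sampling
    without replacement."""
    N, n = len(frame_paid), len(sample_paid)
    if not 0 < n < N:
        raise ValueError("sample must be non-empty and smaller than the frame")
    sigma = pstdev(frame_paid)  # the frame is fully known, so use the population SD
    if sigma == 0:
        raise ValueError("frame paid amounts are constant; z is undefined")
    se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))
    return (mean(sample_paid) - mean(frame_paid)) / se
```

A |z| above roughly 1.96 would flag the sample mean as unusually far from the frame mean at the 5 percent level (two-sided). A sample can pass this check and still differ from the frame on other characteristics, such as lines per claim or code mix.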
So, what’s the takeaway? Take a close look at the unit of interest the next time you get audited. If it’s the claim, analyze the average number of lines per claim and the types and categories of codes within the claim set. If these are highly variable, which is what I most often see, then the sample may not be appropriate for extrapolation. In any case, be alert to what is going on during an extrapolation audit. Just because the auditor says everything was done correctly doesn’t mean that it was. Sampling is complex and often technically challenging, and it behooves everyone undergoing an extrapolation audit to understand their right to challenge the results. Remember, there are two ways of lying: not telling the truth and making up statistics.
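The claim-set review suggested in the takeaway can be sketched as a short summary. This example assumes the claims data can be reduced to one (claim ID, code category) pair per line; the function name, output keys, and category labels are hypothetical, and the summary does not supply a threshold for “highly variable,” which remains a judgment call.

```python
from collections import defaultdict
from statistics import mean, pstdev


def claim_profile(lines: list[tuple[str, str]]) -> dict:
    """Summarize how heterogeneous a claim set is.

    `lines` holds one (claim_id, code_category) pair per claim line.
    """
    categories_by_claim: dict[str, list[str]] = defaultdict(list)
    for claim_id, category in lines:
        categories_by_claim[claim_id].append(category)
    counts = [len(cats) for cats in categories_by_claim.values()]
    avg = mean(counts)
    # Claims whose lines span more than one code category.
    mixed = sum(1 for cats in categories_by_claim.values() if len(set(cats)) > 1)
    return {
        "claims": len(counts),
        "mean_lines_per_claim": avg,
        "cv_lines_per_claim": pstdev(counts) / avg,  # spread relative to the mean
        "share_mixed_category": mixed / len(counts),
    }
```

The coefficient of variation expresses the spread in lines per claim relative to the mean, and the mixed-category share shows how often a single claim combines codes from different categories, each governed by its own rules.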
And that’s the world according to Frank.
About the Author
Frank Cohen is the director of analytics and business intelligence for DoctorsManagement, a Knoxville, Tenn.-based consulting firm. Mr. Cohen specializes in data mining, applied statistics, practice analytics, decision support, and process improvement.