“The worst form of inequality is to try to make unequal things equal.” -Aristotle

There are two major forms of statistics: descriptive statistics, which describe the current state of an event, condition, or data set, and inferential statistics, which use data in order to predict or infer some future state of an event, condition, or data set.

Descriptive statistics might, for example, give us the average charge for a procedure or the percentage of claims that are suspected to have been coded in error. In contrast, inferential statistics might predict the range of charges for a population or extrapolate the claims error in a sample to a universe of claims. 

In many audits, the auditor uses descriptive statistics to summarize the data. Perhaps it is the actual error rate measured as the dollars in overpayment divided by the total dollars in the sample being audited. Or maybe it simply totals the overpayment amounts by claim and assesses that as the overpayment demand.
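The two descriptive measures just mentioned can be sketched in a few lines. This is only an illustration: the four-claim sample and the function names `dollar_error_rate` and `total_overpayment` are hypothetical and do not come from any actual audit.

```python
# Hypothetical sample (illustrative only): (paid amount, overpayment found), in dollars.
sample = [
    (120.00, 0.00),
    (85.00, 85.00),   # entire paid amount found to be overpaid
    (240.00, 60.00),  # only a portion of the paid amount overpaid
    (55.00, 0.00),
]


def dollar_error_rate(claims):
    """Overpaid dollars divided by total paid dollars in the sample."""
    total_paid = sum(paid for paid, _ in claims)
    total_overpaid = sum(over for _, over in claims)
    if total_paid == 0:
        raise ValueError("total paid amount is zero; error rate is undefined")
    return total_overpaid / total_paid


def total_overpayment(claims):
    """Simple sum of the claim-level overpayments (no extrapolation)."""
    return sum(over for _, over in claims)


rate = dollar_error_rate(sample)
demand = total_overpayment(sample)
print(f"error rate: {rate:.1%}, overpayment demand: ${demand:,.2f}")
```

Neither figure says anything about claims outside the sample; both simply describe what was reviewed.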

Audits that result in an extrapolation, however, almost always depend upon inferential statistics to make their predictions or estimates. And while the statistical methods of inference are important, they are too big a topic for this article. Instead, I want to focus on one portion of the inferential model: stratification.

Stratification: Groups of Variables

Stratification can be defined as a means of describing a particular way that data points within a data set or database are arranged to create groups of variables with like characteristics. For example, I could conduct a poll asking registered voters whether they would be more likely to vote for Jeb Bush or Hillary Clinton for president, and I would be able to make a prediction with a specified degree of certainty. But because Hillary Clinton is a female and Jeb Bush is a male, and because gender is known to play a role in social structure, I might instead want to stratify my sample into two groups: females and males. In this way, I am creating subgroups that are more similar in their characteristics than a sample that combines both. In essence, it is likely that the precision of results of the sample would be greater for each of the two groups than for the combined group as a whole; this is a goal of many types of studies.

A section of Chapter 8 of the Program Integrity Manual states the following:

The stratification scheme should try to ensure that a sampling unit from a particular stratum is more likely to be similar in overpayment amount to others in its stratum than to sampling units in other strata. Although the amount of an overpayment cannot be known prior to review, it may be possible to stratify on an observable variable that is correlated with the overpayment amount of the sampling unit.  

This is a very interesting guideline. In essence, the government is saying that it is appropriate to stratify the items (or data points) within a sample if, by doing so, we are able to combine the data points into groups that would share characteristics such that the overpayment amount for each of these groups would be more precise than if the sample were to combine all of the data points. 

The goal of a stratification, then, is to separate the universe into sample frames when the universe is not homogeneous, meaning that it consists of data points with unlike characteristics that can be organized into homogeneous subsets. As such, the strata should be based upon some logical component, and one that is more likely to accurately identify the cause of overpayments, if any, such as code type or diagnosis. This also meets the test of common sense since ultimately, in an extrapolation audit, we want the results to be as accurate and precise as possible.
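One way to see what "more precise" means here is to compare the variance of the overpayment amounts across the whole sample with the average variance inside each stratum. The sketch below assumes a hypothetical set of eight audit units already labeled by code category; the data and the function name `within_stratum_variance` are illustrative assumptions, and in a real audit the overpayment is not known until after review.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical audit units (illustrative only): (code category, overpayment in dollars).
units = [
    ("E&M", 20), ("E&M", 25), ("E&M", 30), ("E&M", 25),
    ("surgery", 400), ("surgery", 450), ("surgery", 500), ("surgery", 450),
]


def within_stratum_variance(labeled_units):
    """Average of the per-stratum variances, weighted by each stratum's share of units."""
    groups = defaultdict(list)
    for stratum, value in labeled_units:
        groups[stratum].append(value)
    n = len(labeled_units)
    return sum(len(vals) / n * pvariance(vals) for vals in groups.values())


# Variance when every unit is thrown into one combined group.
overall = pvariance([value for _, value in units])
# Variance that remains once units are grouped with others like them.
within = within_stratum_variance(units)

print(f"combined variance: {overall:,.2f}")
print(f"within-stratum variance: {within:,.2f}")
```

When the strata are well chosen, the within-stratum figure is far smaller than the combined one, which is what allows a stratified estimate to be tighter.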

Problem #1

The problem is that every audit in which I have participated as a statistical expert has had the stratification performed using the paid amount as the variable of interest, which is almost never the correct variable to use. The concept is that the paid amount can best predict the overpayment amount, but this assumption is flawed in a number of ways.

First of all, if the audit is performed at the claim level, the claim will likely contain a number of individual procedure codes, which, by definition and guideline, differ in how they are determined. Different code groups also have different payment characteristics. For example, evaluation and management codes tend to have a lower payment amount and a higher rate of payment than, say, surgical procedures. As such, when you combine a surgical procedure and an E&M code (or some other non-surgical code), you no longer have an audit unit (the claim) for which the paid amount predicts the overpayment amount. This is particularly true when the overpaid amount can be a portion of the paid amount as well as the entire paid amount. In addition, the actual third-party paid amount for a given procedure code can vary by orders of magnitude even when reported by the same provider on the same day for the same payer.

Coding for surgical procedures, for example, is based on the actual procedures performed by the medical provider and/or his or her team. Medicine procedures, including both face-to-face encounters and ancillary services, rely upon a variety of information. E&M codes are used based on specific guidelines, which are different than those of any other category, and laboratory and pathology codes are normally based on the description of the test performed, the procedure used, and/or the body part or organ system being examined. In general, an audit does not focus on the amount that was paid or overpaid, but rather it examines whether the procedure or service reported was done correctly. The overpayment amount is simply the result of those findings and not the focus of the audit. More specifically, an audit rarely focuses on whether a properly reported and paid service or procedure was paid the correct amount – and with Medicare, this is particularly true, as the fee schedule is so transparent.

Clearly, then, each category represents a unique set of characteristics that would have an impact on how the procedure code is determined, what the allowed and payment amounts are, and, of greatest importance, the logic used to validate its appropriateness. It follows that stratifying the data based on coding characteristics would be a far more accurate indicator of overpayment amounts than the variable (paid amount) that was used by the auditor.

Problem #2

Another problem that I often find with stratified sampling is that the auditor, even when using paid amounts as the variable of interest, uses little or no logic to create the individual strata.

Remember, the purpose is to separate the universe into sample frames of data points that share similar characteristics. What was paid for a given claim is such an unstable metric that it is very difficult to create strata from it using any logical scheme. There are statistical methods available to identify specific breaks within a distribution of data, but I have never seen any one of them used in a payer audit, ever. In fact, in nearly every audit in which I have participated, the strata were simply arbitrary. I know this because when I have tested the data, the results have never conformed to the strata the auditor used, and I have never seen any written documentation that describes the logic used to form the strata. Many times, this results in one or more strata showing a higher degree of variability than the combined data, and this is never a good outcome.
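As one example of such a method, the sketch below applies the cumulative square root of frequency rule, commonly attributed to Dalenius and Hodges. It is offered only as an illustration of a documented, repeatable way to set breaks, not as the method any auditor or the Program Integrity Manual prescribes. The paid amounts, the bin count, and the function name `cum_sqrt_f_boundaries` are all assumptions.

```python
from math import sqrt


def cum_sqrt_f_boundaries(values, n_strata, n_bins=20):
    """Stratum upper boundaries from the cumulative square root of frequency rule."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to stratify")
    width = (hi - lo) / n_bins

    # Count how many values fall in each equal-width bin.
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    # Running total of sqrt(frequency) across the bins.
    cum, running = [], 0.0
    for c in counts:
        running += sqrt(c)
        cum.append(running)

    # Cut where the running total crosses equal shares of its final value.
    step = running / n_strata
    boundaries = []
    for k in range(1, n_strata):
        target = k * step
        idx = next(i for i, c in enumerate(cum) if c >= target)
        boundaries.append(lo + (idx + 1) * width)
    return boundaries


# Hypothetical paid amounts in dollars (illustrative only).
paid = [25, 30, 32, 40, 45, 48, 55, 60, 75, 90, 110, 150, 220, 400, 900, 1500]
breaks = cum_sqrt_f_boundaries(paid, n_strata=3)
print(breaks)
```

Whatever rule is used, the point is that the breaks come from the data and can be written down and reproduced, which is exactly what is missing when strata are arbitrary.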

Other Problems

There are a lot of other issues with how government auditors create stratified samples, including the presence of statistical outliers (never appropriate), disproportionate sampling, oversampling higher-value claims while under-sampling lower-value claims, etc.

The list goes on, and the reason is actually quite simple: in my opinion, the auditors are just too lazy to do it right the first time. It takes more work and effort to stratify based on coding, for example, or to use a different audit unit such as the actual procedure code or the beneficiary. It can be unfortunate when an appeal is unsuccessful, because the auditor may have followed the Program Integrity Manual to the letter, which, in some cases, conflicts with generally accepted statistical practices. But in general, I have been involved in many successful appeals for which the extrapolation has been thrown out because the stratification process was done incorrectly.

Suffice it to say, an audit can be a complex event and involve more moving parts than many people care to address. But when hundreds of thousands or even millions of dollars are on the line, it behooves any organization to examine the logic, the method, and the results of a stratified sample to ensure that when an extrapolation estimate has been calculated, it fairly represents what the overpayment amount, if any, should be. 

And that’s the world according to Frank.

About the Author

Frank Cohen is the director of analytics and business intelligence for DoctorsManagement. He is a healthcare consultant who specializes in data mining, applied statistics, practice analytics, decision support, and process improvement. Mr. Cohen is also a member of the National Society of Certified Healthcare Business Consultants.
