Data mining is both an art and a science. Roughly stated, its purpose is to extract useful information from data. Data mining has been used for many years and in a number of different ways; however, it is only recently, with the advent of more powerful computers and more powerful software languages, that the practice has made significant gains in popularity, particularly when it comes to mining large databases.

Also known as predictive analytics, it is this set of methodologies that allows Amazon.com to recommend purchases based on what you have purchased from them (and often, from other vendors) in the past. It is predictive analytics that allows a seller to tell you that “customers who purchased this item also purchased that item.”

Using data mining techniques, lenders are able to determine the probability that someone will default on a loan, allowing them to adjust interest rates based on risk. Predictive analytics also is used to conduct credit card fraud analysis in real time. I am sure that many of you at some point have received a call from your credit card company asking you to validate a “suspicious” purchase.

Issues at the Pump

Recently I went to a gas station I go to quite often to fill up, and as I usually do, I swiped my credit card at the pump. For some reason, the gas was coming out very slowly; it took more than a minute just to pump one gallon of gas, so I finished the transaction and went across the street to another gas station. This time, when I swiped my card the transaction was denied, and I had to use another credit card. Several minutes later I got a call from my credit card company asking me to verify that I had indeed attempted a purchase at that second gas station. How in the world could they have responded that quickly? Well, I conduct hundreds if not thousands of transactions a year, and when it comes to gas, when I am home they know that I normally purchase from one location.

In addition, I rarely (if ever) purchase a gallon of gas from one location and then, within minutes, attempt to purchase gas again from another location. In this case, the credit card company used some predictive analytical algorithm to score my purchase with regard to fraud risk, and it scored high enough to invoke some action (such as putting the account on hold until I could verify the purchase). It’s pretty cool stuff, actually.
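As a rough illustration of this kind of scoring, here is a minimal Python sketch. It is not how any card issuer actually works; the `fraud_score` function, the field names, the weights, the 15-minute window, the threshold, and the dates are all hypothetical values chosen to mirror the gas station story.

```python
from datetime import datetime, timedelta

# All weights, field names, and the threshold below are made-up values for
# illustration; a real issuer's model is proprietary and far more elaborate.
HOLD_THRESHOLD = 0.7

def fraud_score(purchase, history, usual_merchants, window=timedelta(minutes=15)):
    """Return a toy risk score for a single purchase (higher means riskier)."""
    score = 0.0
    # Unfamiliar merchant: the cardholder normally buys elsewhere.
    if purchase["merchant"] not in usual_merchants:
        score += 0.4
    # Same kind of purchase repeated within a few minutes of an earlier one.
    for prev in history:
        gap = purchase["time"] - prev["time"]
        if prev["category"] == purchase["category"] and timedelta(0) <= gap <= window:
            score += 0.5
            break
    return score

# Arbitrary example purchases: the usual station, then another one 5 minutes later.
first = {"merchant": "Station A", "category": "gas", "time": datetime(2011, 7, 1, 8, 0)}
second = {"merchant": "Station B", "category": "gas", "time": datetime(2011, 7, 1, 8, 5)}

first_score = fraud_score(first, history=[], usual_merchants={"Station A"})
second_score = fraud_score(second, history=[first], usual_merchants={"Station A"})
hold_account = second_score >= HOLD_THRESHOLD
```

In this toy setup the first purchase raises no flags, while the second one trips both rules and crosses the hold threshold.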

The Application of Data Mining

I use data mining a lot in my work. It allows me to predict who is most likely to be an end user of my services, making my marketing efforts more efficient. In one recent case, we used data mining in order to determine the probability that a particular claim would be denied by a particular payer, giving us the opportunity to review charts in advance of billing. Regarding possible pending audits, we see potential for the use of data mining techniques, including predictive analytics, to identify “bad” claims (as defined by CMS).

Let’s take a look at CERT as an example. In the most recent CERT study, it was reported that 4.5 percent of the reviewed claims were considered overpaid due to lack of medical necessity. Determination of a medical necessity denial (or overpayment) normally is based on the documentation contained within a chart, but other factors come into play. Let’s say that a patient comes to the office complaining of a runny nose, itchy eyes and other symptoms that result in a diagnosis of seasonal rhinitis (ICD-9 code 477.0). Now let’s say the provider codes the patient with a 99204, the second-highest level of new office visit. An audit very well may support that level of visit based on the documentation guidelines; the auditor, however, might question whether that level of E/M code (complexity of DDx) is commensurate with the level of complexity of the diagnosis. In this case, it is likely that this claim line would result in a denial due to lack of medical necessity.

Understand the issue here: there is a direct relationship between the documentation and the procedure code, and another between the documentation and the diagnosis code, but unfortunately there is no nexus between the procedure code and the diagnosis code. This is where the issue of medical necessity rears its highly judgmental and elastic head.

Predicting Claims Subject to Audit

So, then, how could we possibly know ahead of time which claims have the greatest probability of being subjected to a medical necessity review? Here is where I employ predictive analytics. To start, I would access the 4,500 or so claims that were determined to have been overpaid. Next, I would divide the database in half. Then I would take the first group of 2,250 claims and run them through a data mining program, training a number of different algorithms. Then I would run the other 2,250 claims through these trained algorithms to see which predicted the medical necessity outcome most accurately. What I have gained now is the ability to take all of the claims from your office, run them through my data mining algorithm and spit out the claims that are most likely to be audited for medical necessity. By extension, I have created a model that uses probability to predict the likelihood that any particular claim will be subject to an audit. Pretty cool, huh?
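The split-train-compare procedure can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the claims are synthetic, the denial probabilities are invented, the codes other than 99204 and 477.0 are arbitrary placeholders, and two simple "observed denial rate" lookups stand in for the algorithms a real data mining program would train. The sketch also assumes the data set contains both denied and non-denied claims, since a model needs examples of both outcomes to learn the difference.

```python
import random
from collections import defaultdict

def make_claims(n, seed=0):
    """Generate synthetic claims: procedure code, diagnosis code, denied flag."""
    rng = random.Random(seed)
    procedures = ["99202", "99203", "99204", "99205"]
    diagnoses = ["477.0", "401.9", "250.00"]
    claims = []
    for _ in range(n):
        cpt = rng.choice(procedures)
        dx = rng.choice(diagnoses)
        # Toy assumption: high-level visits billed with a low-complexity
        # diagnosis are denied far more often than everything else.
        p_denied = 0.8 if (cpt in ("99204", "99205") and dx == "477.0") else 0.1
        claims.append({"cpt": cpt, "dx": dx, "denied": rng.random() < p_denied})
    return claims

def train_rate_model(claims, key):
    """Learn the observed denial rate for each value of key(claim)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [denied, total]
    for c in claims:
        counts[key(c)][0] += c["denied"]
        counts[key(c)][1] += 1
    overall = sum(c["denied"] for c in claims) / len(claims)
    rates = {k: denied / total for k, (denied, total) in counts.items()}
    # Unseen values fall back to the overall denial rate.
    return lambda claim: rates.get(key(claim), overall)

def accuracy(model, claims, threshold=0.5):
    """Share of claims where the thresholded prediction matches the outcome."""
    hits = sum((model(c) >= threshold) == c["denied"] for c in claims)
    return hits / len(claims)

claims = make_claims(4500)
random.Random(1).shuffle(claims)
half = len(claims) // 2
train, holdout = claims[:half], claims[half:]  # 2,250 claims each

candidates = {
    "procedure only": lambda c: c["cpt"],
    "procedure + diagnosis": lambda c: (c["cpt"], c["dx"]),
}
models = {name: train_rate_model(train, key) for name, key in candidates.items()}
scores = {name: accuracy(model, holdout) for name, model in models.items()}
best = max(scores, key=scores.get)
```

The model named in `best` is the one you would then run against a practice's own claims, flagging for chart review those with the highest predicted probability.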


What occurs during this process is that the algorithm searches for the variables that contribute most to generating an accurate prediction of the outcome. For example, it may find that certain combinations of diagnosis codes and procedure codes register more frequently than others. In essence, the algorithm is measuring the probability that each of these combinations would predict the outcome successfully. Maybe certain modifiers actually affect the outcome, or maybe the specialty or place of service (or type of service) even factors in some way. The goal is to figure out how much these variables contribute to the final determination, then to use this knowledge to predict which claims need to be reviewed.
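One common way to put a number on how much a variable contributes is information gain, which measures how much knowing the variable reduces uncertainty about the outcome. The Python sketch below uses it as a stand-in for whatever measure a given data mining package applies internally. The data are synthetic, the `pos` (place of service) values and the denial probabilities are invented, and place of service is deliberately made irrelevant in this toy data set.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of outcomes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, label="denied"):
    """Reduction in outcome entropy after grouping rows by feature(row)."""
    base = entropy([r[label] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[feature(r)].append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Synthetic claims: only the procedure/diagnosis pairing drives denials here.
rng = random.Random(0)
rows = []
for _ in range(3000):
    cpt = rng.choice(["99202", "99203", "99204", "99205"])
    dx = rng.choice(["477.0", "401.9", "250.00"])
    pos = rng.choice(["office", "outpatient"])
    p_denied = 0.8 if (cpt in ("99204", "99205") and dx == "477.0") else 0.1
    rows.append({"cpt": cpt, "dx": dx, "pos": pos, "denied": rng.random() < p_denied})

features = {
    "procedure": lambda r: r["cpt"],
    "diagnosis": lambda r: r["dx"],
    "place of service": lambda r: r["pos"],
    "procedure + diagnosis": lambda r: (r["cpt"], r["dx"]),
}
gains = {name: information_gain(rows, f) for name, f in features.items()}

# Print variables from most to least informative.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gain:.3f} bits")
```

In this toy data the procedure and diagnosis combination should rank at the top and place of service near zero, which is the kind of ranking that tells you where to focus a chart review.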

CMS and Predictive Analytics

As many of you may know by now, CMS has announced that it is partnering with Northrop Grumman to begin real-time analysis of Medicare claims using predictive analytics. This process will be similar to how FICO uses data mining to detect credit card fraud. While I am not involved in this project, I assume that Northrop Grumman has been given access to a large number of Medicare claims that have been subject to fraud and abuse determinations and has used this data (as in our CERT example) to create an algorithm that predicts the likelihood that any given claim will be deemed improper.

If you want my opinion, I am not overly confident that this is going to be very effective. There are a lot more variables that go into filing a claim than go into making a credit card purchase – and of these variables, many simply do not exist in terms of black and white. My prediction is that this practice may cause a significant delay in payment of Medicare claims without creating a matching benefit.

Getting to the Bottom Line

The bottom line is this: the business side of healthcare is incredibly complex, and considering the diversity of players, diagnoses, therapeutic and diagnostic procedures, etc., it is very difficult to know for sure whether or not a claim is going to get paid correctly. Factor in the notion that for nearly every documentation line there is a possible combination of one of five group codes, 200 reason codes and more than 600 remark codes (that’s 976 billion possible combinations), and you begin to get a picture of a very chaotic system. My experience is that no matter how good a job you do, there always is going to be some degree of disagreement that ultimately may result in audits of your claims.

As we become more sophisticated in our practices, we can begin to employ data mining techniques in order to predict which of our claims are most ripe for review, giving us an upper hand on both risk analysis and risk mitigation.

About the Author

Frank Cohen is the senior analyst for The Frank Cohen Group, LLC. He is a healthcare consultant who specializes in data mining, applied statistics, practice analytics, decision support and process improvement.

To comment on this article please go to editor@racmonitor.com
