Cumulative accuracy profile

The cumulative accuracy profile (CAP) is used in data science to visualize the discriminative power of a model. The CAP of a model plots the cumulative number of positive outcomes on the y-axis against the corresponding cumulative number of a classifying parameter on the x-axis. The CAP is distinct from the receiver operating characteristic (ROC), which plots the true-positive rate against the false-positive rate.

Example

An example is a model that predicts whether a product is bought (positive outcome) by each individual from a group of people (classifying parameter) based on factors such as their gender, age and income. If group members were contacted at random, the cumulative number of products sold would rise linearly toward a maximum value corresponding to the total number of buyers within the group. This distribution is called the "random" CAP. A perfect prediction, on the other hand, determines exactly which group members will buy the product, so that the maximum number of products sold is reached with the minimum number of calls. This produces a steep line on the CAP curve that stays flat once the maximum is reached (contacting the remaining group members will not lead to more products sold), which is the "perfect" CAP.

Figure: CAP curves for a perfect, a good and a random model predicting the buying customers from a pool of 100 individuals.

A successful model predicts the likelihood of individuals purchasing the product and ranks these probabilities to produce a list of potential customers to be contacted first. The resulting cumulative number of sold products will increase rapidly and eventually flatten out to the given maximum as more group members are contacted. This results in a distribution that lies between the random and the perfect CAP curves.
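The three curves described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cap_counts`, the toy data (10 people, 4 buyers) and the scores are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def cap_counts(outcomes, scores):
    """Cumulative number of buyers after contacting the top-k ranked people, k = 0..N."""
    outcomes = np.asarray(outcomes, dtype=float)
    # Contact people in order of decreasing predicted score.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return np.concatenate(([0.0], np.cumsum(outcomes[order])))

# Hypothetical toy data: 1 = buys the product, 0 = does not.
outcomes = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])

contacted = np.arange(len(outcomes) + 1)       # x-axis: number of people contacted
model_cap = cap_counts(outcomes, scores)       # ranking produced by the model
perfect_cap = cap_counts(outcomes, outcomes)   # ranking by the true outcome
random_cap = contacted * outcomes.sum() / len(outcomes)  # expected straight line

print(model_cap)
print(perfect_cap)
print(random_cap)
```

In this toy case the model curve lies on or between the random line and the perfect curve at every point, which is the behaviour expected of a useful ranking.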

Analyzing a CAP

The CAP can be used to evaluate a model by comparing its curve to the perfect CAP, in which the maximum number of positive outcomes is reached as quickly as possible, and to the random CAP, in which the positive outcomes are distributed evenly. A good model will have a CAP between the perfect CAP and the random CAP, with a better model lying closer to the perfect CAP.

The accuracy ratio (AR) is defined as the ratio of the area between the model CAP and the random CAP to the area between the perfect CAP and the random CAP.[1] For a successful model the AR has values between zero and one, with a higher value for a stronger model.
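As a rough sketch of this definition, the areas can be approximated with the trapezoidal rule on CAP curves normalized to the unit square (both axes rescaled to run from 0 to 1, so the random CAP is the diagonal with area 0.5). The helper names `area_under` and `accuracy_ratio` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def area_under(x, y):
    """Trapezoidal area under a piecewise-linear curve."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

def accuracy_ratio(outcomes, scores):
    """AR = (model area - random area) / (perfect area - random area)."""
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(outcomes)
    x = np.arange(n + 1) / n  # fraction of the group contacted

    def cap(ranking_scores):
        # Fraction of all positives captured when ranking by the given scores.
        order = np.argsort(-np.asarray(ranking_scores, dtype=float), kind="stable")
        return np.concatenate(([0.0], np.cumsum(outcomes[order]))) / outcomes.sum()

    area_random = 0.5  # area under the diagonal
    area_model = area_under(x, cap(scores)) - area_random
    area_perfect = area_under(x, cap(outcomes)) - area_random
    return area_model / area_perfect

# Hypothetical toy data: 1 = positive outcome, 0 = negative outcome.
outcomes = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])

ar_model = accuracy_ratio(outcomes, scores)      # about 0.58
ar_perfect = accuracy_ratio(outcomes, outcomes)  # 1.0 by construction
ar_inverted = accuracy_ratio(outcomes, -scores)  # about -0.58: worse than random
print(ar_model, ar_perfect, ar_inverted)
```

Because the AR is a ratio of areas, it does not matter whether the curves are expressed in counts or in fractions; normalizing simply makes the random area a fixed 0.5.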

Another indication of the model strength is given by the cumulative number of positive outcomes at 50% of the classifying parameter. For a successful model this value should lie between 50% and 100% of the maximum, with a higher percentage for stronger models.
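This check can be read off a normalized CAP curve by linear interpolation. The sketch below assumes NumPy; the function name `captured_at` and the toy data are hypothetical.

```python
import numpy as np

def captured_at(outcomes, scores, fraction=0.5):
    """Share of all positive outcomes captured within the top `fraction` of the ranking."""
    outcomes = np.asarray(outcomes, dtype=float)
    n = len(outcomes)
    # Rank by decreasing score and build the normalized CAP curve.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    x = np.arange(n + 1) / n
    y = np.concatenate(([0.0], np.cumsum(outcomes[order]))) / outcomes.sum()
    # Interpolate linearly between the points of the curve.
    return float(np.interp(fraction, x, y))

# Hypothetical toy data: 1 = positive outcome, 0 = negative outcome.
outcomes = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])

share = captured_at(outcomes, scores, 0.5)
print(f"{share:.0%} of positives captured in the top half")
```

For this toy ranking, three of the four positives fall in the top half, so the value is 75% and lies in the 50% to 100% range described above.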

In very rare cases the accuracy ratio can be negative. In this case, the model CAP lies below the random CAP, meaning the model performs worse than random selection.

Applications

The CAP and the ROC are both commonly used by banks and regulators to analyze the discriminatory ability of rating systems that evaluate credit risk.[2][3]

References

  1. Calabrese, Raffaella (2009), The validation of Credit Rating and Scoring Models (PDF), Swiss Statistics Meeting, Geneva, Switzerland
  2. Engelmann, Bernd; Hayden, Evelyn; Tasche, Dirk (2003), "Measuring the Discriminative Power of Rating Systems", Discussion Paper, Series 2: Banking and Financial Supervision (No 01)
  3. Sobehart, Jorge; Keenan, Sean; Stein, Roger (2000-05-15), "Validation methodologies for default risk models" (PDF), Moody's Risk Management Services