AI model output

In this project, the aim is to categorize the medical items and services related to a claim. From a statistical point of view, a rejected item or service is a special case of outlier, meaning that the rejected cases correspond to records in the database that differ significantly from the majority of the records. Such rare cases are the result of unusual events that generate anomalous patterns of activity.

Looking at the database on claim level, which corresponds to 531,900 claims, only 462 were rejected by the Rule Engine and are therefore not considered in the study. Only 1,034 claims were manually rejected by the Medical Officers, representing only 0.194% of the considered database. During the cleaning of the database, several claims were excluded from the study (as seen in the previous section). As the Medical Officers are not able to manually review all the received claims, 63.40% of the claims were accepted without manual verification. Summarizing the challenges related to the present project, we can note:

  • an imbalanced data set, with few samples carrying valuable information about the rejection cases;

  • the need to deal with changing patterns in the rejected claims;

  • partial labeling of the database, as not all the data was manually reviewed by a Medical Officer.

The objective of the AI model is to classify the items/services related to health insurance claims. Thus, for a rejected item or service, the output of the model is set to 1; it is common practice to represent the rarer class with the output variable y = 1.
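As a minimal illustration of this labeling convention (the table, column names, and values below are hypothetical, not the project's actual schema):

```python
import pandas as pd

# Hypothetical claim-item table with the Medical Officer's decision.
items = pd.DataFrame({
    "item_id": [101, 102, 103],
    "mo_decision": ["accepted", "rejected", "accepted"],
})

# Encode the rarer class (rejected) as y = 1 and accepted as y = 0.
items["y"] = (items["mo_decision"] == "rejected").astype(int)
print(items)
```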

AI evaluation metrics

As we deal with a skewed database, specific evaluation metrics must be employed. These metrics are based on four scenarios:

  • True Positive (TP): the AI model predicts 1 and the actual class is 1 (both the prediction and the classification made by the Medical Officer correspond to a rejected item or service);

  • True Negative (TN): the AI model predicts 0 and the actual class is 0 (both the prediction and the classification made by the Medical Officer correspond to an accepted item or service);

  • False Positive (FP): the model predicts 1, but the actual class is 0 (the model classifies the item/service as rejected, while it should be accepted);

  • False Negative (FN): the model predicts 0, but the actual class is 1 (the model predicts an accepted item or service, while it should be rejected).

The evaluation metrics that can be employed for imbalanced databases are the following (a worked example follows the list):

  • Precision is expressed as P = TP/(TP+FP) and measures, of all the predictions y = 1, which ones were correctly classified;

  • Recall is expressed as R = TP/(TP+FN) and measures, of all the actual y = 1 cases, which ones the model predicted correctly (also known as sensitivity or True Positive Rate);

  • there is a tradeoff between Precision and Recall: tuning the model to increase one typically decreases the other, which motivates a combined measure;

  • the F score or F1 score is the harmonic mean of Precision and Recall, expressed as F1 = 2*(P*R)/(P+R);

  • Specificity or True Negative Rate is expressed as SPC = TN/(TN+FP).
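As a worked illustration, the sketch below computes these metrics from hypothetical confusion-matrix counts (the numbers are invented for the example):

```python
# Hypothetical confusion-matrix counts (invented for illustration).
TP, FP = 40, 60    # predicted 1: correctly vs. incorrectly flagged as rejected
FN, TN = 10, 890   # predicted 0: missed rejections vs. correctly accepted

precision   = TP / (TP + FP)               # of all y = 1 predictions, fraction correct
recall      = TP / (TP + FN)               # of all actual y = 1, fraction found (TPR)
f1          = 2 * precision * recall / (precision + recall)
specificity = TN / (TN + FP)               # of all actual y = 0, fraction correct (TNR)

print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}  SPC={specificity:.3f}")
# P=0.400  R=0.800  F1=0.533  SPC=0.937
```

Note that plain accuracy on these counts would be (TP+TN)/1000 = 93%, even though Precision is only 0.40, which is exactly why accuracy is misleading on skewed data and the metrics above are preferred.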

AI model selection

Several AI algorithms are able to meet the proposed goals and correctly classify the claim items and services [Chalapathy and Chawla, 2019; Chandola et al., 2009; Zong et al., 2018].

Supervised anomaly detection algorithms assume that we have access to a training data set with labeled observations for both the normal class (accepted items/services) and the abnormal class (rejected cases). In general, these techniques build a predictive model for the normal class vs. the abnormal/anomaly class. Several difficulties may arise when using these methods (a minimal sketch follows the list):

  • the abnormal cases are much rarer than the normal cases in the training data; in our case, we deal with highly imbalanced (skewed) data;

  • changing patterns in the anomaly cases are difficult for the prediction model to detect;

  • having access to data with accurate and representative labels for the anomaly class is a real challenge; furthermore, not all the data in the database is categorized/labeled, and only a small proportion of the database is manually reviewed by Medical Officers.
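A minimal sketch of the supervised route on synthetic stand-in data (the features, sizes, and the choice of a class-weighted logistic regression are assumptions for illustration, not the project's model); class weighting is one common way to counter the skew:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the claim-item features: ~1% anomalies (y = 1).
X = np.vstack([rng.normal(0.0, 1.0, size=(5000, 8)),
               rng.normal(3.0, 1.0, size=(50, 8))])
y = np.array([0] * 5000 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" up-weights the rare rejected class during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("precision:", precision_score(y_te, pred), "recall:", recall_score(y_te, pred))
```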

Semi-supervised anomaly detection algorithms require access only to labeled normal-class data: the objective is to build a model of the normal class from the training set and to use this model to identify abnormal cases in the test set. A few methods instead require access only to labeled abnormal cases; however, these have difficulties representing the anomaly cases accurately and dealing with changing patterns.
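A minimal sketch of the semi-supervised setting, assuming synthetic data and scikit-learn's LocalOutlierFactor in novelty mode as one representative method (not the project's chosen model): the detector is fitted on normal cases only and then scores unseen data.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)

# Train only on normal (accepted) cases, as the semi-supervised setting assumes.
X_normal_train = rng.normal(0.0, 1.0, size=(2000, 8))

# novelty=True allows scoring unseen data after fitting on normals only.
detector = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_normal_train)

# Test set mixes normal points with a few shifted (anomalous) ones.
X_test = np.vstack([rng.normal(0.0, 1.0, size=(10, 8)),
                    rng.normal(4.0, 1.0, size=(3, 8))])
print(detector.predict(X_test))   # +1 = normal, -1 = flagged as anomalous
```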

Unsupervised anomaly detection methods have proven their efficiency for problems with imbalanced and highly imbalanced data sets, where the rejected items/services can be characterized by changing patterns. Existing methods have received tremendous attention in the last decades and can be classified into three categories (Zong et al., 2018; Chandola et al., 2009); a minimal sketch of each family follows the list:

  • Reconstruction-based methods assume that anomalies are incompressible and thus cannot be effectively reconstructed from low-dimensional projections. In this category, we can cite methods such as Principal Component Analysis (PCA) with explicit linear projections, Kernel PCA with implicit non-linear projections induced by specific kernels, Robust PCA, as well as the reconstruction error induced by deep autoencoders.

  • Clustering analysis is another unsupervised approach, using density estimation for anomaly detection. We can cite methods such as the Multivariate Gaussian model, Gaussian Mixture Models, K-means, etc. The main drawback of these methods is the curse of dimensionality, which makes them difficult to apply to high-dimensional data. For this reason, several methods conduct dimensionality reduction prior to the clustering analysis.

  • One-class classification approaches learn a discriminative boundary surrounding the normal class. We can cite methods such as the One-class SVM. For high-dimensional problems, such methods suffer from suboptimal performance; for this reason, dimensionality reduction methods might be used jointly with one-class classification approaches.
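To make the three families concrete, here is a minimal sketch, on synthetic data, of one representative per category: PCA reconstruction error, Gaussian Mixture log-likelihood, and a One-class SVM preceded by PCA. All data and parameters are invented for illustration; this is not the project's final model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(1000, 10)),   # mostly normal points
               rng.normal(4.0, 1.0, size=(10, 10))])    # a few anomalies

# 1) Reconstruction-based: anomalies reconstruct poorly from a low-dim projection.
pca = PCA(n_components=3).fit(X)
X_rec = pca.inverse_transform(pca.transform(X))
recon_error = ((X - X_rec) ** 2).sum(axis=1)            # higher = more anomalous

# 2) Density/clustering-based: anomalies fall in low-likelihood regions.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
log_likelihood = gmm.score_samples(X)                   # lower = more anomalous

# 3) One-class classification, with PCA to ease the curse of dimensionality.
ocsvm = make_pipeline(StandardScaler(), PCA(n_components=3),
                      OneClassSVM(nu=0.02)).fit(X)
flags = ocsvm.predict(X)                                # -1 = outside the boundary

print("highest recon-error idx:", np.argsort(recon_error)[-5:])
print("lowest likelihood idx  :", np.argsort(log_likelihood)[:5])
print("n flagged by OC-SVM    :", (flags == -1).sum())
```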

References

Chalapathy, Raghavendra, and Sanjay Chawla. "Deep learning for anomaly detection: A survey." arXiv preprint arXiv:1901.03407 (2019).

Chandola, Varun, Arindam Banerjee, and Vipin Kumar. "Anomaly detection: A survey." ACM Computing Surveys (CSUR) 41.3 (2009): 1-58.

Zong, Bo, Qi Song, Martin Renqiang Min, Wei Cheng, Cristian Lumezanu, Daeki Cho, and Haifeng Chen. "Deep autoencoding Gaussian mixture model for unsupervised anomaly detection." International Conference on Learning Representations (2018).

 
