openIMIS-AI - 4. AI methods, model outputs, and evaluation metrics

AI model output

In this project, the aim is to categorize the medical items and services related to a claim. From a statistical point of view, a rejected item/service is a special case of outlier: the rejected cases correspond to records in the database that differ significantly from the majority of records. Such rare cases are the result of unusual events that generate anomalous patterns of activity.

The analysed dataset corresponds to the openIMIS claim submissions in Nepal from May 2016 to September 2020. After data cleaning, the dataset is composed of 3,460,592 claims and 18,362,682 entities (corresponding to medical services and medications). Only 199,167 entities were manually rejected by the Medical Officers, representing just 1.073% of the considered database. As the Medical Officers are not able to manually review all the received claims, 63.49% of the entities were accepted without manual verification. Summarizing, the challenges related to the present project are:

  • an imbalanced dataset, with only a few samples carrying valuable information about the rejection cases;

  • changing patterns in the rejected claims, which the model must be able to deal with;

  • partial labelling of the database, as not all the data was manually reviewed by a Medical Officer.

The objective of the AI model is to classify the items/services (denoted herein as entities) related to health insurance claims. For the AI model:

  • Accepted entities correspond to the majority class and thus represent negative outcomes, encoded as 0 in the implemented AI model.

  • Rejected entities correspond to the minority class and thus represent positive outcomes, encoded as 1 in the implemented AI model.
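As an illustration, the binary target can be derived directly from the review status of each entity. The following is a minimal sketch assuming a pandas DataFrame; the column name "status" and its values are hypothetical placeholders for the actual openIMIS schema.

    import pandas as pd

    # Hypothetical review outcomes for three entities (items/services).
    entities = pd.DataFrame({"status": ["accepted", "rejected", "accepted"]})

    # Accepted -> 0 (majority, negative class); Rejected -> 1 (minority, positive class).
    entities["label"] = (entities["status"] == "rejected").astype(int)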

AI evaluation metrics

In order to illustrate and analyse the results of an AI model, we will use the confusion matrix, described as follows:

 

 

                          Predicted label
                          Accepted                       Rejected
True label   Accepted     True Negative (TN) entities    False Positive (FP) entities
             Rejected     False Negative (FN) entities   True Positive (TP) entities
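For reference, scikit-learn's confusion_matrix returns this same table, with rows as true labels and columns as predicted labels, when the labels are ordered [0, 1]; the arrays below are toy data, not project results.

    from sklearn.metrics import confusion_matrix

    # Toy labels: 0 = accepted, 1 = rejected.
    y_true = [0, 0, 0, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 0, 1, 0, 0, 1, 0]

    # Returned layout: [[TN, FP],
    #                   [FN, TP]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    print(tn, fp, fn, tp)  # 4 1 1 2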

As we deal with a skewed (imbalanced) database, specific evaluation metrics must be employed. These metrics are based on four scenarios:

  • True Positive (TP): the AI model predicts 1 and the actual class is 1 (both the prediction and the classification made by the Medical Officer correspond to a rejected item or service);

  • True Negative (TN): the AI model predicts 0 and the actual class is 0 (both the prediction and the classification made by the Medical Officer correspond to an accepted item or service);

  • False Positive (FP): the model predicts 1, but the actual class is 0 (the model classifies the item/service as rejected, while it should be accepted);

  • False Negative (FN): the model predicts 0, but the actual class is 1 (the model predicts an accepted item or service, while it should be rejected).

The evaluation metrics that can be employed for unbalanced databases are the following:

  • Precision is expressed as P = TP/(TP+FP) and measures, of all the predictions y = 1, which ones were correctly classified;

  • Recall is expressed as R = TP/(TP+FN) and measures, of all the actual y = 1, which ones the model predicted correctly (also known as sensitivity or True Positive Rate);

  • There is a trade-off between Precision and Recall, controlled by the decision threshold of the classifier;

  • The F score or F1 score is the harmonic mean of Precision and Recall, expressed as F1 = 2*(P*R)/(P+R);

  • Specificity or True Negative Rate is expressed as SPC = TN/(TN+FP).
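These metrics follow directly from the four counts; below is a minimal sketch (scikit-learn has no dedicated helper for specificity, so it is derived from TN and FP):

    def evaluation_metrics(tn, fp, fn, tp):
        precision = tp / (tp + fp)    # P = TP/(TP+FP)
        recall = tp / (tp + fn)       # R = TP/(TP+FN), sensitivity / True Positive Rate
        f1 = 2 * precision * recall / (precision + recall)
        specificity = tn / (tn + fp)  # SPC = TN/(TN+FP), True Negative Rate
        return precision, recall, f1, specificity

    # With the toy counts from the previous sketch (tn=4, fp=1, fn=1, tp=2):
    print(evaluation_metrics(4, 1, 1, 2))  # approximately (0.667, 0.667, 0.667, 0.8)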

Dependencies of the AI model

The results obtained by the AI model will depend on several variables:

  • selected features: for the present project, we have chosen two feature configurations: (1) a selection of 27 features, identified during the data analysis and visualization; (2) the same 27 features, plus 6 aggregated features related to the items submitted and the corresponding amounts per insurer per week, month, and year;

  • the normalization method applied to the features.
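The normalization choice can noticeably change model behaviour; below is a minimal sketch of two common options, where the matrix X is a placeholder standing in for the selected 27 or 33 features.

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    # Placeholder feature matrix standing in for the selected claim-entity features.
    X = np.array([[1.0, 200.0], [2.0, 150.0], [3.0, 400.0]])

    # z-score normalization: zero mean and unit variance per feature.
    X_std = StandardScaler().fit_transform(X)

    # min-max normalization: rescale each feature to the [0, 1] range.
    X_minmax = MinMaxScaler().fit_transform(X)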

AI model selection

For the claim categorization process, anomaly detection techniques can be employed in order to detect rejected claims or items. Hawkins (1980) defines an outlier as “an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism”. Anomaly detection is also known as the process of detecting patterns in the data whose behaviour deviates from what is expected, and it is important for detecting rare events [Shah, 20xx]. Anomaly detection techniques are based on concepts such as classification, nearest neighbour, clustering, statistics, and information theory [Shah, 20xx; Chandola2009; Ahmed2016]. For the present project, we are dealing with anomaly detection techniques able to assign a label to each test instance.

Supervised anomaly detection algorithms assume access to a training dataset with categorized/labelled observations for both the normal (accepted items/services) and the abnormal (rejected) cases. In general, these techniques build predictive models for the normal class versus the abnormal/anomaly class. Several difficulties may arise when using these methods:

  • the abnormal cases are much rarer than the normal cases in the training data; in our case, we deal with highly imbalanced (skewed) data, a common mitigation for which is sketched after this list;

  • changing patterns in the anomaly cases are difficult for the prediction model to detect;

  • obtaining data with accurate and representative labels for the anomaly class is a real challenge; furthermore, not all the data in the database is categorized/labelled, and only a small proportion is manually reviewed by Medical Officers.
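One common way to mitigate the imbalance in a supervised setting is to re-weight the minority class during training. The following is a minimal sketch with scikit-learn, using synthetic data with roughly a 1% positive class rather than the actual openIMIS features:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Synthetic imbalanced data: about 1% positive (rejected) class.
    X, y = make_classification(n_samples=20000, n_features=27,
                               weights=[0.99, 0.01], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # class_weight="balanced" weights samples inversely to class frequency,
    # so errors on the rare rejected class are penalized more heavily.
    clf = RandomForestClassifier(class_weight="balanced", random_state=0)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)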

Semi-supervised anomaly detection algorithms require access only to the categorized/labelled normal class; their objective is to build a model of the normal class from the training set and to use this model to identify abnormal cases in the test set. Some methods instead require access only to labelled abnormal cases; however, these methods have difficulties in accurately representing the anomaly cases and in dealing with changing patterns.
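A classical instance of the first idea is a one-class model fitted on accepted entities only and then applied to unseen data. Below is a minimal sketch with a one-class SVM; the random data and the nu value are illustrative assumptions:

    import numpy as np
    from sklearn.svm import OneClassSVM

    # Train on normal (accepted) entities only; random data stands in
    # for the real claim-entity features.
    X_normal = np.random.RandomState(0).normal(size=(1000, 27))

    # nu bounds the fraction of training points treated as outliers.
    ocsvm = OneClassSVM(nu=0.01, kernel="rbf").fit(X_normal)

    # On new data: +1 = consistent with the normal class, -1 = anomalous.
    X_new = np.random.RandomState(1).normal(size=(10, 27))
    flags = ocsvm.predict(X_new)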

Unsupervised anomaly detection methods have proven efficient for problems with imbalanced and highly imbalanced datasets, where the rejected items/services can be characterized by changing patterns. The existing methods have received tremendous research attention in the last decade [Zong2018; Chandola2009].
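As an unsupervised example, an isolation forest can be fitted without any labels. In the sketch below, the contamination value is set near the observed 1.073% rejection rate; this choice, like the random data, is an assumption for illustration:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Unlabelled feature matrix (placeholder for the claim-entity features).
    X = np.random.RandomState(0).normal(size=(5000, 27))

    # contamination ~ expected share of anomalies (near the 1.073% rejection rate).
    iso = IsolationForest(contamination=0.011, random_state=0).fit(X)

    # -1 = flagged as anomalous (candidate rejection), +1 = normal.
    labels = iso.predict(X)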

  1. [Shah, 20xx] Shah, P. A critical survey on Anomaly detection.

  2. [Chandola2009] Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 1-58.

  3. [Ahmed2016] Ahmed, M., Mahmood, A. N., & Islam, M. R. (2016). A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, 278-288.

  4. [Chalapathy2019] Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407.

  5. [Zong2018] Zong, B., Song, Q., Min, M. R., Cheng, W., Lumezanu, C., Cho, D., & Chen, H. (2018). Deep autoencoding Gaussian mixture model for unsupervised anomaly detection. In International Conference on Learning Representations.

1. Classification-based anomaly detection techniques

Classification techniques are employed in order to learn a model (called a ‘classifier’) from a dataset of labelled entities (the training dataset) and then to classify new entities/instances based on the learned model. The basic hypothesis is that the classifier can distinguish between the normal (accepted claims/items) and abnormal (rejected claims/items) classes in the given feature space. These techniques can be grouped into two categories: multi-class and one-class anomaly detection techniques. Multi-class classification-based techniques rely on the availability of accurate labels for the various normal classes, which is not always possible. Figure 1 illustrates one-class and multi-class classification (representation for a two-feature space).
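One way to read the multi-class variant: fit a classifier over the known normal sub-classes and flag a test instance as anomalous when no sub-class claims it with enough confidence. The sketch below uses synthetic blobs as normal sub-classes; the 0.9 confidence threshold is an arbitrary illustrative assumption:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression

    # Synthetic normal data drawn from three normal sub-classes.
    X, y = make_blobs(n_samples=600, centers=3, n_features=2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # A test point is flagged as anomalous if no normal sub-class
    # accepts it with high confidence.
    X_test = np.array([[0.0, 0.0], [25.0, 25.0]])
    max_proba = clf.predict_proba(X_test).max(axis=1)
    is_anomaly = max_proba < 0.9  # threshold is an illustrative assumption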