
...

criterion: the function measuring the impurity of a split, with possible values ‘gini’ (default value) and ‘entropy’
maximum depth of the tree, with values from 5 to 61 (default value is None)
minimum number of entities a node must contain to be considered for splitting, with possible values between 4 and 40 (default value is 2)
minimum number of entities contained in a leaf, with possible values between 3 and 40 (default value is 1)

...

Based on the confusion matrix, we can compute the evaluation metrics considered in this study:

Precision = 0.6794 (i.e. from all entities predicted as rejected, 67.94% were correctly classified)
Recall = 0.6023 (i.e. from all entities rejected by the Medical Officers, the classifier correctly categorized 60.23%)
f1 score = 0.6385
Accuracy = 0.9819 (i.e. from all the entities considered, 98.19% were correctly categorized)
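As an illustration of how these four metrics follow from a confusion matrix, here is a minimal sketch. It treats "rejected" as the positive class, as in the definitions above. The function name and the counts used in the example call are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts.

    tp: rejected entities predicted as rejected
    fp: accepted entities predicted as rejected
    fn: rejected entities predicted as accepted
    tn: accepted entities predicted as accepted
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=6, fp=4, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

With a strongly imbalanced data set, accuracy is dominated by the majority (accepted) class, which is why precision and recall on the rejected class are reported alongside it.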

In terms of execution time, the training time was about 80.94 s, while the prediction time was 0.08 s on the computing configuration mentioned earlier.
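The text does not say how the execution times were measured. One common approach, shown here only as a sketch, is to wrap each call in a wall-clock timer. The helper name `timed` is hypothetical, and the `sum` call is a stand-in workload rather than the actual training step.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice fn would be the fit or predict call.
total, seconds = timed(sum, range(1_000_000))
print(f"elapsed: {seconds:.2f} s")
```

For a classifier object, the same helper could be applied to the training and prediction calls separately, for example `timed(clf.fit, X_train, y_train)` and `timed(clf.predict, X_test)`, where `clf`, `X_train`, `y_train` and `X_test` are assumed names.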

The default parameter values for the Decision Tree Classifier are the following: criterion = ‘gini’, splitter = ‘best’, max_depth = None, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0, max_features = None, random_state = None, max_leaf_nodes = None, min_impurity_decrease = 0, min_impurity_split = 0, class_weight = None, ccp_alpha = 0.

All the information concerning the confusion matrix, evaluation metrics and execution time values is presented in Table 1.

The best hyperparameters obtained correspond to criterion = ‘entropy’, max_depth = 20, min_samples_leaf = 20, min_samples_split = 22 (for the other parameters, the default values were considered). The confusion matrix, execution time and evaluation metrics are given below. While the False Negative cases have slightly increased with respect to the previous prediction, the False Positive entities have decreased from 5’109 to 2’920, which implies a higher Precision value.
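For reference, the configuration above can be written down as follows. This is a sketch that assumes the classifier is scikit-learn's `DecisionTreeClassifier`, which the parameter names suggest but the text does not state explicitly. The helper `build_classifier` is a hypothetical name.

```python
# Hyperparameter values reported in the text; every other parameter
# is left at its default.
best_params = {
    "criterion": "entropy",
    "max_depth": 20,
    "min_samples_leaf": 20,
    "min_samples_split": 22,
}


def build_classifier(params):
    """Create a decision tree classifier from a dict of hyperparameters."""
    # Assumption: scikit-learn is the library in use. It is imported here,
    # inside the function, so the dict above is usable without it installed.
    from sklearn.tree import DecisionTreeClassifier

    return DecisionTreeClassifier(**params)
```

A call such as `build_classifier(best_params).fit(X_train, y_train)` would then train the tree, with `X_train` and `y_train` standing for the training features and labels.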

...

Evaluation metrics on the test set: Precision = 0.7864; Recall = 0.5980; f1-score = 0.6794; Accuracy = 0.9850

The training set was composed of 677’212 entities, with 659’235 accepted and 17’977 rejected entities.
