This section focuses on evaluating the model using the testing subset, which consists of data that was not used during training or validation.
The AUC value in the first table, located in row zero, is 0.94. This is slightly lower than the training-subset AUC of 0.95, which is expected: a model rarely performs as well on unseen data as on the data it was trained on.
The second table below displays the entries of the Confusion Matrix, which helps assess the performance of a classification model by comparing predicted values to actual values [1]. Row zero represents 'True Negatives', where the 'ACTUAL_CLASS' was correctly predicted as 'No' (negative) by the 'PREDICTED_CLASS'. There are 2,538 'True Negative' cases.
Row one holds the 'False Positives': cases predicted as 'Yes' (positive) that are actually 'No' (negative). There are only 14 of them. Row two holds the 'False Negatives': 212 cases predicted as 'No' that are actually 'Yes'. Finally, row three holds the 'True Positives': 103 cases correctly predicted as 'Yes'. Taken together, the model finds only 103 of the 315 actual 'Yes' cases (recall of about 0.33 in the first table), while 103 of its 117 'Yes' predictions are correct (precision of about 0.88).
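The row-by-row reading above can be sketched in a few lines of plain Python. This is only an illustration, not part of the original notebook: the counts are copied by hand from the confusion-matrix output, the helper `cell_name` is a hypothetical name, and treating 'Yes' as the positive class is an assumption that matches the description in the text.

```python
# Confusion-matrix rows as (ACTUAL_CLASS, PREDICTED_CLASS, COUNT),
# copied by hand from the output table of this section.
rows = [
    ("No", "No", 2538),
    ("No", "Yes", 14),
    ("Yes", "No", 212),
    ("Yes", "Yes", 103),
]

POSITIVE = "Yes"  # assumption: 'Yes' (flight risk) is the positive class


def cell_name(actual, predicted, positive=POSITIVE):
    """Return 'TN', 'FP', 'FN' or 'TP' for one (actual, predicted) pair."""
    outcome = "T" if actual == predicted else "F"  # was the prediction correct?
    sign = "P" if predicted == positive else "N"   # which class was predicted?
    return outcome + sign


# Map each confusion-matrix cell name to its count.
cells = {cell_name(actual, predicted): count for actual, predicted, count in rows}
print(cells)
```

The second letter of each cell name comes from the predicted class and the first from whether that prediction was right, which is why a 'No' case predicted as 'Yes' is a False Positive.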
# Test model generalization using the test data subset (not used during training)
scorepredictions, scorestats, scorecm, scoremetrics = hgbc.score(
    data=df_test, key='EMPLOYEE_ID', label='FLIGHT_RISK',
    ntiles=20, impute=True,
    thread_ratio=1.0)
#display(hgbc.runtime)
display(scorestats.sort('CLASS_NAME').collect())
display(scorecm.filter('COUNT != 0').collect())
#display(scoremetrics.collect())

ITEM_NUMBER | STAT_NAME | STAT_VALUE | CLASS_NAME |
---|---|---|---|
0 | AUC | 0.9414 | None |
1 | ACCURACY | 0.9211 | None |
2 | KAPPA | 0.4437 | None |
3 | MCC | 0.5081 | None |
4 | RECALL | 0.9945 | No |
5 | PRECISION | 0.9229 | No |
6 | F1_SCORE | 0.9573 | No |
7 | SUPPORT | 2552 | No |
8 | RECALL | 0.3269 | Yes |
9 | PRECISION | 0.8803 | Yes |
10 | F1_SCORE | 0.4768 | Yes |
11 | SUPPORT | 315 | Yes |

ITEM_NUMBER | ACTUAL_CLASS | PREDICTED_CLASS | COUNT |
---|---|---|---|
0 | No | No | 2538 |
1 | No | Yes | 14 |
2 | Yes | No | 212 |
3 | Yes | Yes | 103 |
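
As a cross-check, most values in the statistics table can be recomputed from the four confusion-matrix counts alone (AUC cannot, since it depends on the predicted scores rather than the counts). The sketch below uses the standard textbook formulas and hand-copied counts; the variable names are illustrative, and it is not a description of how `hgbc.score` computes these values internally.

```python
import math

# Counts copied by hand from the confusion-matrix table ('Yes' = positive class).
tn, fp, fn, tp = 2538, 14, 212, 103
total = tn + fp + fn + tp

# Overall accuracy: share of all cases predicted correctly.
accuracy = (tp + tn) / total

# Per-class statistics for 'Yes' (positive class).
recall_yes = tp / (tp + fn)        # found 'Yes' cases / all actual 'Yes' cases
precision_yes = tp / (tp + fp)     # correct 'Yes' predictions / all 'Yes' predictions
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

# Per-class statistics for 'No' (negative class).
recall_no = tn / (tn + fp)
precision_no = tn / (tn + fn)
f1_no = 2 * precision_no * recall_no / (precision_no + recall_no)

# Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# Cohen's kappa: observed agreement corrected for chance agreement.
expected = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / total**2
kappa = (accuracy - expected) / (1 - expected)

for name, value in [
    ("ACCURACY", accuracy), ("KAPPA", kappa), ("MCC", mcc),
    ("RECALL (No)", recall_no), ("PRECISION (No)", precision_no), ("F1_SCORE (No)", f1_no),
    ("RECALL (Yes)", recall_yes), ("PRECISION (Yes)", precision_yes), ("F1_SCORE (Yes)", f1_yes),
]:
    print(f"{name}: {value:.4f}")
```

The recomputed values agree with the table to about three decimal places; differences in the last digit can arise because the table values appear to be truncated rather than rounded. The gap between accuracy (about 0.92) and kappa (about 0.44) reflects the class imbalance: 2,552 of the 2,867 test cases are 'No', so a high share of correct predictions is achievable largely by predicting 'No'.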