Changelog
Version 0.0.5a3
Major Features Added
- Axis Limits: Added `xlim` and `ylim` parameters for standardizing axis ranges across multiple model comparisons
- Bottom Legend Support: Automatic figure height adjustment when `legend_loc="bottom"` to prevent legend overlap with x-axis labels
- Multi-Model Layout: Improved default layout for multiple models with `plot_type="all"`; now arranges one row per model (6 columns × N rows) instead of a mixed layout
Bug Fixes
- Heteroskedasticity Tests: Fixed categorical variable handling in all tests (Breusch-Pagan, White, Goldfeld-Quandt) by encoding categorical columns before running the tests
- Legend Formatting:
  - Fixed legend duplication bug in the Scale-Location plot (removed test name prepending since interpretations already contain test names)
  - Fixed histogram legend to use `apply_legend()` for consistent formatting
  - Fixed legend kwargs not being properly passed to the Scale-Location plot
- Group Category Handling: Fixed `KeyError` when the `group_category` column is not in the `X` DataFrame by properly checking column existence before filtering predictor columns
- Index Alignment: Fixed `AssertionError` when using an external `group_category` array by ensuring the Series index matches `X.index`
- Python 3.8 Compatibility: Fixed LaTeX rendering error in the Scale-Location y-axis label by replacing `\text{}` with `\mathrm{}` for matplotlib <3.3 compatibility
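The categorical-handling fix above hinges on making the design matrix fully numeric before any heteroskedasticity test runs. A minimal sketch of that preprocessing step with pandas (the helper name is hypothetical, not the library's actual code):

```python
import pandas as pd

def encode_categoricals(X: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode object/category columns so the matrix passed to
    heteroskedasticity tests contains only numeric data."""
    cat_cols = X.select_dtypes(include=["object", "category"]).columns
    if len(cat_cols) == 0:
        return X
    # drop_first avoids a redundant dummy column (perfect collinearity)
    return pd.get_dummies(X, columns=list(cat_cols), drop_first=True)

X = pd.DataFrame({"age": [23, 35, 41], "group": ["a", "b", "a"]})
X_enc = encode_categoricals(X)
```

Without a step like this, test routines that multiply the design matrix fail on string-valued columns.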
Enhancements
- Scale-Location Y-Axis: Changed to LaTeX notation `r"$\sqrt{|\text{Std. Residuals}|}$"` for better readability
- Histogram Overlay: Removed normal distribution overlay from `histogram_type="frequency"` for a cleaner, simpler visualization (overlay still present for `histogram_type="density"`)
- Text Wrapping: Added `text_wrap` parameter support to all subplot titles (previously only worked for suptitles)
- Helper Functions:
  - Created `apply_axis_limits()` helper in `plot_utils.py`
  - Enhanced `apply_legend()` to handle bottom legend resizing with flag-based prevention of multiple resizes
- Refactoring: Refactored `plot_threshold_metrics()` to use the `apply_plot_title()` and `apply_legend()` helpers for consistency
Version 0.0.5a2
Testing Improvements
Added 150+ comprehensive unit tests covering:
- Edge cases and error handling
- Parameter validation
- Different input pathways (model vs `y_prob` vs `y_pred`)
- Group category functionality
- Styling and customization options
- Integration scenarios
Code Quality
- Refactored the monolithic 3866-line `model_metrics.py` into modular components:
  - `model_evaluator.py`: main plotting and evaluation functions
  - `metrics_utils.py`: utility functions and calculations
  - `plot_utils.py`: plotting helper functions
- Improved code maintainability and organization
- Enhanced error messages and validation
- Operating point visualization with two methods:
  - `operating_point_method='youden'`: Youden's J statistic
  - `operating_point_method='closest_topleft'`: closest to the top-left corner
- DeLong test support for AUC comparison between models via the `delong` parameter
- Legend ordering: proper organization: AUC curves → Random Guess → Operating Points
- Custom operating point styling via `operating_point_kwgs`
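The two operating-point methods can be sketched directly from a ROC curve's arrays. This is an illustrative implementation of the two selection rules, not the library's internal code:

```python
import numpy as np

def operating_point(fpr, tpr, thresholds, method="youden"):
    """Pick an operating point on a ROC curve.

    'youden'          -> maximizes J = TPR - FPR
    'closest_topleft' -> minimizes distance to the ideal point (0, 1)
    """
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    if method == "youden":
        idx = np.argmax(tpr - fpr)
    elif method == "closest_topleft":
        idx = np.argmin(np.hypot(fpr - 0.0, tpr - 1.0))
    else:
        raise ValueError(f"unknown method: {method}")
    return fpr[idx], tpr[idx], thresholds[idx]

fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
thr = [1.0, 0.8, 0.5, 0.0]
best = operating_point(fpr, tpr, thr, method="youden")  # selects (0.1, 0.7) at threshold 0.8
```

On this toy curve both methods agree; on real curves they can pick different thresholds, which is why both options exist.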
Residual Diagnostics Expansion
- New plot types:
  - `'influence'`: influence plot with Cook's distance bubbles
  - `'predictors'`: individual residual plots for each predictor
- Heteroskedasticity testing with multiple methods:
  - `'breusch_pagan'`: Breusch-Pagan test
  - `'white'`: White's test
  - `'goldfeld_quandt'`: Goldfeld-Quandt test
  - `'spearman'`: Spearman rank correlation
  - `'all'`: run all tests
- LOWESS smoothing via the `show_lowess` parameter
- Centroid visualization with two modes:
  - User-defined groups via `group_category`
  - Automatic K-means clustering via `n_clusters`
- Histogram types:
  - `histogram_type='frequency'`: raw counts (default)
  - `histogram_type='density'`: probability density with normal overlay
- Diagnostics table: comprehensive model diagnostics via `show_diagnostics_table`
- Return diagnostics: programmatic access via `return_diagnostics=True`
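For the user-defined-groups centroid mode, a centroid is simply the mean position of each group's points on the residual plot. A minimal sketch (hypothetical helper, assuming fitted values on the x-axis and residuals on the y-axis, not the library's actual code):

```python
import pandas as pd

def group_centroids(fitted, residuals, groups):
    """Centroid (mean fitted value, mean residual) for each group,
    mirroring the user-defined-groups centroid mode."""
    df = pd.DataFrame({"fitted": fitted, "resid": residuals, "group": groups})
    return df.groupby("group")[["fitted", "resid"]].mean()

cents = group_centroids(
    fitted=[1.0, 2.0, 3.0, 4.0],
    residuals=[0.1, -0.1, 0.2, -0.2],
    groups=["a", "a", "b", "b"],
)
```

The K-means mode works the same way except group membership comes from clustering instead of a user-supplied column.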
Group Category Support
All classification plots now support the `group_category` parameter:
- ROC curves with per-group AUC and counts
- PR curves with per-group metrics
- Calibration curves with per-group calibration
- Residual diagnostics support group visualization with centroids
- Summary performance supports grouped classification metrics
Gain Chart Enhancement
- Gini coefficient calculation and display via the `show_gini` parameter
- Custom decimal places for Gini via the `decimal_places` parameter
Legend Customization
Legend location now supports:
- Standard matplotlib locations (`'best'`, `'upper right'`, etc.)
- `'bottom'`: places the legend below the plot (well suited for group categories)
- Automatic legend ordering for better readability
summarize_model_performance
- Added `include_adjusted_r2` for regression models
- Added `group_category` for grouped classification metrics
- Added `overall_only` for regression to show only aggregate metrics
- Improved coefficient ordering (intercept first)
- Better handling of feature importances for tree-based models
show_confusion_matrix
- Added `show_colorbar` parameter (default: `False`)
- Added `labels` parameter to toggle TN/FP/FN/TP labels
- Improved font size controls (`inner_fontsize`, `label_fontsize`, `tick_fontsize`)
show_roc_curve
- Added `show_operating_point` and `operating_point_method`
- Added `operating_point_kwgs` for custom styling
- Added `delong` parameter for AUC comparison
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_pr_curve
- Added `legend_metric` parameter (`'ap'` or `'aucpr'`)
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_calibration_curve
- Added `show_brier_score` parameter (default: `True`)
- Added `brier_decimals` for formatting
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_gain_chart
- Added `show_gini` parameter (default: `False`)
- Added `decimal_places` for Gini formatting
plot_threshold_metrics
- Added `lookup_metric` and `lookup_value` for threshold optimization
- Added `model_threshold` to highlight specific thresholds
- Added `baseline_thresh` to toggle the baseline line
- Added custom styling: `curve_kwgs`, `baseline_kwgs`, `threshold_kwgs`, `lookup_kwgs`
show_residual_diagnostics
- Added `plot_type` options: `'all'`, `'fitted'`, `'qq'`, `'scale_location'`, `'leverage'`, `'influence'`, `'histogram'`, `'predictors'`
- Added `heteroskedasticity_test` with multiple test options
- Added `show_lowess` for trend lines
- Added `lowess_kwgs` for LOWESS styling
- Added `group_category` for stratified analysis
- Added `group_kwgs` for custom group styling
- Added `show_centroids` and `centroid_kwgs`
- Added `centroid_type` (`'clusters'` or `'groups'`)
- Added `n_clusters` for automatic clustering
- Added `histogram_type` (`'frequency'` or `'density'`)
- Added `show_diagnostics_table` and `return_diagnostics`
- Added `show_plots` to disable plotting
- Added `show_outliers` and `n_outliers` for labeling
- Added `legend_loc` parameter
- Added `legend_kwgs` to control legend display for groups, centroids, clusters, and het_tests
- Added `kmeans_rstate` for reproducible clustering
- Added `n_cols` and `n_rows` for custom subplot layouts
- Added `point_kwgs` for scatter point styling (supports `edgecolor`, `linewidth`, etc.)
Bug Fixes
- Fixed confusion matrix colorbar removal when `show_colorbar=False`
- Fixed duplicate text handling in confusion matrix displays
- Fixed legend placement for grouped visualizations
- Fixed text wrapping for long titles
- Fixed LOWESS exception handling (now fails gracefully)
- Fixed feature importance display for tree-based models
- Fixed coefficient ordering in regression output
- Fixed empty metric columns in regression feature importance rows
Documentation Improvements
- Comprehensive docstrings for all major functions
- Parameter descriptions with examples
- Error message improvements for better debugging
- Type hints and validation error messages
- Usage examples in docstrings
Testing
- Test suite expanded from ~50 tests to 152 tests
- Coverage increased from 50% to 86% on core modules
- All edge cases and error conditions tested
- Integration tests for real-world workflows
- Parametrized tests for systematic coverage
Performance
- No performance regressions
- Modular code structure improves maintainability
- Efficient calculation caching where applicable
Migration Guide
From 0.0.5a1 to 0.0.5a2:
No changes required; all existing code will work as before, and the new features are opt-in.
Version 0.0.5a1
Added
- Operating Point Visualization for ROC Curves: Added `show_operating_point` parameter to display optimal classification thresholds on ROC curves with two methods:
  - `youden`: Youden's J statistic (maximizes TPR - FPR)
  - `closest_topleft`: point closest to the top-left corner (minimizes distance to the perfect classifier)
  - Configurable via the `operating_point_method` and `operating_point_kwgs` parameters
  - Operating points display threshold values in legends and appear as markers on curves
- Gini Coefficient for Gain Charts: Added automatic calculation and display of the Gini coefficient in `show_gain_chart()`
  - Prints the Gini coefficient for each model (default: 3 decimal places)
  - Displays in legend labels across all plot modes (overlay, subplots, single)
  - Configurable via the `show_gini` and `decimal_places` parameters
- Legend Location Control: Added `legend_loc` parameter to all plotting functions for flexible legend positioning
  - Supports standard matplotlib locations (`'lower right'`, `'upper left'`, `'best'`, etc.)
  - Special `'bottom'` option places the legend below the plot with proper spacing
  - Available in: `show_roc_curve()`, `show_pr_curve()`, `show_calibration_curve()`, `show_lift_chart()`, `show_gain_chart()`
Improved
- Legend Ordering for ROC Curves: Standardized legend entry order across all plot modes
  - Order: model curves with AUC → Random Guess baseline → operating points
  - Ensures consistent, intuitive legend presentation
- Overlay Mode for ROC Curves: Enhanced operating point display in overlay plots
  - Combined AUC and operating point threshold in a single legend entry
  - Format: "Model Name (AUC = 0.XX, Op = 0.XX)"
  - Operating point markers appear on curves without duplicate legend entries
Technical Details
- Operating points are calculated after ROC curve generation using optimal threshold selection
- The Gini coefficient is derived from the area under the gain curve: Gini = 2 × AUGC - 1
- Legend positioning uses `bbox_to_anchor` for `'bottom'` placement with dynamic spacing
- All changes maintain backward compatibility with existing code
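The Gini formula above can be computed from a cumulative gain curve with plain trapezoidal integration. An illustrative sketch (not the library's internal code):

```python
import numpy as np

def gini_from_gain(pct_samples, pct_captured):
    """Gini coefficient from a cumulative gain curve via Gini = 2 * AUGC - 1,
    where AUGC is the trapezoidal area under the gain curve."""
    x = np.asarray(pct_samples, dtype=float)
    y = np.asarray(pct_captured, dtype=float)
    augc = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)  # trapezoid rule
    return 2.0 * augc - 1.0

# A random model's gain curve is the diagonal, so its Gini is 0
print(gini_from_gain([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.0
```

A model that captures all positives in the first half of the ranked samples (`y = [0, 1, 1]`) yields AUGC = 0.75 and therefore Gini = 0.5.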
Version 0.0.4a10
Refactored and stabilized the `summarize_model_performance` function to improve consistency across classification and regression workflows while preserving the exact formatting logic for printed outputs and regression coefficient display.
Changes
- Consolidated redundant metric computation into dedicated helper functions for classification and regression metrics.
- Ensured regression coefficients, intercepts, and feature importances are retained and ordered correctly in the final DataFrame output.
- Fixed grouped classification output so Model Threshold always appears last and group headers correctly reflect category names.
- Added conditional handling for grouped classification to prevent `KeyError` when the `"Model"` column is absent.
- Preserved the original manual formatting block to maintain Leon's custom printing logic for both classification and regression:
  - Right-aligned all table columns for readability.
  - Retained separator-based visual formatting and model-wise breaks.
- Preserved coefficient and intercept reporting behavior exactly as before, ensuring regression results remain interpretable and consistent.
Impact
- Classification and regression now produce stable, well-ordered, and readable summaries.
- Grouped and non-grouped runs behave consistently without disrupting regression coefficient output.
- Backward compatibility with previous console and DataFrame output formats is maintained.
Version 0.0.4a9
This release introduces a new parameter, `brier_decimals`, to the `show_calibration_curve()` function, allowing users to control the number of decimal places displayed for the Brier score.
Changes Made
- Added `brier_decimals` parameter (default: `3`) next to `show_brier_score`.
- Updated Brier score display logic to format using `round(brier_score, brier_decimals)`.
- Improved readability and precision consistency across calibration plots.
Impact
- No breaking changes.
- Users now have finer control over Brier score precision in calibration curve visualizations.
Quick Example
```python
from model_metrics import show_calibration_curve

show_calibration_curve(model, X, y, show_brier_score=True, brier_decimals=4)
```
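For reference, the Brier score itself is the mean squared difference between predicted probabilities and observed binary outcomes. A standalone sketch of the rounding behavior described above (hypothetical helper, not the library's implementation):

```python
import numpy as np

def brier_score(y_true, y_prob, brier_decimals=3):
    """Mean squared difference between predicted probability and outcome,
    rounded to the requested number of decimal places."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return round(float(np.mean((y_prob - y_true) ** 2)), brier_decimals)

print(brier_score([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3], brier_decimals=4))  # 0.0375
```

Lower is better: a perfectly calibrated, perfectly confident model scores 0, and predicting 0.5 everywhere scores 0.25.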
Version 0.0.4a8
Summary:
Updated the `hanley_mcneil_auc_test()` function to perform a large-sample z-test for comparing correlated AUCs, based on Hanley & McNeil (1982), an analytical approximation of DeLong's test.
Key Changes:
- Implemented `hanley_mcneil_auc_test()` with parameters `y_true`, `y_scores_1`, and `y_scores_2` for AUC comparison.
- Optional `model_names`, `verbose`, and `return_values` arguments for flexible use.
- Added formatted, human-readable print output (when `verbose=True`).
- Enabled optional programmatic access with `return_values=True`.
- Adopted NumPy-style docstrings for clarity and consistency.
- Integrated the helper into `show_roc_curve()` to enable AUC significance testing when the `delong` argument is provided.
Notes: This helper can also be used as a standalone function for independent AUC comparison between two models, outside of visualization workflows.
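The Hanley & McNeil (1982) variance approximation underlying the test can be sketched as follows. Note that this simplified version treats the two AUCs as independent, whereas the library's helper is described as handling correlated AUCs; the function names here are illustrative, not the library's API:

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley & McNeil (1982) standard error of a single AUC estimate."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1.0) * (q1 - auc**2)
           + (n_neg - 1.0) * (q2 - auc**2)) / (n_pos * n_neg)
    return math.sqrt(var)

def auc_z_test(auc1, auc2, n_pos, n_neg):
    """Two-sided z-test for the difference of two AUCs, treating them as
    independent (a correlated-AUC version would subtract a covariance term)."""
    se = math.sqrt(hanley_mcneil_se(auc1, n_pos, n_neg) ** 2
                   + hanley_mcneil_se(auc2, n_pos, n_neg) ** 2)
    z = (auc1 - auc2) / se
    # p-value from the standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

z, p = auc_z_test(0.85, 0.80, n_pos=100, n_neg=150)
```

Ignoring the correlation term makes the test conservative when both models are scored on the same cases, which is why the analytical DeLong approximation accounts for it.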
Version 0.0.4a7
DeLong’s test (Hanley & McNeil approximation)
- Implemented a new helper function `hanley_mcneil_auc_test()` for approximate DeLong's AUC comparison.
- Integrated the helper inside `show_roc_curve()` to optionally print AUC differences and p-values between two models.
- Added corresponding pytest coverage under `test_show_roc_curve_with_delong()`.
Group category support
- Added the `group_category` input to `summarize_model_performance()` to generate subgroup-level performance summaries.
- Enables stratified metric reporting for fairness or demographic analysis.
Version 0.0.4a6
Reworded the print message inside `plot_threshold_metrics()` for clarity.
Old:

```python
print(
    f"Best threshold for {lookup_metric} = "
    f"{round(lookup_value, decimal_places)} is: "
    f"{round(best_threshold, decimal_places)}"
)
```
New:

```python
print(
    f"Best threshold for target {lookup_metric} of "
    f"{round(lookup_value, decimal_places)} is "
    f"{round(best_threshold, decimal_places)}"
)
```
This removes the equals sign and colon, and adds “target” for a smoother, more descriptive sentence.
Version 0.0.4a5
Added a minimal type check to ensure `y_prob` is always a list at the start of each affected function:
- `summarize_model_performance`
- `show_calibration_curve`
- `show_confusion_matrix`
- `show_lift_chart`
- `show_gain_chart`
- `show_roc_curve`
- `show_pr_curve`

```python
# Ensure y_prob is always a list of NumPy arrays
if isinstance(y_prob, np.ndarray):
    y_prob = [y_prob]
```

This allows `y_prob[0]` indexing to work whether the caller provides a single NumPy array or a list of arrays.
Updated unit tests.
Version 0.0.4a4
- Corrected the README to reflect the current version.
- The previous release did not update the README properly because the file was not saved before publishing.
- No functional changes to the library.
Version 0.0.4a3
Added the missing `scipy (>=1.8,<=1.14.0)` requirement to the README.
Version 0.0.4a2
This version updates `pyproject.toml` and `requirements.txt` to restrict SciPy to `>=1.8,<=1.14.0`.
- Prevents installation of `scipy==1.14.1` and later, which removes `_lazywhere` and breaks `statsmodels`.
- Keeps compatibility with `model_tuner` and Colab environments.
- Bumps the package version for release.
- Updated the `scipy` dependency to `>=1.8,<=1.14.0`.
- Synced `requirements.txt` with the updated constraints.
Version 0.0.4a1
- Replaced the old `grid` parameter with `subplots` across plotting functions for consistency.
- Standardized gridline handling by replacing unconditional `plt.grid()` calls with `plt.grid(visible=gridlines)`.
Why
- Aligns function signatures to use `subplots` consistently instead of `grid`.
- Makes gridline visibility configurable through a single `gridlines` flag.
- Cleaner charts when `gridlines=False`; no visual change when `gridlines=True`.
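The pattern described above can be sketched with a toy plotting function (`plot_series` is hypothetical; `grid(visible=...)` is the matplotlib >=3.5 keyword):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

def plot_series(x, y, gridlines=True):
    """Toy illustration of the gridlines flag: visibility is passed through
    to grid() instead of calling plt.grid() unconditionally."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.grid(visible=gridlines)
    return fig, ax

fig, ax = plot_series([0, 1, 2], [0, 1, 4], gridlines=False)
```

Threading a single boolean through to `grid(visible=...)` keeps every chart's gridline behavior controlled from one place in the function signature.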
Version 0.0.4a
Summary
Added the ability to pass predicted probabilities (`y_prob`) directly into the functions in `model_evaluator.py` as an alternative to supplying a fitted model and feature matrix. This flexibility lets end users evaluate results in two ways:
- Using a model object with `X` (current behavior)
- Passing `y_prob` directly (new option)
Details
- Updated all relevant evaluator functions (`summarize_model_performance`, `plot_threshold_metrics`, etc.) to accept `y_prob` as input.
- Added input validation: functions now check that at least one of `(model and X)` or `y_prob` is provided.
- Preserved existing model-based workflows for backward compatibility.
- Extended unit tests in `unittests/` to cover the new probability-based path, including edge cases and validation errors.
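The validation described above can be sketched as follows (`resolve_probabilities` is a hypothetical helper assuming a scikit-learn style `predict_proba`; it is not the library's actual code):

```python
import numpy as np

def resolve_probabilities(model=None, X=None, y_prob=None):
    """Accept either precomputed probabilities or a fitted model plus X,
    raising if neither input pathway is supplied."""
    if y_prob is not None:
        return np.asarray(y_prob)
    if model is not None and X is not None:
        # assumes a scikit-learn style binary classifier
        return model.predict_proba(X)[:, 1]
    raise ValueError("Provide either (model and X) or y_prob.")

probs = resolve_probabilities(y_prob=[0.2, 0.8])
```

Centralizing the check like this lets every plotting function share one error message and one precedence rule (explicit `y_prob` wins).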
Why
End users sometimes already have predicted probabilities from external pipelines or pre-computed experiments. This change avoids forcing them to re-supply the model, streamlining the evaluation process.
Version 0.0.3a
Added `"plotly>=5.18.0, <=5.24.1"` in `pyproject.toml`, `setup.py`, and `README_min.md` for the `partial_dependence.py` functions.
Version 0.0.2a
- Add `show_ks_curve` function and enhance `summarize_model_performance` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/1
- Add `plot_threshold_metrics` function by @lshpaner in https://github.com/lshpaner/model_metrics/pull/2
- Add `pr_feature_plot` and update `roc_feature_plot` for enhanced visualization by @lshpaner in https://github.com/lshpaner/model_metrics/pull/3
- Reg table enhance by @lshpaner in https://github.com/lshpaner/model_metrics/pull/4
- Rmvd (%) from MAPE header by @lshpaner in https://github.com/lshpaner/model_metrics/pull/5
- Moved roc legend to lower right default by @lshpaner in https://github.com/lshpaner/model_metrics/pull/6
- Allow Flexible Inputs and Save Behavior for `show_roc_curve()` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/7
- Prcurve calc tests by @lshpaner in https://github.com/lshpaner/model_metrics/pull/8
- Removed unused imports and functions by @lshpaner in https://github.com/lshpaner/model_metrics/pull/9
- Changed saving nomenclature in `show_confusion_matrix` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/10
- Fix Calibration Curve Grid Plot Behavior and Update Model Nomenclature by @lshpaner in https://github.com/lshpaner/model_metrics/pull/11
- Improved support for multiple models and group categories in calibration curve by @lshpaner in https://github.com/lshpaner/model_metrics/pull/13
- Upd. `plot_threshold_metrics` w/ new `lookup_kwgs` and legend logic by @lshpaner in https://github.com/lshpaner/model_metrics/pull/14
- Rmv. unused arguments by @lshpaner in https://github.com/lshpaner/model_metrics/pull/15
- Move PDF-related functions from `eda_toolkit` to `model_metrics` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/16

Full Changelog: https://github.com/lshpaner/model_metrics/compare/0.0.1a...0.0.2a
Version 0.0.1a
- Updated unit tests and `README`
- Added `statsmodels` to library imports
- Added coefficients and p-values to regression summary
- Added regression capabilities to `summarize_model_performance`
- Added lift and gains charts
- Updated versions for earlier Python compatibility