Changelog

Version 0.0.5a3

Major Features Added

  • Axis Limits: Added xlim and ylim parameters for standardizing axis ranges across multiple model comparisons

  • Bottom Legend Support: Automatic figure height adjustment when legend_loc="bottom" to prevent legend overlap with x-axis labels

  • Multi-Model Layout: Improved the default layout for multiple models with plot_type="all" - plots are now arranged as one row per model (6 columns × N rows) instead of a mixed layout
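
For example, the new options might be combined along these lines (a sketch only; the (model, X, y) call pattern follows the Quick Example further down, and whether xlim/ylim are exposed on show_roc_curve specifically is an assumption to verify against its docstring):

from model_metrics import show_roc_curve

# Fix the axis ranges so curves from different models line up visually,
# and place the legend below the plot; the figure height is adjusted
# automatically to avoid overlapping the x-axis labels.
show_roc_curve(
    model,
    X_test,
    y_test,
    xlim=(0.0, 1.0),
    ylim=(0.0, 1.05),
    legend_loc="bottom",
)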

Bug Fixes

  • Heteroskedasticity Tests: Fixed categorical variable handling in all tests (Breusch-Pagan, White, Goldfeld-Quandt) by encoding categorical columns before running tests

  • Legend Formatting:

    • Fixed legend duplication bug in Scale-Location plot (removed test name prepending since interpretations already contain test names)

    • Fixed histogram legend to use apply_legend() for consistent formatting

    • Fixed legend kwargs not being properly passed to Scale-Location plot

  • Group Category Handling: Fixed KeyError when group_category column not in X DataFrame by properly checking column existence before filtering predictor columns

  • Index Alignment: Fixed AssertionError when using external group_category array by ensuring Series index matches X.index

  • Python 3.8 Compatibility: Fixed LaTeX rendering error in Scale-Location y-axis label by replacing \text{} with \mathrm{} for matplotlib <3.3 compatibility

Enhancements

  • Scale-Location Y-Axis: Changed to LaTeX notation r"$\sqrt{|\text{Std. Residuals}|}$" for better readability

  • Histogram Overlay: Removed normal distribution overlay from histogram_type="frequency" for cleaner, simpler visualization (overlay still present for histogram_type="density")

  • Text Wrapping: Added text_wrap parameter support to all subplot titles (previously only worked for suptitles)

  • Helper Functions:

    • Created apply_axis_limits() helper in plot_utils.py

    • Enhanced apply_legend() to handle bottom legend resizing with flag-based prevention of multiple resizes

  • Refactoring: Refactored plot_threshold_metrics() to use apply_plot_title() and apply_legend() helpers for consistency
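
For orientation, one plausible shape for the new apply_axis_limits() helper is sketched below; this is illustrative only and the actual implementation in plot_utils.py may differ:

def apply_axis_limits(ax, xlim=None, ylim=None):
    """Apply optional x/y limits to a single Axes, leaving unset limits alone."""
    if xlim is not None:
        ax.set_xlim(xlim)
    if ylim is not None:
        ax.set_ylim(ylim)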

Version 0.0.5a2

Testing Improvements

  • Added 150+ comprehensive unit tests covering:

    • Edge cases and error handling

    • Parameter validation

    • Different input pathways (model vs y_prob vs y_pred)

    • Group category functionality

    • Styling and customization options

    • Integration scenarios

Code Quality

  • Refactored monolithic 3866-line model_metrics.py into modular components:

    • model_evaluator.py - Main plotting and evaluation functions

    • metrics_utils.py - Utility functions and calculations

    • plot_utils.py - Plotting helper functions

  • Improved code maintainability and organization

  • Enhanced error messages and validation

ROC Curve Enhancements

  • Operating point visualization with two methods:

    • operating_point_method='youden' - Youden’s J statistic

    • operating_point_method='closest_topleft' - Closest to top-left corner

  • DeLong test support for AUC comparison between models via delong parameter

  • Legend ordering - Proper organization: AUC curves → Random Guess → Operating Points

  • Custom operating point styling via operating_point_kwgs

Residual Diagnostics Expansion

  • New plot types:

    • 'influence' - Influence plot with Cook’s distance bubbles

    • 'predictors' - Individual residual plots for each predictor

  • Heteroskedasticity testing with multiple methods:

    • 'breusch_pagan' - Breusch-Pagan test

    • 'white' - White’s test

    • 'goldfeld_quandt' - Goldfeld-Quandt test

    • 'spearman' - Spearman rank correlation

    • 'all' - Run all tests

  • LOWESS smoothing via show_lowess parameter

  • Centroid visualization with two modes:

    • User-defined groups via group_category

    • Automatic K-means clustering via n_clusters

  • Histogram types:

    • histogram_type='frequency' - Raw counts (default)

    • histogram_type='density' - Probability density with normal overlay

  • Diagnostics table - Comprehensive model diagnostics via show_diagnostics_table

  • Return diagnostics - Programmatic access via return_diagnostics=True

Group Category Support

  • All classification plots now support group_category parameter:

    • ROC curves with per-group AUC and counts

    • PR curves with per-group metrics

    • Calibration curves with per-group calibration

  • Residual diagnostics support group visualization with centroids

  • Summary performance supports grouped classification metrics

Gain Chart Enhancement

  • Gini coefficient calculation and display via show_gini parameter

  • Custom decimal places for Gini via decimal_places parameter

Legend Customization

  • Legend location now supports:

    • Standard matplotlib locations ('best', 'upper right', etc.)

    • 'bottom' - Places legend below plot (perfect for group categories)

  • Automatic legend ordering for better readability

summarize_model_performance

  • Added include_adjusted_r2 for regression models

  • Added group_category for grouped classification metrics

  • Added overall_only for regression to show only aggregate metrics

  • Improved coefficient ordering (intercept first)

  • Better handling of feature importances for tree-based models
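
Illustrative calls using the new options (a sketch; the exact signature and valid group_category inputs should be confirmed in the docstring, and "age_group" is a hypothetical column name):

from model_metrics import summarize_model_performance

# Regression: include adjusted R^2 and show only aggregate metrics
summarize_model_performance(
    model=reg_model,
    X=X_test,
    y=y_test,
    include_adjusted_r2=True,
    overall_only=True,
)

# Classification: stratify the metrics by a grouping column
summarize_model_performance(
    model=clf_model,
    X=X_test,
    y=y_test,
    group_category=X_test["age_group"],
)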

show_confusion_matrix

  • Added show_colorbar parameter (default: False)

  • Added labels parameter to toggle TN/FP/FN/TP labels

  • Improved font size controls (inner_fontsize, label_fontsize, tick_fontsize)
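
A minimal sketch of the new options (assumes the same (model, X, y) call pattern as the other plotting functions; whether labels is a boolean toggle is an assumption):

from model_metrics import show_confusion_matrix

show_confusion_matrix(
    model,
    X_test,
    y_test,
    show_colorbar=False,   # default; no colorbar is drawn
    labels=True,           # show the TN/FP/FN/TP annotations
    inner_fontsize=12,
    label_fontsize=11,
    tick_fontsize=10,
)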

show_roc_curve

  • Added show_operating_point and operating_point_method

  • Added operating_point_kwgs for custom styling

  • Added delong parameter for AUC comparison

  • Added group_category for stratified analysis

  • Added legend_loc parameter
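
Putting the new ROC options together (a sketch; the marker kwargs, the value expected by delong, and the "site" grouping column are assumptions):

from model_metrics import show_roc_curve

show_roc_curve(
    model,
    X_test,
    y_test,
    show_operating_point=True,
    operating_point_method="youden",        # or "closest_topleft"
    operating_point_kwgs={"marker": "o"},   # custom marker styling
    group_category=X_test["site"],          # per-group AUC and counts
    legend_loc="bottom",
    # delong=...  # enables the DeLong AUC comparison; see the docstring
)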

show_pr_curve

  • Added legend_metric parameter ('ap' or 'aucpr')

  • Added group_category for stratified analysis

  • Added legend_loc parameter
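
For example (a sketch mirroring the ROC call above; "site" is a hypothetical column):

from model_metrics import show_pr_curve

show_pr_curve(
    model,
    X_test,
    y_test,
    legend_metric="aucpr",          # or "ap"
    group_category=X_test["site"],
    legend_loc="bottom",
)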

show_calibration_curve

  • Added show_brier_score parameter (default: True)

  • Added brier_decimals for formatting

  • Added group_category for stratified analysis

  • Added legend_loc parameter

show_gain_chart

  • Added show_gini parameter (default: False)

  • Added decimal_places for Gini formatting
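
For example (a sketch; the call pattern mirrors the other chart functions):

from model_metrics import show_gain_chart

show_gain_chart(
    model,
    X_test,
    y_test,
    show_gini=True,      # print and display the Gini coefficient
    decimal_places=3,    # precision used when formatting Gini
)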

plot_threshold_metrics

  • Added lookup_metric and lookup_value for threshold optimization

  • Added model_threshold to highlight specific thresholds

  • Added baseline_thresh to toggle baseline line

  • Added custom styling: curve_kwgs, baseline_kwgs, threshold_kwgs, lookup_kwgs
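
A sketch of the new threshold-optimization options (the "precision" metric name and the kwargs keys shown here are assumptions to check against the docstring):

from model_metrics import plot_threshold_metrics

plot_threshold_metrics(
    model,
    X_test,
    y_test,
    lookup_metric="precision",
    lookup_value=0.80,             # prints the best threshold for this target
    model_threshold=0.50,          # highlight a specific threshold on the plot
    baseline_thresh=True,          # keep the baseline line visible
    curve_kwgs={"linewidth": 2},
)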

show_residual_diagnostics

  • Added plot_type options: 'all', 'fitted', 'qq', 'scale_location', 'leverage', 'influence', 'histogram', 'predictors'

  • Added heteroskedasticity_test with multiple test options

  • Added show_lowess for trend lines

  • Added lowess_kwgs for LOWESS styling

  • Added group_category for stratified analysis

  • Added group_kwgs for custom group styling

  • Added show_centroids and centroid_kwgs

  • Added centroid_type ('clusters' or 'groups')

  • Added n_clusters for automatic clustering

  • Added histogram_type ('frequency' or 'density')

  • Added show_diagnostics_table and return_diagnostics

  • Added show_plots to disable plotting

  • Added show_outliers and n_outliers for labeling

  • Added legend_loc parameter

  • Added legend_kwgs to control legend display for groups, centroids, clusters, and het_tests

  • Added kmeans_rstate for reproducible clustering

  • Added n_cols and n_rows for custom subplot layouts

  • Added point_kwgs for scatter point styling (supports edgecolor, linewidth, etc.)
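
A sketch that combines several of the options above (the (model, X, y) pattern, the "region" column, and the shape of the returned diagnostics are assumptions):

from model_metrics import show_residual_diagnostics

diagnostics = show_residual_diagnostics(
    model,
    X_test,
    y_test,
    plot_type="all",
    heteroskedasticity_test="all",   # Breusch-Pagan, White, Goldfeld-Quandt, Spearman
    show_lowess=True,
    group_category=X_test["region"],
    show_centroids=True,
    centroid_type="groups",
    histogram_type="density",
    show_diagnostics_table=True,
    return_diagnostics=True,         # also get the diagnostics programmatically
    legend_loc="bottom",
)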

Bug Fixes

  • Fixed confusion matrix colorbar removal when show_colorbar=False

  • Fixed duplicate text handling in confusion matrix displays

  • Fixed legend placement for grouped visualizations

  • Fixed text wrapping for long titles

  • Fixed LOWESS exception handling (now fails gracefully)

  • Fixed feature importance display for tree-based models

  • Fixed coefficient ordering in regression output

  • Fixed empty metric columns in regression feature importance rows

Documentation Improvements

  • Comprehensive docstrings for all major functions

  • Parameter descriptions with examples

  • Error message improvements for better debugging

  • Type hints and validation error messages

  • Usage examples in docstrings

Testing

  • Test suite expanded from ~50 tests to 152 tests

  • Coverage increased from 50% to 86% on core modules

  • All edge cases and error conditions tested

  • Integration tests for real-world workflows

  • Parametrized tests for systematic coverage

Performance

  • No performance regressions

  • Modular code structure improves maintainability

  • Efficient calculation caching where applicable

Migration Guide

From 0.0.5a1 to 0.0.5a2:

No changes are required; all existing code will continue to work as before. New features are opt-in.
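
For example (a minimal sketch using parameters documented above):

from model_metrics import show_roc_curve

# Existing calls keep working unchanged
show_roc_curve(model, X_test, y_test)

# New behavior appears only when explicitly requested
show_roc_curve(
    model,
    X_test,
    y_test,
    show_operating_point=True,
    operating_point_method="youden",
    legend_loc="bottom",
)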

Version 0.0.5a1

Added

  • Operating Point Visualization for ROC Curves: Added show_operating_point parameter to display optimal classification thresholds on ROC curves with two methods:

    • youden: Youden’s J statistic (maximizes TPR - FPR)

    • closest_topleft: Point closest to top-left corner (minimizes distance to perfect classifier)

    • Configurable via operating_point_method and operating_point_kwgs parameters

    • Operating points display threshold values in legends and appear as markers on curves

  • Gini Coefficient for Gain Charts: Added automatic calculation and display of Gini coefficient in show_gain_chart()

    • Prints Gini coefficient for each model (default: 3 decimal places)

    • Displays in legend labels across all plot modes (overlay, subplots, single)

    • Configurable via show_gini and decimal_places parameters

  • Legend Location Control: Added legend_loc parameter to all plotting functions for flexible legend positioning

    • Supports standard matplotlib locations ('lower right', 'upper left', 'best', etc.)

    • Special 'bottom' option places legend below plot with proper spacing

    • Available in: show_roc_curve(), show_pr_curve(), show_calibration_curve(), show_lift_chart(), show_gain_chart()

Improved

  • Legend Ordering for ROC Curves: Standardized legend entry order across all plot modes

    • Order: Model curves with AUC → Random Guess baseline → Operating points

    • Ensures consistent, intuitive legend presentation

  • Overlay Mode for ROC Curves: Enhanced operating point display in overlay plots

    • Combined AUC and operating point threshold in single legend entry

    • Format: “Model Name (AUC = 0.XX, Op = 0.XX)”

    • Operating point markers appear on curves without duplicate legend entries

Technical Details

  • Operating points calculated post-ROC curve generation using optimal threshold selection

  • Gini coefficient derived from area under gain curve: Gini = 2 × AUGC - 1

  • Legend positioning uses bbox_to_anchor for 'bottom' placement with dynamic spacing

  • All changes maintain backward compatibility with existing code
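
As a worked sketch of the Gini relationship above (illustrative only, not the library's internal implementation):

import numpy as np

def gini_from_gain_curve(population_frac, cumulative_gain):
    """Gini = 2 * AUGC - 1, where AUGC is the area under the gain curve."""
    x = np.asarray(population_frac, dtype=float)
    y = np.asarray(cumulative_gain, dtype=float)
    augc = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))  # trapezoidal area
    return 2.0 * augc - 1.0

# A gain curve on the diagonal (random targeting) yields Gini = 0
print(gini_from_gain_curve([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))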

Version 0.0.4a10

Refactored and stabilized the summarize_model_performance function to improve consistency across classification and regression workflows while preserving the exact formatting logic for printed outputs and regression coefficient display.

Changes

  • Consolidated redundant metric computation into dedicated helper functions for classification and regression metrics.

  • Ensured regression coefficients, intercepts, and feature importances are retained and ordered correctly in the final DataFrame output.

  • Fixed grouped classification output so Model Threshold always appears last, and group headers correctly reflect category names.

  • Added conditional handling for grouped classification to prevent KeyError when the "Model" column is absent.

  • Preserved the original manual formatting block to maintain Leon’s custom printing logic for both classification and regression:

    • Right-aligned all table columns for readability.

    • Retained separator-based visual formatting and model-wise breaks.

    • Preserved coefficient and intercept reporting behavior exactly as before, ensuring regression results remain interpretable and consistent.

Impact

  • Classification and regression now produce stable, well-ordered, and readable summaries.

  • Grouped and non-grouped runs behave consistently without disrupting regression coefficient output.

  • Backward compatibility with previous console and DataFrame output formats maintained.

Version 0.0.4a9

This release introduces a new parameter, brier_decimals, to the show_calibration_curve() function, allowing users to control the number of decimal places displayed for the Brier score.

Changes Made

  • Added brier_decimals parameter (default: 3) next to show_brier_score.

  • Updated Brier score display logic to format using round(brier_score, brier_decimals).

  • Improved readability and precision consistency across calibration plots.

Impact

  • No breaking changes.

  • Users now have finer control over Brier score precision in calibration curve visualizations.

Quick Example

from model_metrics import show_calibration_curve
show_calibration_curve(model, X, y, show_brier_score=True, brier_decimals=4)

Version 0.0.4a8

Summary:

Updated hanley_mcneil_auc_test() function to perform a large-sample z-test for comparing correlated AUCs, based on Hanley & McNeil (1982), an analytical approximation of DeLong’s test.

Key Changes:

  • Implemented hanley_mcneil_auc_test() with parameters:

    • y_true, y_scores_1, y_scores_2 for AUC comparison.

    • Optional model_names, verbose, and return_values arguments for flexible use.

  • Added formatted, human-readable print output (when verbose=True).

  • Enabled optional programmatic access with return_values=True.

  • Adopted NumPy-style docstring for clarity and consistency.

  • Integrated helper into show_roc_curve() to enable AUC significance testing when the delong argument is provided.

Notes: This helper can also be used as a standalone function for independent AUC comparison between two models, outside of visualization workflows.
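
Standalone usage might look like this (a sketch; the top-level import path and the returned structure are assumptions):

from model_metrics import hanley_mcneil_auc_test

result = hanley_mcneil_auc_test(
    y_true=y_test,
    y_scores_1=model_a.predict_proba(X_test)[:, 1],
    y_scores_2=model_b.predict_proba(X_test)[:, 1],
    model_names=["Model A", "Model B"],
    verbose=True,          # print the formatted, human-readable summary
    return_values=True,    # also return the statistics programmatically
)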

Version 0.0.4a7

  • DeLong’s test (Hanley & McNeil approximation)

    • Implemented a new helper function hanley_mcneil_auc_test() for approximate DeLong’s AUC comparison.

    • Integrated the helper inside show_roc_curve() to optionally print AUC differences and p-values between two models.

    • Added corresponding pytest coverage under test_show_roc_curve_with_delong().

  • Group category support

    • Added the group_category input to summarize_model_performance() to generate subgroup-level performance summaries.

    • Enables stratified metric reporting for fairness or demographic analysis.

Version 0.0.4a6

Reworded the print message inside plot_threshold_metrics() for clarity.

Old:

print(
      f"Best threshold for {lookup_metric} = "
      f"{round(lookup_value, decimal_places)} is: "
      f"{round(best_threshold, decimal_places)}"
)

New:

print(
    f"Best threshold for target {lookup_metric} of "
    f"{round(lookup_value, decimal_places)} is "
    f"{round(best_threshold, decimal_places)}"
)

This removes the equals sign and colon, and adds “target” for a smoother, more descriptive sentence.

Version 0.0.4a5

  • Added a minimal type check to ensure y_prob is always a list at the start of each affected function:

    • summarize_model_performance

    • show_calibration_curve

    • show_confusion_matrix

    • show_lift_chart

    • show_gain_chart

    • show_roc_curve

    • show_pr_curve

# Ensure y_prob is always a list of NumPy arrays
if isinstance(y_prob, np.ndarray):
    y_prob = [y_prob]

This allows y_prob[0] indexing to work whether the caller provides a single NumPy array or a list of arrays.

  • Updated unittests

Version 0.0.4a4

  • Corrected README to reflect the current version.

  • Previous release did not update the README properly because the file was not saved before publishing.

  • No functional changes to the library.

Version 0.0.4a3

  • Added missing scipy (>=1.8,<=1.14.0) requirement to the README.

Version 0.0.4a2

This version updates pyproject.toml and requirements.txt to restrict SciPy to >=1.8,<=1.14.0.

  • Prevents installation of scipy==1.14.1+, which removes _lazywhere and breaks statsmodels.

  • Keeps compatibility with model_tuner and Colab environments.

  • Bumps package version for release.

  • Updated scipy dependency to >=1.8,<=1.14.0

  • Synced requirements.txt with updated constraints

Version 0.0.4a1

  • Replaced the old grid parameter with subplots across plotting functions for consistency.

  • Standardized gridline handling by replacing unconditional plt.grid() calls with plt.grid(visible=gridlines)

Why

  • Aligns function signatures to use subplots consistently instead of grid.

  • Makes gridline visibility configurable through a single gridlines flag.

  • Cleaner charts when gridlines=False, no visual change when gridlines=True.
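
Schematically, the pattern inside the plotting functions now looks like this (a runnable toy sketch, not the library's actual code):

import matplotlib.pyplot as plt

def demo_plot(gridlines=True):
    # Gridline visibility follows the single gridlines flag instead of an
    # unconditional plt.grid() call.
    plt.plot([0, 1], [0, 1])
    plt.grid(visible=gridlines)
    plt.show()

demo_plot(gridlines=False)  # clean chart with no gridlines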

Version 0.0.4a

Summary

Added the ability to pass predicted probabilities (y_prob) directly into the functions in model_evaluator.py as an alternative to supplying a fitted model and feature matrix. This flexibility lets end users evaluate results in two ways:

  • Using a model object with X (current behavior)

  • Or passing y_prob directly (new option)

Details

  • Updated all relevant evaluator functions (summarize_model_performance, plot_threshold_metrics, etc.) to accept y_prob as input.

  • Added input validation: functions now check that either (model and X) or y_prob is provided; both cannot be missing.

  • Preserved existing model-based workflows for backward compatibility.

  • Extended unit tests in unittests/ to cover the new probability-based path, including edge cases and validation errors.

Why

End users sometimes already have predicted probabilities from external pipelines or pre-computed experiments. This change avoids forcing them to re-supply the model, streamlining the evaluation process.
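
The two input pathways look roughly like this (a sketch; the exact keyword names for the probability path should be checked against each function's docstring):

from model_metrics import show_roc_curve

# Option 1: fitted model plus feature matrix (existing behavior)
show_roc_curve(model, X_test, y_test)

# Option 2: precomputed probabilities, no model required (new)
y_prob = model.predict_proba(X_test)[:, 1]
show_roc_curve(y=y_test, y_prob=y_prob)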

Version 0.0.3a

  • Added "plotly>=5.18.0, <=5.24.1" in pyproject.toml, setup.py, README_min.md –> for partial_dependence.py functions

Version 0.0.2a

Full Changelog: https://github.com/lshpaner/model_metrics/compare/0.0.1a...0.0.2a

Version 0.0.1a

  • Updated unit tests and README

  • Added statsmodels to library imports

  • Added coefficients and p-values to regression summary

  • Added regression capabilities to summarize_model_performance

  • Added lift and gains charts

  • Updated versions for earlier Python compatibility