Changelog

Version 0.0.5a6

3D Partial Dependence

Bug Fixes

  • Fixed interactive Plotly plot not rendering in Jupyter notebooks unless save_plots was set. Display and saving are now fully decoupled; the plot always renders regardless of save_plots.

  • Removed duplicate HTML save block that existed inside the static plot section.

New Features

  • Added x_label_map and y_label_map parameters for mapping raw axis values to human-readable tick labels; useful for encoded or numeric categorical features.

  • Added modebar_image_format parameter ("png", "svg", "jpeg", "webp") to control the download format of the Plotly modebar camera button. Defaults to "png".
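As a rough illustration of how a modebar download format can be validated and handed to Plotly, the sketch below builds a config dict. The helper name build_modebar_config is hypothetical and is not part of model_metrics; the only Plotly-specific piece is the toImageButtonOptions config key, which controls the camera button.

```python
# Hypothetical helper for illustration; not the library's implementation.
ALLOWED_FORMATS = ("png", "svg", "jpeg", "webp")


def build_modebar_config(modebar_image_format="png"):
    """Return a Plotly config dict that sets the camera-button download format."""
    if modebar_image_format not in ALLOWED_FORMATS:
        raise ValueError(
            f"modebar_image_format must be one of {ALLOWED_FORMATS}, "
            f"got {modebar_image_format!r}"
        )
    # toImageButtonOptions is the Plotly config key for the modebar camera button.
    return {"toImageButtonOptions": {"format": modebar_image_format}}


config = build_modebar_config("svg")
```

With Plotly installed, a dict like this can be passed as fig.show(config=config).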

Improvements

  • Docstring updated to document x_label_map, y_label_map, and modebar_image_format.

  • Raises section expanded to cover all ValueError conditions, including invalid save_plots, missing image paths, missing HTML paths, invalid plot_type, and invalid modebar_image_format.

  • Update to plot_3d_pdp docstring

  • Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.

Version 0.0.5a5

3D Partial Dependence

  • Update to plot_3d_pdp docstring

  • Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.

Version 0.0.5a4

3D Partial Dependence

  • Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.
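The idea of mapping categories to numeric surface positions and overlaying labels can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code inside plot_3d_pdp: the function name encode_axis and its return shape are hypothetical, and categories are ordered by first appearance.

```python
def encode_axis(values, label_map=None):
    """Map axis values to numeric positions and build tick labels.

    Hypothetical helper for illustration; not part of model_metrics.
    """
    # Keep categories in first-seen order and give each an integer position.
    categories = list(dict.fromkeys(values))
    positions = {cat: i for i, cat in enumerate(categories)}
    numeric = [positions[v] for v in values]

    # An optional label map swaps raw values for display names.
    label_map = label_map or {}
    tick_labels = [str(label_map.get(cat, cat)) for cat in categories]
    return numeric, list(range(len(categories))), tick_labels


numeric, tick_positions, tick_labels = encode_axis(
    ["low", "high", "low", "mid"],
    label_map={"low": "Low risk", "high": "High risk"},
)
```

The numeric positions feed the surface grid, while the tick positions and labels can be applied with Matplotlib's set_xticks and set_xticklabels or Plotly's tickvals and ticktext.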

Version 0.0.5a3

  • Axis Limits: Added xlim and ylim parameters for standardizing axis ranges across multiple model comparisons

  • Bottom Legend Support: Automatic figure height adjustment when legend_loc="bottom" to prevent legend overlap with x-axis labels

  • Multi-Model Layout: Improved default layout for multiple models with plot_type="all" - now arranges as one row per model (6 columns × N rows) instead of mixed layout

  • Heteroskedasticity Tests: Fixed categorical variable handling in all tests (Breusch-Pagan, White, Goldfeld-Quandt) by encoding categorical columns before running tests

  • Legend Formatting:

    • Fixed legend duplication bug in Scale-Location plot (removed test name prepending since interpretations already contain test names)

    • Fixed histogram legend to use apply_legend() for consistent formatting

    • Fixed legend kwargs not being properly passed to Scale-Location plot

  • Group Category Handling: Fixed KeyError when group_category column not in X DataFrame by properly checking column existence before filtering predictor columns

  • Index Alignment: Fixed AssertionError when using external group_category array by ensuring Series index matches X.index

  • Python 3.8 Compatibility: Fixed LaTeX rendering error in Scale-Location y-axis label by replacing \text{} with \mathrm{} for matplotlib <3.3 compatibility

  • Scale-Location Y-Axis: Changed to LaTeX notation r"$\sqrt{|\text{Std. Residuals}|}$" for better readability

  • Histogram Overlay: Removed normal distribution overlay from histogram_type="frequency" for cleaner, simpler visualization (overlay still present for histogram_type="density")

  • Text Wrapping: Added text_wrap parameter support to all subplot titles (previously only worked for suptitles)

  • Helper Functions:

    • Created apply_axis_limits() helper in plot_utils.py

    • Enhanced apply_legend() to handle bottom legend resizing with flag-based prevention of multiple resizes

  • Refactoring: Refactored plot_threshold_metrics() to use apply_plot_title() and apply_legend() helpers for consistency
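The heteroskedasticity fix above relies on encoding categorical columns before the tests run, since those tests work on a numeric design matrix. The changelog does not say which encoding is used; the sketch below assumes one-hot encoding with the first level dropped as a reference category, and the helper name one_hot_encode is hypothetical.

```python
def one_hot_encode(rows, categorical_columns):
    """One-hot encode the named columns of a list-of-dicts table.

    Illustrative only: the first (sorted) level of each column is dropped
    so the encoded columns are not collinear with an intercept.
    """
    levels = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        # Numeric columns pass through unchanged (as floats).
        out = {k: float(v) for k, v in row.items() if k not in categorical_columns}
        for col in categorical_columns:
            for level in levels[col][1:]:
                out[f"{col}_{level}"] = 1.0 if row[col] == level else 0.0
        encoded.append(out)
    return encoded


rows = [
    {"age": 31, "region": "north"},
    {"age": 45, "region": "south"},
    {"age": 52, "region": "west"},
]
encoded = one_hot_encode(rows, ["region"])
```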

Version 0.0.5a2

  • Added 150+ comprehensive unit tests covering:

    • Edge cases and error handling

    • Parameter validation

    • Different input pathways (model vs y_prob vs y_pred)

    • Group category functionality

    • Styling and customization options

    • Integration scenarios

  • Refactored monolithic 3866-line model_metrics.py into modular components:

    • model_evaluator.py - Main plotting and evaluation functions

    • metrics_utils.py - Utility functions and calculations

    • plot_utils.py - Plotting helper functions

  • Improved code maintainability and organization

  • Enhanced error messages and validation

  • Operating point visualization with two methods:

    • operating_point_method='youden' - Youden’s J statistic

    • operating_point_method='closest_topleft' - Closest to top-left corner

  • DeLong test support for AUC comparison between models via delong parameter

  • Legend ordering - Proper organization: AUC curves → Random Guess → Operating Points

  • Custom operating point styling via operating_point_kwgs

  • New plot types:

    • 'influence' - Influence plot with Cook’s distance bubbles

    • 'predictors' - Individual residual plots for each predictor

  • Heteroskedasticity testing with multiple methods:

    • 'breusch_pagan' - Breusch-Pagan test

    • 'white' - White’s test

    • 'goldfeld_quandt' - Goldfeld-Quandt test

    • 'spearman' - Spearman rank correlation

    • 'all' - Run all tests

  • LOWESS smoothing via show_lowess parameter

  • Centroid visualization with two modes:

    • User-defined groups via group_category

    • Automatic K-means clustering via n_clusters

  • Histogram types:

    • histogram_type='frequency' - Raw counts (default)

    • histogram_type='density' - Probability density with normal overlay

  • Diagnostics table - Comprehensive model diagnostics via show_diagnostics_table

  • Return diagnostics - Programmatic access via return_diagnostics=True

  • All classification plots now support group_category parameter:

    • ROC curves with per-group AUC and counts

    • PR curves with per-group metrics

    • Calibration curves with per-group calibration

  • Residual diagnostics support group visualization with centroids

  • Summary performance supports grouped classification metrics

  • Gini coefficient calculation and display via show_gini parameter

  • Custom decimal places for Gini via decimal_places parameter

  • Legend location now supports:

    • Standard matplotlib locations ('best', 'upper right', etc.)

    • 'bottom' - Places legend below plot (perfect for group categories)

  • Automatic legend ordering for better readability

  • Added include_adjusted_r2 for regression models

  • Added group_category for grouped classification metrics

  • Added overall_only for regression to show only aggregate metrics

  • Improved coefficient ordering (intercept first)

  • Better handling of feature importances for tree-based models

  • Added show_colorbar parameter (default: False)

  • Added labels parameter to toggle TN/FP/FN/TP labels

  • Improved font size controls (inner_fontsize, label_fontsize, tick_fontsize)

  • Added show_operating_point and operating_point_method

  • Added operating_point_kwgs for custom styling

  • Added delong parameter for AUC comparison

  • Added group_category for stratified analysis

  • Added legend_loc parameter

  • Added legend_metric parameter ('ap' or 'aucpr')

  • Added group_category for stratified analysis

  • Added legend_loc parameter

  • Added show_brier_score parameter (default: True)

  • Added brier_decimals for formatting

  • Added group_category for stratified analysis

  • Added legend_loc parameter

  • Added show_gini parameter (default: False)

  • Added decimal_places for Gini formatting

  • Added lookup_metric and lookup_value for threshold optimization

  • Added model_threshold to highlight specific thresholds

  • Added baseline_thresh to toggle baseline line

  • Added custom styling: curve_kwgs, baseline_kwgs, threshold_kwgs, lookup_kwgs

  • Added plot_type options: 'all', 'fitted', 'qq', 'scale_location', 'leverage', 'influence', 'histogram', 'predictors'

  • Added heteroskedasticity_test with multiple test options

  • Added show_lowess for trend lines

  • Added lowess_kwgs for LOWESS styling

  • Added group_category for stratified analysis

  • Added group_kwgs for custom group styling

  • Added show_centroids and centroid_kwgs

  • Added centroid_type ('clusters' or 'groups')

  • Added n_clusters for automatic clustering

  • Added histogram_type ('frequency' or 'density')

  • Added show_diagnostics_table and return_diagnostics

  • Added show_plots to disable plotting

  • Added show_outliers and n_outliers for labeling

  • Added legend_loc parameter

  • Added legend_kwgs to control legend display for groups, centroids, clusters, and het_tests

  • Added kmeans_rstate for reproducible clustering

  • Added n_cols and n_rows for custom subplot layouts

  • Added point_kwgs for scatter point styling (supports edgecolor, linewidth, etc.)

  • Fixed confusion matrix colorbar removal when show_colorbar=False

  • Fixed duplicate text handling in confusion matrix displays

  • Fixed legend placement for grouped visualizations

  • Fixed text wrapping for long titles

  • Fixed LOWESS exception handling (now fails gracefully)

  • Fixed feature importance display for tree-based models

  • Fixed coefficient ordering in regression output

  • Fixed empty metric columns in regression feature importance rows

  • Comprehensive docstrings for all major functions

  • Parameter descriptions with examples

  • Error message improvements for better debugging

  • Type hints and validation error messages

  • Usage examples in docstrings

  • Test suite expanded from ~50 tests to 152 tests

  • Coverage increased from 50% to 86% on core modules

  • All edge cases and error conditions tested

  • Integration tests for real-world workflows

  • Parametrized tests for systematic coverage

  • No performance regressions

  • Modular code structure improves maintainability

  • Efficient calculation caching where applicable
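The two operating_point_method options listed in this release, 'youden' and 'closest_topleft', can pick different thresholds on the same ROC curve. The sketch below shows one way to compute each from precomputed fpr, tpr, and thresholds arrays; the function select_operating_point and the sample values are hypothetical and only illustrate the two criteria.

```python
import math


def select_operating_point(fpr, tpr, thresholds, method="youden"):
    """Return (threshold, fpr, tpr) at the chosen operating point.

    Illustrative sketch; not the library's implementation.
    """
    if method == "youden":
        # Youden's J statistic: maximize TPR - FPR.
        scores = [t - f for f, t in zip(fpr, tpr)]
        best = max(range(len(scores)), key=scores.__getitem__)
    elif method == "closest_topleft":
        # Minimize Euclidean distance to the perfect classifier at (0, 1).
        dists = [math.hypot(f, 1.0 - t) for f, t in zip(fpr, tpr)]
        best = min(range(len(dists)), key=dists.__getitem__)
    else:
        raise ValueError(f"Unknown method: {method!r}")
    return thresholds[best], fpr[best], tpr[best]


# Made-up ROC points, chosen so that the two criteria disagree.
fpr = [0.0, 0.0, 0.2, 0.6, 1.0]
tpr = [0.0, 0.6, 0.78, 0.9, 1.0]
thresholds = [1.0, 0.9, 0.6, 0.3, 0.0]

youden_point = select_operating_point(fpr, tpr, thresholds, "youden")
topleft_point = select_operating_point(fpr, tpr, thresholds, "closest_topleft")
```

On these points Youden's J favors the threshold with no false positives, while the top-left criterion accepts some false positives in exchange for higher sensitivity.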

From 0.0.5a1 to 0.0.5a2:

No changes required; all existing code will work as before. New features are opt-in.

Version 0.0.5a1

  • Operating Point Visualization for ROC Curves: Added show_operating_point parameter to display optimal classification thresholds on ROC curves with two methods:

    • youden: Youden’s J statistic (maximizes TPR - FPR)

    • closest_topleft: Point closest to top-left corner (minimizes distance to perfect classifier)

    • Configurable via operating_point_method and operating_point_kwgs parameters

    • Operating points display threshold values in legends and appear as markers on curves

  • Gini Coefficient for Gain Charts: Added automatic calculation and display of Gini coefficient in show_gain_chart()

    • Prints Gini coefficient for each model (default: 3 decimal places)

    • Displays in legend labels across all plot modes (overlay, subplots, single)

    • Configurable via show_gini and decimal_places parameters

  • Legend Location Control: Added legend_loc parameter to all plotting functions for flexible legend positioning

    • Supports standard matplotlib locations ('lower right', 'upper left', 'best', etc.)

    • Special 'bottom' option places legend below plot with proper spacing

    • Available in: show_roc_curve(), show_pr_curve(), show_calibration_curve(), show_lift_chart(), show_gain_chart()

  • Legend Ordering for ROC Curves: Standardized legend entry order across all plot modes

    • Order: Model curves with AUC → Random Guess baseline → Operating points

    • Ensures consistent, intuitive legend presentation

  • Overlay Mode for ROC Curves: Enhanced operating point display in overlay plots

    • Combined AUC and operating point threshold in single legend entry

    • Format: “Model Name (AUC = 0.XX, Op = 0.XX)”

    • Operating point markers appear on curves without duplicate legend entries

  • Operating points calculated post-ROC curve generation using optimal threshold selection

  • Gini coefficient derived from area under gain curve: Gini = 2 × AUGC - 1

  • Legend positioning uses bbox_to_anchor for 'bottom' placement with dynamic spacing

  • All changes maintain backward compatibility with existing code
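The relationship Gini = 2 × AUGC - 1 noted above can be worked through by hand. The sketch below assumes the gain curve plots the cumulative share of positives captured against the share of cases targeted, with cases ranked by descending score, and integrates it with the trapezoidal rule. How show_gain_chart() builds the curve internally is not stated here, so the function name and these details are assumptions.

```python
def gini_from_gain_curve(y_true, y_score):
    """Compute Gini = 2 * AUGC - 1 from a cumulative gain curve.

    Illustrative sketch; tied scores are not given special treatment.
    """
    n = len(y_true)
    total_pos = sum(y_true)
    # Rank cases from highest to lowest predicted score.
    order = sorted(range(n), key=lambda i: y_score[i], reverse=True)

    x, y = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]
        x.append(rank / n)                # share of cases targeted
        y.append(captured / total_pos)    # share of positives captured

    # Area under the gain curve (AUGC) via the trapezoidal rule.
    augc = sum(
        (x[k] - x[k - 1]) * (y[k] + y[k - 1]) / 2 for k in range(1, len(x))
    )
    return 2 * augc - 1


gini = gini_from_gain_curve([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```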

Version 0.0.4a10

Refactored and stabilized the summarize_model_performance function to improve consistency across classification and regression workflows while preserving the exact formatting logic for printed outputs and regression coefficient display.

  • Consolidated redundant metric computation into dedicated helper functions for classification and regression metrics.

  • Ensured regression coefficients, intercepts, and feature importances are retained and ordered correctly in the final DataFrame output.

  • Fixed grouped classification output so Model Threshold always appears last, and group headers correctly reflect category names.

  • Added conditional handling for grouped classification to prevent KeyError when the "Model" column is absent.

  • Preserved the original manual formatting block to maintain Leon’s custom printing logic for both classification and regression:

    • Right-aligned all table columns for readability.

    • Retained separator-based visual formatting and model-wise breaks.

    • Preserved coefficient and intercept reporting behavior exactly as before, ensuring regression results remain interpretable and consistent.

  • Classification and regression now produce stable, well-ordered, and readable summaries.

  • Grouped and non-grouped runs behave consistently without disrupting regression coefficient output.

  • Backward compatibility with previous console and DataFrame output formats maintained.

Version 0.0.4a9

This release introduces a new parameter, brier_decimals, to the show_calibration_curve() function, allowing users to control the number of decimal places displayed for the Brier score.

  • Added brier_decimals parameter (default: 3) next to show_brier_score.

  • Updated Brier score display logic to format using round(brier_score, brier_decimals).

  • Improved readability and precision consistency across calibration plots.

  • No breaking changes.

  • Users now have finer control over Brier score precision in calibration curve visualizations.

from model_metrics import show_calibration_curve
show_calibration_curve(model, X, y, show_brier_score=True, brier_decimals=4)

Version 0.0.4a8

Summary:

Updated hanley_mcneil_auc_test() function to perform a large-sample z-test for comparing correlated AUCs, based on Hanley & McNeil (1982), an analytical approximation of DeLong’s test.

Key Changes:

  • Implemented hanley_mcneil_auc_test() with parameters:

    • y_true, y_scores_1, y_scores_2 for AUC comparison.

    • Optional model_names, verbose, and return_values arguments for flexible use.

  • Added formatted, human-readable print output (when verbose=True).

  • Enabled optional programmatic access with return_values=True.

  • Adopted NumPy-style docstring for clarity and consistency.

  • Integrated helper into show_roc_curve() to enable AUC significance testing when the delong argument is provided.

Notes: This helper can also be used as a standalone function for independent AUC comparison between two models, outside of visualization workflows.
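For readers unfamiliar with the approach, the sketch below shows a large-sample z-test built on the Hanley & McNeil (1982) standard error of an AUC. It is not the body of hanley_mcneil_auc_test(); the function names are hypothetical, and the correlation term r between the two AUCs is an assumption that defaults to 0, which reduces to the independent-samples case.

```python
import math


def auc_mann_whitney(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC per Hanley & McNeil (1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def sketch_auc_z_test(y_true, scores_1, scores_2, r=0.0):
    """Two-sided z-test for AUC1 - AUC2; r is an assumed correlation."""
    auc1, n_pos, n_neg = auc_mann_whitney(y_true, scores_1)
    auc2, _, _ = auc_mann_whitney(y_true, scores_2)
    se1 = hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = hanley_mcneil_se(auc2, n_pos, n_neg)
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)
    z = (auc1 - auc2) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc1, auc2, z, p_value


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores_1 = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.7, 0.1]
scores_2 = [0.6, 0.4, 0.5, 0.45, 0.7, 0.3, 0.8, 0.2]
auc1, auc2, z, p_value = sketch_auc_z_test(y_true, scores_1, scores_2)
```

The sketch breaks down when both AUCs equal exactly 1.0, because the estimated standard errors are then zero.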

Version 0.0.4a7

  • DeLong’s test (Hanley & McNeil approximation)

    • Implemented a new helper function hanley_mcneil_auc_test() for approximate DeLong’s AUC comparison.

    • Integrated the helper inside show_roc_curve() to optionally print AUC differences and p-values between two models.

    • Added corresponding pytest coverage under test_show_roc_curve_with_delong().

  • Group category support

    • Added the group_category input to summarize_model_performance() to generate subgroup-level performance summaries.

    • Enables stratified metric reporting for fairness or demographic analysis.

Version 0.0.4a6

Reworded the print message inside plot_threshold_metrics() for clarity.

Old:

print(
      f"Best threshold for {lookup_metric} = "
      f"{round(lookup_value, decimal_places)} is: "
      f"{round(best_threshold, decimal_places)}"
)

New:

print(
    f"Best threshold for target {lookup_metric} of "
    f"{round(lookup_value, decimal_places)} is "
    f"{round(best_threshold, decimal_places)}"
)

This removes the equals sign and colon, and adds “target” for a smoother, more descriptive sentence.

Version 0.0.4a5

  • Added a minimal type check to ensure y_prob is always a list at the start of each affected function:

    • summarize_model_performance

    • show_calibration_curve

    • show_confusion_matrix

    • show_lift_chart

    • show_gain_chart

    • show_roc_curve

    • show_pr_curve

# Ensure y_prob is always a list of NumPy arrays
if isinstance(y_prob, np.ndarray):
    y_prob = [y_prob]

This allows y_prob[0] indexing to work whether the caller provides a single NumPy array or a list of arrays.

  • Updated unit tests

Version 0.0.4a4

  • Corrected README to reflect the current version.

  • Previous release did not update the README properly because the file was not saved before publishing.

  • No functional changes to the library.

Version 0.0.4a3

  • Added missing scipy (>=1.8,<=1.14.0) requirement to the README.

Version 0.0.4a2

This version updates pyproject.toml and requirements.txt to restrict SciPy to >=1.8,<=1.14.0.

  • Prevents installation of scipy==1.14.1+, which removes _lazywhere and breaks statsmodels.

  • Keeps compatibility with model_tuner and Colab environments.

  • Bumps package version for release.

  • Updated scipy dependency to >=1.8,<=1.14.0

  • Synced requirements.txt with updated constraints

Version 0.0.4a1

  • Replaced the old grid parameter with subplots across plotting functions for consistency.

  • Standardized gridline handling by replacing unconditional plt.grid() calls with plt.grid(visible=gridlines)

  • Aligns function signatures to use subplots consistently instead of grid.

  • Makes gridline visibility configurable through a single gridlines flag.

  • Cleaner charts when gridlines=False, no visual change when gridlines=True.

Version 0.0.4a

Added the ability to pass predicted probabilities (y_prob) directly into the functions in model_evaluator.py as an alternative to supplying a fitted model and feature matrix. This flexibility lets end users evaluate results in two ways:

  • Using a model object with X (current behavior)

  • Or passing y_prob directly (new option)

  • Updated all relevant evaluator functions (summarize_model_performance, plot_threshold_metrics, etc.) to accept y_prob as input.

  • Added input validation: functions now check that at least one of (model and X) or y_prob is provided.

  • Preserved existing model-based workflows for backward compatibility.

  • Extended unit tests in unittests/ to cover the new probability-based path, including edge cases and validation errors.

End users sometimes already have predicted probabilities from external pipelines or pre-computed experiments. This change avoids forcing them to re-supply the model, streamlining the evaluation process.
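The two input pathways and the validation rule described above can be summarized in a short sketch. The helper name resolve_inputs and its return values are hypothetical, and the precedence given to y_prob when both pathways are supplied is an assumption; the evaluator functions may order these checks differently.

```python
def resolve_inputs(model=None, X=None, y_prob=None):
    """Decide which input pathway to use, or fail if neither is available.

    Hypothetical helper for illustration; not part of model_metrics.
    """
    if y_prob is not None:
        # Precomputed probabilities: no model or feature matrix needed.
        return "y_prob"
    if model is not None and X is not None:
        # Model-based pathway: probabilities would be predicted from X.
        return "model"
    raise ValueError("Provide either `model` and `X`, or `y_prob`.")


pathway = resolve_inputs(y_prob=[0.2, 0.7, 0.9])
```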

Version 0.0.3a

  • Added the "plotly>=5.18.0, <=5.24.1" requirement to pyproject.toml, setup.py, and README_min.md; it is needed by the functions in partial_dependence.py

Version 0.0.2a

Full Changelog: https://github.com/lshpaner/model_metrics/compare/0.0.1a…0.0.2a

Version 0.0.1a

  • Updated unit tests and README

  • Added statsmodels to library imports

  • Added coefficients and p-values to regression summary

  • Added regression capabilities to summarize_model_performance

  • Added lift and gains charts

  • Updated versions for earlier Python compatibility