Changelog

Version 0.0.5a6

3D Partial Dependence

Bug Fixes

Fixed interactive Plotly plot not rendering in Jupyter notebooks unless save_plots was set. Display and saving are now fully decoupled; the plot always renders regardless of save_plots.
Removed duplicate HTML save block that existed inside the static plot section.

New Features

Added x_label_map and y_label_map parameters for mapping raw axis values to human-readable tick labels; useful for encoded or numeric categorical features.
Added modebar_image_format parameter ("png", "svg", "jpeg", "webp") to control the download format of the Plotly modebar camera button. Defaults to "png".
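For context, Plotly's modebar camera button reads its download format from the `toImageButtonOptions` entry of the figure config. The sketch below shows one way a `modebar_image_format` value could be validated and turned into such a config. The helper name `build_modebar_config` and the default filename are hypothetical; this is not the library's actual code.

```python
# Hypothetical helper (not the library's code): validates a
# modebar_image_format value and builds the Plotly config dict that controls
# the camera (download) button.
VALID_MODEBAR_FORMATS = ("png", "svg", "jpeg", "webp")


def build_modebar_config(modebar_image_format="png", filename="plot_3d_pdp"):
    """Return a Plotly config dict suitable for fig.show(config=...)."""
    if modebar_image_format not in VALID_MODEBAR_FORMATS:
        raise ValueError(
            f"modebar_image_format must be one of {VALID_MODEBAR_FORMATS}, "
            f"got {modebar_image_format!r}"
        )
    # toImageButtonOptions is the plotly.js config key read by the camera button.
    return {
        "toImageButtonOptions": {
            "format": modebar_image_format,
            "filename": filename,
        }
    }


config = build_modebar_config("svg")
print(config["toImageButtonOptions"]["format"])
```

With a Plotly figure, the resulting dict would be passed as `fig.show(config=config)`.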

Improvements

Docstring updated to document x_label_map, y_label_map, and modebar_image_format.
Raises section expanded to cover all ValueError conditions, including invalid save_plots, missing image paths, missing HTML paths, invalid plot_type, and invalid modebar_image_format.
Update to plot_3d_pdp docstring
Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.
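The categorical handling described above (mapping categories to numeric surface positions, then relabeling the ticks) can be sketched as follows. This is a minimal illustration assuming NumPy; `encode_categorical_axis` is a hypothetical helper, and the returned tick positions and tick labels mirror what Plotly axes accept via `tickvals`/`ticktext`, not the library's internals.

```python
import numpy as np


def encode_categorical_axis(values, label_map=None):
    """Hypothetical helper: map categorical grid values to integer positions.

    values:    1-D sequence of raw grid values (e.g. category codes).
    label_map: optional dict mapping raw values to display labels
               (in the spirit of x_label_map / y_label_map).
    Returns (numeric positions for each value, tick positions, tick labels).
    """
    # Keep categories in first-seen order along the axis.
    categories = list(dict.fromkeys(values))
    tickvals = np.arange(len(categories), dtype=float)
    label_map = label_map or {}
    # Fall back to the raw value's string form when no mapping is given.
    ticktext = [str(label_map.get(c, c)) for c in categories]
    position = {c: i for i, c in enumerate(categories)}
    numeric = np.array([position[v] for v in values], dtype=float)
    return numeric, tickvals, ticktext


numeric, tickvals, ticktext = encode_categorical_axis(
    [0, 1, 2], label_map={0: "Low", 1: "Medium", 2: "High"}
)
print(tickvals.tolist(), ticktext)
```

In Plotly, such a pair could be applied with `fig.update_layout(scene=dict(xaxis=dict(tickvals=tickvals, ticktext=ticktext)))`.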

Version 0.0.5a5

3D Partial Dependence

Update to plot_3d_pdp docstring
Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.

Version 0.0.5a4

3D Partial Dependence

Adds full categorical feature support to plot_3d_pdp while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.

Version 0.0.5a3

Axis Limits: Added xlim and ylim parameters for standardizing axis ranges across multiple model comparisons
Bottom Legend Support: Figure height now adjusts automatically when legend_loc="bottom" so the legend does not overlap the x-axis labels
Multi-Model Layout: Improved default layout for multiple models with plot_type="all"; models are now arranged one row per model (6 columns × N rows) instead of in a mixed layout

Heteroskedasticity Tests: Fixed categorical variable handling in all tests (Breusch-Pagan, White, Goldfeld-Quandt) by encoding categorical columns before running tests
Legend Formatting:
- Fixed legend duplication bug in Scale-Location plot (removed test name prepending since interpretations already contain test names)
- Fixed histogram legend to use apply_legend() for consistent formatting
- Fixed legend kwargs not being properly passed to Scale-Location plot
Group Category Handling: Fixed KeyError raised when the group_category column is not in the X DataFrame; column existence is now checked before predictor columns are filtered
Index Alignment: Fixed AssertionError when using an external group_category array by ensuring the Series index matches X.index
Python 3.8 Compatibility: Fixed LaTeX rendering error in Scale-Location y-axis label by replacing \text{} with \mathrm{} for matplotlib <3.3 compatibility
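The categorical-encoding and index-alignment fixes above amount to two small preprocessing steps. A minimal sketch, assuming pandas; `prepare_het_inputs` is a hypothetical name, and one-hot encoding with `drop_first=True` is an assumed choice rather than necessarily the encoding the library uses.

```python
import numpy as np
import pandas as pd


def prepare_het_inputs(X, group_category=None):
    """Hypothetical helper: return (numeric predictors, aligned groups or None)."""
    predictors = X
    groups = None
    if group_category is not None:
        if isinstance(group_category, str):
            # Group supplied as the name of a column in X.
            groups = X[group_category]
        else:
            # External array: rebuild as a Series sharing X's index so rows line up.
            groups = pd.Series(np.asarray(group_category), index=X.index)
        # Drop the grouping column only if it actually exists in X.
        if groups.name is not None and groups.name in X.columns:
            predictors = X.drop(columns=groups.name)
    # One-hot encode object/category columns so the tests receive numeric input.
    predictors = pd.get_dummies(predictors, drop_first=True, dtype=float)
    return predictors, groups


X = pd.DataFrame(
    {"age": [30, 40, 50, 60], "sex": ["F", "M", "M", "F"]},
    index=[10, 11, 12, 13],
)
X_num, groups = prepare_het_inputs(X, group_category=["a", "b", "a", "b"])
print(X_num.columns.tolist())
```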

Scale-Location Y-Axis: Changed to LaTeX notation r"$\sqrt{|\text{Std. Residuals}|}$" for better readability
Histogram Overlay: Removed normal distribution overlay from histogram_type="frequency" for cleaner, simpler visualization (overlay still present for histogram_type="density")
Text Wrapping: Added text_wrap parameter support to all subplot titles (previously only worked for suptitles)
Helper Functions:
- Created apply_axis_limits() helper in plot_utils.py
- Enhanced apply_legend() to handle bottom legend resizing with flag-based prevention of multiple resizes
Refactoring: Refactored plot_threshold_metrics() to use apply_plot_title() and apply_legend() helpers for consistency
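Wrapping subplot titles the same way as suptitles can be done with the standard-library `textwrap` module. A minimal sketch, assuming `text_wrap` is a maximum line width in characters; `wrap_title` is a hypothetical helper, not the actual apply_plot_title() implementation.

```python
import textwrap


def wrap_title(title, text_wrap=None):
    """Hypothetical helper: wrap a title to at most text_wrap characters per line."""
    if not text_wrap:
        return title  # No wrapping requested.
    # textwrap.fill joins the wrapped lines with "\n", which Matplotlib
    # renders as separate title lines.
    return textwrap.fill(title, width=text_wrap)


print(wrap_title("Residuals vs Fitted Values for Model A", text_wrap=20))
```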

Version 0.0.5a2

Expanded the unit test suite to 150+ tests, covering:
- Edge cases and error handling
- Parameter validation
- Different input pathways (model vs y_prob vs y_pred)
- Group category functionality
- Styling and customization options
- Integration scenarios

Refactored the monolithic 3,866-line model_metrics.py into modular components:
- model_evaluator.py - Main plotting and evaluation functions
- metrics_utils.py - Utility functions and calculations
- plot_utils.py - Plotting helper functions
Improved code maintainability and organization
Enhanced error messages and validation

Operating point visualization with two methods:
- operating_point_method='youden' - Youden’s J statistic
- operating_point_method='closest_topleft' - Closest to top-left corner
DeLong test support for AUC comparison between models via delong parameter
Legend ordering: AUC curves → Random Guess → Operating Points
Custom operating point styling via operating_point_kwgs
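Both operating-point methods reduce to picking one index on the ROC curve. A minimal NumPy sketch, assuming `fpr`, `tpr`, and `thresholds` arrays shaped like the output of scikit-learn's `roc_curve`; `find_operating_point` is a hypothetical helper, not the library's code.

```python
import numpy as np


def find_operating_point(fpr, tpr, thresholds, method="youden"):
    """Hypothetical helper: return (threshold, fpr, tpr) at the chosen point."""
    fpr, tpr, thresholds = map(np.asarray, (fpr, tpr, thresholds))
    if method == "youden":
        # Youden's J = TPR - FPR; pick the threshold that maximizes it.
        idx = int(np.argmax(tpr - fpr))
    elif method == "closest_topleft":
        # Euclidean distance to the perfect classifier at (FPR=0, TPR=1).
        idx = int(np.argmin(np.sqrt(fpr**2 + (1 - tpr) ** 2)))
    else:
        raise ValueError("method must be 'youden' or 'closest_topleft'")
    return thresholds[idx], fpr[idx], tpr[idx]


fpr = [0.0, 0.1, 0.3, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]
thresholds = [1.0, 0.7, 0.4, 0.1]
print(find_operating_point(fpr, tpr, thresholds, "youden"))
```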

New plot types:
- 'influence' - Influence plot with Cook’s distance bubbles
- 'predictors' - Individual residual plots for each predictor
Heteroskedasticity testing with multiple methods:
- 'breusch_pagan' - Breusch-Pagan test
- 'white' - White’s test
- 'goldfeld_quandt' - Goldfeld-Quandt test
- 'spearman' - Spearman rank correlation
- 'all' - Run all tests
LOWESS smoothing via show_lowess parameter
Centroid visualization with two modes:
- User-defined groups via group_category
- Automatic K-means clustering via n_clusters
Histogram types:
- histogram_type='frequency' - Raw counts (default)
- histogram_type='density' - Probability density with normal overlay
Diagnostics table - Comprehensive model diagnostics via show_diagnostics_table
Return diagnostics - Programmatic access via return_diagnostics=True
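As an illustration of the 'spearman' option, one common approach is to rank-correlate the absolute residuals with the fitted values: a strong positive correlation suggests the residual spread grows with the fitted values. This minimal NumPy sketch assumes that approach (ties are not averaged and no p-value is computed); `spearman_abs_resid` is a hypothetical helper, and the library's test may be implemented differently.

```python
import numpy as np


def spearman_abs_resid(fitted, residuals):
    """Hypothetical helper: Spearman rho between fitted values and |residuals|."""
    fitted = np.asarray(fitted, dtype=float)
    abs_resid = np.abs(np.asarray(residuals, dtype=float))
    # Ranks via double argsort (tied values get distinct ranks in this sketch).
    rank_fit = np.argsort(np.argsort(fitted))
    rank_res = np.argsort(np.argsort(abs_resid))
    # Spearman's rho is the Pearson correlation of the ranks.
    return float(np.corrcoef(rank_fit, rank_res)[0, 1])


fitted = np.linspace(1, 10, 10)
# Residual magnitude grows with the fitted value (heteroskedastic pattern).
residuals = fitted * np.array([1, -1] * 5) * 0.1
print(spearman_abs_resid(fitted, residuals))
```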

All classification plots now support group_category parameter:
- ROC curves with per-group AUC and counts
- PR curves with per-group metrics
- Calibration curves with per-group calibration
Residual diagnostics support group visualization with centroids
Summary performance supports grouped classification metrics
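Per-group ROC reporting comes down to computing AUC and counts within each level of `group_category`. A minimal NumPy sketch using the rank-based (Mann-Whitney) form of AUC; `auc_by_group` is a hypothetical helper, and skipping groups that contain only one class is an assumed policy.

```python
import numpy as np


def rank_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    # Compare every positive with every negative; ties count as one half.
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def auc_by_group(y_true, y_score, groups):
    """Hypothetical helper: per-group AUC plus observation and positive counts."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        if len(np.unique(y_true[mask])) < 2:
            continue  # AUC is undefined when a group has a single class.
        results[str(g)] = {
            "auc": rank_auc(y_true[mask], y_score[mask]),
            "n": int(mask.sum()),
            "n_pos": int(y_true[mask].sum()),
        }
    return results


y = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
groups = ["a", "a", "a", "a", "b", "b"]
print(auc_by_group(y, scores, groups))
```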

Gini coefficient calculation and display via show_gini parameter
Custom decimal places for Gini via decimal_places parameter

Legend location now supports:
- Standard matplotlib locations ('best', 'upper right', etc.)
- 'bottom' - Places legend below plot (useful when many group categories are shown)
Automatic legend ordering for better readability

Added include_adjusted_r2 for regression models
Added group_category for grouped classification metrics
Added overall_only for regression to show only aggregate metrics
Improved coefficient ordering (intercept first)
Better handling of feature importances for tree-based models
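For reference, adjusted R² penalizes R² for the number of predictors using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). A small sketch, assuming `n` is the number of observations and `p` the number of predictors excluding the intercept; this is the textbook formula, not necessarily the library's exact code path.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(round(adjusted_r2(0.80, n=100, p=5), 4))
```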

Added show_colorbar parameter (default: False)
Added labels parameter to toggle TN/FP/FN/TP labels
Improved font size controls (inner_fontsize, label_fontsize, tick_fontsize)

Added show_operating_point and operating_point_method
Added operating_point_kwgs for custom styling
Added delong parameter for AUC comparison
Added group_category for stratified analysis
Added legend_loc parameter

Added legend_metric parameter ('ap' or 'aucpr')
Added group_category for stratified analysis
Added legend_loc parameter

Added show_brier_score parameter (default: True)
Added brier_decimals for formatting
Added group_category for stratified analysis
Added legend_loc parameter

Added show_gini parameter (default: False)
Added decimal_places for Gini formatting

Added lookup_metric and lookup_value for threshold optimization
Added model_threshold to highlight specific thresholds
Added baseline_thresh to toggle baseline line
Added custom styling: curve_kwgs, baseline_kwgs, threshold_kwgs, lookup_kwgs
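The lookup_metric / lookup_value pair can be read as a search over a threshold grid for the threshold whose metric value is closest to the target. A minimal NumPy sketch, assuming the metric has already been evaluated at each threshold and that "closest" means smallest absolute difference; `lookup_threshold` and the precision values are hypothetical and illustrative only.

```python
import numpy as np


def lookup_threshold(thresholds, metric_values, lookup_value):
    """Hypothetical helper: threshold whose metric value is closest to lookup_value."""
    thresholds = np.asarray(thresholds, dtype=float)
    metric_values = np.asarray(metric_values, dtype=float)
    idx = int(np.argmin(np.abs(metric_values - lookup_value)))
    return float(thresholds[idx])


thresholds = np.linspace(0.1, 0.9, 9)
# Illustrative precision values at each threshold (not real results).
precision = np.array([0.50, 0.55, 0.61, 0.66, 0.72, 0.78, 0.83, 0.88, 0.93])
best = lookup_threshold(thresholds, precision, lookup_value=0.80)
print(f"Best threshold for target precision of 0.8 is {round(best, 2)}")
```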

Added plot_type options: 'all', 'fitted', 'qq', 'scale_location', 'leverage', 'influence', 'histogram', 'predictors'
Added heteroskedasticity_test with multiple test options
Added show_lowess for trend lines
Added lowess_kwgs for LOWESS styling
Added group_category for stratified analysis
Added group_kwgs for custom group styling
Added show_centroids and centroid_kwgs
Added centroid_type ('clusters' or 'groups')
Added n_clusters for automatic clustering
Added histogram_type ('frequency' or 'density')
Added show_diagnostics_table and return_diagnostics
Added show_plots to disable plotting
Added show_outliers and n_outliers for labeling
Added legend_loc parameter
Added legend_kwgs to control legend display for groups, centroids, clusters, and het_tests
Added kmeans_rstate for reproducible clustering
Added n_cols and n_rows for custom subplot layouts
Added point_kwgs for scatter point styling (supports edgecolor, linewidth, etc.)
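Selecting which points to label with show_outliers and n_outliers can be done by ranking observations on the magnitude of their standardized residuals. A minimal NumPy sketch; it standardizes by the sample standard deviation of the residuals (an assumption, rather than internally studentized residuals), and `top_outliers` is a hypothetical helper.

```python
import numpy as np


def top_outliers(residuals, n_outliers=3):
    """Hypothetical helper: indices of the n_outliers largest |standardized residuals|."""
    resid = np.asarray(residuals, dtype=float)
    std_resid = (resid - resid.mean()) / resid.std(ddof=1)
    # Sort by descending magnitude and keep the first n_outliers positions.
    order = np.argsort(-np.abs(std_resid))
    return order[:n_outliers].tolist()


resid = [0.1, -0.2, 3.0, 0.05, -2.5, 0.3]
print(top_outliers(resid, n_outliers=2))
```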

Fixed confusion matrix colorbar removal when show_colorbar=False
Fixed duplicate text handling in confusion matrix displays
Fixed legend placement for grouped visualizations
Fixed text wrapping for long titles
Fixed LOWESS exception handling (now fails gracefully)
Fixed feature importance display for tree-based models
Fixed coefficient ordering in regression output
Fixed empty metric columns in regression feature importance rows

Comprehensive docstrings for all major functions
Parameter descriptions with examples
Error message improvements for better debugging
Type hints and validation error messages
Usage examples in docstrings

Test suite expanded from ~50 tests to 152 tests
Coverage increased from 50% to 86% on core modules
All edge cases and error conditions tested
Integration tests for real-world workflows
Parametrized tests for systematic coverage

No performance regressions
Modular code structure improves maintainability
Efficient calculation caching where applicable

From 0.0.5a1 to 0.0.5a2:

No changes are required; all existing code works as before. New features are opt-in.

Version 0.0.5a1

Operating Point Visualization for ROC Curves: Added show_operating_point parameter to display optimal classification thresholds on ROC curves with two methods:
- youden: Youden’s J statistic (maximizes TPR - FPR)
- closest_topleft: Point closest to top-left corner (minimizes distance to perfect classifier)
- Configurable via operating_point_method and operating_point_kwgs parameters
- Operating points display threshold values in legends and appear as markers on curves
Gini Coefficient for Gain Charts: Added automatic calculation and display of the Gini coefficient in show_gain_chart()
- Prints the Gini coefficient for each model (default: 3 decimal places)
- Displays in legend labels across all plot modes (overlay, subplots, single)
- Configurable via show_gini and decimal_places parameters
Legend Location Control: Added legend_loc parameter to all plotting functions for flexible legend positioning
- Supports standard matplotlib locations ('lower right', 'upper left', 'best', etc.)
- Special 'bottom' option places legend below plot with proper spacing
- Available in: show_roc_curve(), show_pr_curve(), show_calibration_curve(), show_lift_chart(), show_gain_chart()

Legend Ordering for ROC Curves: Standardized legend entry order across all plot modes
- Order: Model curves with AUC → Random Guess baseline → Operating points
- Ensures consistent, intuitive legend presentation
Overlay Mode for ROC Curves: Enhanced operating point display in overlay plots
- Combined AUC and operating point threshold in single legend entry
- Format: “Model Name (AUC = 0.XX, Op = 0.XX)”
- Operating point markers appear on curves without duplicate legend entries

Operating points are calculated after the ROC curve is generated, by selecting the optimal threshold under the chosen method
Gini coefficient is derived from the area under the gain curve (AUGC): Gini = 2 × AUGC - 1
Legend positioning uses bbox_to_anchor for 'bottom' placement with dynamic spacing
All changes maintain backward compatibility with existing code
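The formula Gini = 2 × AUGC - 1 can be illustrated by building the cumulative gain curve (fraction of positives captured versus fraction of the population, sorted by descending score) and integrating it with the trapezoidal rule. A minimal NumPy sketch under those assumptions; `gini_from_gain` is a hypothetical helper, and the library may build its curve differently (for example, in how tied scores are handled).

```python
import numpy as np


def gini_from_gain(y_true, y_score):
    """Hypothetical helper: Gini = 2 * AUGC - 1 from a cumulative gain curve."""
    y_true = np.asarray(y_true, dtype=float)
    # Sort observations by descending score.
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    captured = np.cumsum(y_true[order]) / y_true.sum()
    # Prepend the (0, 0) origin so the curve starts at zero.
    x = np.concatenate([[0.0], np.arange(1, len(y_true) + 1) / len(y_true)])
    y = np.concatenate([[0.0], captured])
    # Trapezoidal area under the gain curve (AUGC), written out explicitly.
    augc = float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))
    return 2 * augc - 1


print(gini_from_gain([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))
```

With this construction a perfect ranking does not reach 1: the maximum is 1 minus the positive rate (0.5 in the four-observation example above).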

Version 0.0.4a10

Refactored and stabilized the summarize_model_performance function to improve consistency across classification and regression workflows while preserving the exact formatting logic for printed outputs and regression coefficient display.

Consolidated redundant metric computation into dedicated helper functions for classification and regression metrics.
Ensured regression coefficients, intercepts, and feature importances are retained and ordered correctly in the final DataFrame output.
Fixed grouped classification output so Model Threshold always appears last, and group headers correctly reflect category names.
Added conditional handling for grouped classification to prevent KeyError when the "Model" column is absent.
Preserved the original manual formatting block to maintain Leon’s custom printing logic for both classification and regression:
- Right-aligned all table columns for readability.
- Retained separator-based visual formatting and model-wise breaks.
- Preserved coefficient and intercept reporting behavior exactly as before, ensuring regression results remain interpretable and consistent.

Classification and regression now produce stable, well-ordered, and readable summaries.
Grouped and non-grouped runs behave consistently without disrupting regression coefficient output.
Backward compatibility with previous console and DataFrame output formats maintained.

Version 0.0.4a9

This release introduces a new parameter, brier_decimals, to the show_calibration_curve() function, allowing users to control the number of decimal places displayed for the Brier score.

Added brier_decimals parameter (default: 3) next to show_brier_score.
Updated Brier score display logic to format using round(brier_score, brier_decimals).
Improved readability and precision consistency across calibration plots.

No breaking changes.
Users now have finer control over Brier score precision in calibration curve visualizations.

from model_metrics import show_calibration_curve
show_calibration_curve(model, X, y, show_brier_score=True, brier_decimals=4)
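For reference, the Brier score is the mean squared difference between predicted probabilities and observed 0/1 outcomes; brier_decimals only changes how it is displayed. A minimal NumPy sketch of that computation; the function name `brier_label` and the "Brier:" label text are hypothetical.

```python
import numpy as np


def brier_label(y_true, y_prob, brier_decimals=3):
    """Hypothetical helper: legend-style label with the Brier score rounded."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    brier_score = float(np.mean((y_prob - y_true) ** 2))
    # Rounding is applied only to the displayed value.
    return f"Brier: {round(brier_score, brier_decimals)}"


print(brier_label([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3], brier_decimals=4))
```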

Version 0.0.4a8

Summary:

Updated hanley_mcneil_auc_test() function to perform a large-sample z-test for comparing correlated AUCs, based on Hanley & McNeil (1982), an analytical approximation of DeLong’s test.

Key Changes:

Implemented hanley_mcneil_auc_test() with parameters:
- y_true, y_scores_1, y_scores_2 for AUC comparison.
- Optional model_names, verbose, and return_values arguments for flexible use.
Added formatted, human-readable print output (when verbose=True).
Enabled optional programmatic access with return_values=True.
Adopted NumPy-style docstring for clarity and consistency.
Integrated helper into show_roc_curve() to enable AUC significance testing when the delong argument is provided.

Notes: This helper can also be used as a standalone function for independent AUC comparison between two models, outside of visualization workflows.
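The general shape of a large-sample z-test for two correlated AUCs using the Hanley & McNeil standard-error formula is sketched below. It is a simplification under stated assumptions: the correlation r between the two AUC estimates is taken as the average of the within-class Pearson correlations of the two score vectors (the published method converts this average into r via a lookup table), and `compare_correlated_aucs` is a hypothetical name, not the library's hanley_mcneil_auc_test().

```python
import math

import numpy as np


def _auc(y, s):
    # Rank-based AUC: share of positive/negative pairs ordered correctly.
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def _hanley_mcneil_se(auc, n_pos, n_neg):
    # Hanley & McNeil standard error of a single AUC.
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    var = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def compare_correlated_aucs(y_true, scores_1, scores_2):
    """Hypothetical sketch: return (auc1, auc2, z, two-sided p-value)."""
    y = np.asarray(y_true)
    s1 = np.asarray(scores_1, dtype=float)
    s2 = np.asarray(scores_2, dtype=float)
    n_pos, n_neg = int((y == 1).sum()), int((y == 0).sum())
    auc1, auc2 = _auc(y, s1), _auc(y, s2)
    se1 = _hanley_mcneil_se(auc1, n_pos, n_neg)
    se2 = _hanley_mcneil_se(auc2, n_pos, n_neg)
    # Assumed simplification: average within-class correlation used as r.
    r_pos = np.corrcoef(s1[y == 1], s2[y == 1])[0, 1]
    r_neg = np.corrcoef(s1[y == 0], s2[y == 0])[0, 1]
    r = float((r_pos + r_neg) / 2)
    se_diff = math.sqrt(max(se1**2 + se2**2 - 2 * r * se1 * se2, 1e-12))
    z = (auc1 - auc2) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc1, auc2, z, p_value


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
s1 = y + rng.normal(size=100)
s2 = y + rng.normal(size=100)
print(compare_correlated_aucs(y, s1, s2))
```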

Version 0.0.4a7

DeLong’s test (Hanley & McNeil approximation)
- Implemented a new helper function hanley_mcneil_auc_test() for approximate DeLong’s AUC comparison.
- Integrated the helper inside show_roc_curve() to optionally print AUC differences and p-values between two models.
- Added corresponding pytest coverage under test_show_roc_curve_with_delong().
Group category support
- Added the group_category input to summarize_model_performance() to generate subgroup-level performance summaries.
- Enables stratified metric reporting for fairness or demographic analysis.

Version 0.0.4a6

Reworded the print message inside plot_threshold_metrics() for clarity.

Old:

print(
    f"Best threshold for {lookup_metric} = "
    f"{round(lookup_value, decimal_places)} is: "
    f"{round(best_threshold, decimal_places)}"
)

New:

print(
    f"Best threshold for target {lookup_metric} of "
    f"{round(lookup_value, decimal_places)} is "
    f"{round(best_threshold, decimal_places)}"
)

This removes the equals sign and colon, and adds “target” for a smoother, more descriptive sentence.

Version 0.0.4a5

Added a minimal type check to ensure y_prob is always a list at the start of each affected function:
- summarize_model_performance
- show_calibration_curve
- show_confusion_matrix
- show_lift_chart
- show_gain_chart
- show_roc_curve
- show_pr_curve

# Ensure y_prob is always a list of NumPy arrays
if isinstance(y_prob, np.ndarray):
    y_prob = [y_prob]

This allows y_prob[0] indexing to work whether the caller provides a single NumPy array or a list of arrays.
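The effect of the check can be seen in a small standalone sketch; `first_model_probs` is a hypothetical stand-in for any of the affected functions.

```python
import numpy as np


def first_model_probs(y_prob):
    # Same normalization as the check: wrap a single array in a list.
    if isinstance(y_prob, np.ndarray):
        y_prob = [y_prob]
    return y_prob[0]


single = np.array([0.2, 0.7, 0.9])
multiple = [single, np.array([0.3, 0.6, 0.8])]
# Both call styles return the first model's probabilities.
print(first_model_probs(single), first_model_probs(multiple))
```

Note that a plain Python list of floats is not an ndarray, so it is not wrapped; in that case y_prob[0] returns the first probability rather than a full array.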

Updated unit tests

Version 0.0.4a4

Corrected README to reflect the current version.
Previous release did not update the README properly because the file was not saved before publishing.
No functional changes to the library.

Version 0.0.4a3

Added missing scipy (>=1.8,<=1.14.0) requirement to the README.

Version 0.0.4a2

This version updates pyproject.toml and requirements.txt to restrict SciPy to >=1.8,<=1.14.0.

Prevents installation of scipy>=1.14.1, which removes _lazywhere and breaks statsmodels.
Keeps compatibility with model_tuner and Colab environments.
Bumps package version for release.
Updated scipy dependency to >=1.8,<=1.14.0
Synced requirements.txt with updated constraints

Version 0.0.4a1

Replaced the old grid parameter with subplots across plotting functions for consistency.
Standardized gridline handling by replacing unconditional plt.grid() calls with plt.grid(visible=gridlines)

Aligns function signatures to use subplots consistently instead of grid.
Makes gridline visibility configurable through a single gridlines flag.
Cleaner charts when gridlines=False, no visual change when gridlines=True.

Version 0.0.4a

Added the ability to pass predicted probabilities (y_prob) directly into the functions in model_evaluator.py as an alternative to supplying a fitted model and feature matrix. This flexibility lets end users evaluate results in two ways:

- Using a model object with X (current behavior)
- Passing y_prob directly (new option)

Updated all relevant evaluator functions (summarize_model_performance, plot_threshold_metrics, etc.) to accept y_prob as input.
Added input validation: functions now check that either (model and X) or y_prob is provided, and reject calls where both are missing.
Preserved existing model-based workflows for backward compatibility.
Extended unit tests in unittests/ to cover the new probability-based path, including edge cases and validation errors.
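The either/or validation can be expressed in a few lines. A minimal sketch; `resolve_probabilities` is a hypothetical helper, giving y_prob precedence when both inputs are supplied is an assumption, the error wording is illustrative, and `predict_proba(X)[:, 1]` assumes a scikit-learn-style binary classifier.

```python
import numpy as np


def resolve_probabilities(model=None, X=None, y_prob=None):
    """Hypothetical helper: positive-class probabilities from y_prob or model + X."""
    if y_prob is not None:
        # Assumed precedence: pre-computed probabilities win if both are given.
        return np.asarray(y_prob, dtype=float)
    if model is not None and X is not None:
        # scikit-learn convention: column 1 holds the positive-class probability.
        return np.asarray(model.predict_proba(X))[:, 1]
    raise ValueError("Provide either (model and X) or y_prob.")


print(resolve_probabilities(y_prob=[0.2, 0.9]))
```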

End users sometimes already have predicted probabilities from external pipelines or pre-computed experiments. This change avoids forcing them to re-supply the model, streamlining the evaluation process.

Version 0.0.3a

Added the "plotly>=5.18.0, <=5.24.1" requirement to pyproject.toml, setup.py, and README_min.md for the partial_dependence.py functions

Version 0.0.2a

Add show_ks_curve function and enhance summarize_model_performance by @lshpaner in https://github.com/lshpaner/model_metrics/pull/1
Add plot_threshold_metrics Function by @lshpaner in https://github.com/lshpaner/model_metrics/pull/2
Add pr_feature_plot and Update roc_feature_plot for Enhanced Visualization by @lshpaner in https://github.com/lshpaner/model_metrics/pull/3
Reg table enhance by @lshpaner in https://github.com/lshpaner/model_metrics/pull/4
Rmvd (%) from MAPE header by @lshpaner in https://github.com/lshpaner/model_metrics/pull/5
Moved roc legend to lower right default by @lshpaner in https://github.com/lshpaner/model_metrics/pull/6
Allow Flexible Inputs and Save Behavior for show_roc_curve() by @lshpaner in https://github.com/lshpaner/model_metrics/pull/7
Prcurve calc tests by @lshpaner in https://github.com/lshpaner/model_metrics/pull/8
Removed unused imports and functions by @lshpaner in https://github.com/lshpaner/model_metrics/pull/9
changed saving nomenclature in show_confusion_matrix by @lshpaner in https://github.com/lshpaner/model_metrics/pull/10
Fix Calibration Curve Grid Plot Behavior and Update Model Nomenclature by @lshpaner in https://github.com/lshpaner/model_metrics/pull/11
Improved support for multiple models and group categories in calibration curve by @lshpaner in https://github.com/lshpaner/model_metrics/pull/13
Upd. plot_threshold_metrics w/ new lookup_kwgs and legend logic by @lshpaner in https://github.com/lshpaner/model_metrics/pull/14
Rmv. unused arguments by @lshpaner in https://github.com/lshpaner/model_metrics/pull/15
Move PDF-related Functions from eda_toolkit to model_metrics by @lshpaner in https://github.com/lshpaner/model_metrics/pull/16

Full Changelog: https://github.com/lshpaner/model_metrics/compare/0.0.1a...0.0.2a

Version 0.0.1a

Updated unit tests and README
Added statsmodels to library imports
Added coefficients and p-values to regression summary
Added regression capabilities to summarize_model_performance
Added lift and gains charts
Updated versions for earlier Python compatibility