Changelog
Version 0.0.5a9
Important
Corrected package name in pyproject.toml from model_metrics_dev to model_metrics. The 0.0.5a8 release was inadvertently published under the wrong package name and could not be recalled in time; 0.0.5a9 supersedes it.
New Features
- Added `ax=None` parameter to seven single-plot functions: `show_roc_curve`, `show_pr_curve`, `show_confusion_matrix`, `show_calibration_curve`, `show_lift_chart`, `show_gain_chart`, and `plot_threshold_metrics`. When a pre-created `matplotlib.axes.Axes` object is supplied, each function draws onto that axes and suppresses its internal `plt.show()` and `save_plot_images` calls. All existing call signatures remain fully backward-compatible; the default `ax=None` preserves prior behavior exactly.
- Added `combine_plots` function that assembles multiple single-plot function calls into a shared subplot figure. Accepts a list of `(func, kwargs)` tuples, pre-allocates one axes per panel, and passes each axes to the corresponding function via the `ax` parameter. Supports configurable grid layout (`n_cols`, `n_rows`), custom `figsize`, `suptitle`, `tight_layout`, and full `save_plot_images` integration. Unused trailing panels are hidden automatically. Panels that raise exceptions render an inline error message rather than aborting the figure.
- Added `tick_fontsize` parameter to `combine_plots`. Both `label_fontsize` and `tick_fontsize` are now automatically injected into each panel function via `inspect.signature`, giving uniform typography across the grid without per-panel repetition. Per-panel overrides in `plot_calls` always take precedence.
- Added `hspace` and `wspace` parameters to `combine_plots` for explicit row and column spacing control. When either is supplied, `constrained_layout` is activated and spacing is applied via `fig.get_layout_engine().set()` after `tight_layout`, correctly handling panels with `legend_loc="bottom"` that place legends outside the axes bbox.
- Added `height_ratios` parameter to `combine_plots`, passed through `gridspec_kw`, allowing individual rows to be resized independently. Useful when mixing plot types of different natural heights (e.g., confusion matrices alongside full-height curve panels).
- Fixed the overlay path in `show_roc_curve`, `show_pr_curve`, `show_lift_chart`, `show_gain_chart`, `show_calibration_curve`, and `plot_threshold_metrics`. Previously, `overlay=True` called `plt.figure()` unconditionally, creating a standalone figure instead of drawing onto the supplied `ax`. All six functions now check `ax is not None` before creating a figure, correctly routing overlay draws into `combine_plots` panel grids.
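The axes-passing pattern behind `combine_plots` can be sketched with plain matplotlib. This is an illustrative stand-in, not the library's implementation: `draw_roc_like` is a hypothetical panel function playing the role of `show_roc_curve` and friends.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def draw_roc_like(ax=None, title="panel"):
    # When ax is supplied, draw onto it and skip plt.show(),
    # mirroring the ax=None behavior added in this release.
    created = ax is None
    if created:
        _, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    ax.set_title(title)

# One axes per panel; each (func, kwargs) pair receives its own axes via ax=.
fig, axes = plt.subplots(1, 3, figsize=(9, 3))
plot_calls = [(draw_roc_like, {"title": "ROC"}), (draw_roc_like, {"title": "PR"})]
for ax, (func, kwargs) in zip(axes.ravel(), plot_calls):
    func(ax=ax, **kwargs)
# Unused trailing panels are hidden, as combine_plots does automatically.
for ax in axes.ravel()[len(plot_calls):]:
    ax.set_visible(False)
```

The same pattern is what makes a panel function composable: it never creates or shows a figure when handed an axes.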
Bug Fixes
- Fixed `combine_plots` axes flattening logic. The previous `if num_plots == 1` guard incorrectly wrapped a numpy ndarray of axes in a list when `plt.subplots(1, N)` was called with `N > 1` and only one plot call was provided, causing an `AttributeError: 'numpy.ndarray' object has no attribute 'plot'`. The fix inspects the axes object directly using `hasattr` rather than relying on `num_plots`.
- Fixed `show_confusion_matrix` axes reference conflict. The loop variable `ax` used in the subplots branch shadowed the user-supplied `ax` parameter. The user-supplied value is now stashed as `_user_ax` before the loop runs to prevent clobbering.
- Fixed `show_confusion_matrix` ignoring `model_threshold` when passed as a list and `X` is provided. Previously the full list was forwarded to `get_predictions`, which expects a scalar or dict, causing incorrect threshold application. The list is now indexed per model (`model_threshold[idx]`) in the `X`-provided branch.
- Fixed `show_confusion_matrix` raising `TypeError: 'float' object is not subscriptable` when `model_threshold` was passed as a scalar float in the `X is None` branch. The branch now checks `isinstance` before indexing, correctly handling scalar, list, and dict threshold inputs.
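The scalar/list/dict handling described in these fixes amounts to a small resolution rule, sketched here with hypothetical names (`resolve_threshold` is illustrative, not the library's internal helper):

```python
def resolve_threshold(model_threshold, idx, model_name, default=0.5):
    """Pick the classification threshold for model number `idx` named `model_name`."""
    if model_threshold is None:
        return default                        # no override supplied
    if isinstance(model_threshold, dict):
        return model_threshold.get(model_name, default)
    if isinstance(model_threshold, (list, tuple)):
        return model_threshold[idx]           # per-model list, indexed
    return model_threshold                    # scalar applies to every model

scalar_case = resolve_threshold(0.3, 1, "rf")
list_case = resolve_threshold([0.2, 0.4], 1, "rf")
dict_case = resolve_threshold({"rf": 0.7}, 0, "rf")
```

Checking `isinstance` before indexing is exactly what prevents the `'float' object is not subscriptable` failure on scalar input.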
Version 0.0.5a8
New Features
- Added `ax=None` parameter to seven single-plot functions: `show_roc_curve`, `show_pr_curve`, `show_confusion_matrix`, `show_calibration_curve`, `show_lift_chart`, `show_gain_chart`, and `plot_threshold_metrics`. When a pre-created `matplotlib.axes.Axes` object is supplied, each function draws onto that axes and suppresses its internal `plt.show()` and `save_plot_images` calls. All existing call signatures remain fully backward-compatible; the default `ax=None` preserves prior behavior exactly.
- Added `combine_plots` function that assembles multiple single-plot function calls into a shared subplot figure. Accepts a list of `(func, kwargs)` tuples, pre-allocates one axes per panel, and passes each axes to the corresponding function via the `ax` parameter. Supports configurable grid layout (`n_cols`, `n_rows`), custom `figsize`, `suptitle`, `tight_layout`, and full `save_plot_images` integration. Unused trailing panels are hidden automatically. Panels that raise exceptions render an inline error message rather than aborting the figure.
- Added `tick_fontsize` parameter to `combine_plots`. Both `label_fontsize` and `tick_fontsize` are now automatically injected into each panel function via `inspect.signature`, giving uniform typography across the grid without per-panel repetition. Per-panel overrides in `plot_calls` always take precedence.
- Added `hspace` and `wspace` parameters to `combine_plots` for explicit row and column spacing control. When either is supplied, `constrained_layout` is activated and spacing is applied via `fig.get_layout_engine().set()` after `tight_layout`, correctly handling panels with `legend_loc="bottom"` that place legends outside the axes bbox.
- Added `height_ratios` parameter to `combine_plots`, passed through `gridspec_kw`, allowing individual rows to be resized independently. Useful when mixing plot types of different natural heights (e.g., confusion matrices alongside full-height curve panels).
- Fixed the overlay path in `show_roc_curve`, `show_pr_curve`, `show_lift_chart`, `show_gain_chart`, `show_calibration_curve`, and `plot_threshold_metrics`. Previously, `overlay=True` called `plt.figure()` unconditionally, creating a standalone figure instead of drawing onto the supplied `ax`. All six functions now check `ax is not None` before creating a figure, correctly routing overlay draws into `combine_plots` panel grids.
Bug Fixes
- Fixed `combine_plots` axes flattening logic. The previous `if num_plots == 1` guard incorrectly wrapped a numpy ndarray of axes in a list when `plt.subplots(1, N)` was called with `N > 1` and only one plot call was provided, causing an `AttributeError: 'numpy.ndarray' object has no attribute 'plot'`. The fix inspects the axes object directly using `hasattr` rather than relying on `num_plots`.
- Fixed `show_confusion_matrix` axes reference conflict. The loop variable `ax` used in the subplots branch shadowed the user-supplied `ax` parameter. The user-supplied value is now stashed as `_user_ax` before the loop runs to prevent clobbering.
- Fixed `show_confusion_matrix` ignoring `model_threshold` when passed as a list and `X` is provided. Previously the full list was forwarded to `get_predictions`, which expects a scalar or dict, causing incorrect threshold application. The list is now indexed per model (`model_threshold[idx]`) in the `X`-provided branch.
- Fixed `show_confusion_matrix` raising `TypeError: 'float' object is not subscriptable` when `model_threshold` was passed as a scalar float in the `X is None` branch. The branch now checks `isinstance` before indexing, correctly handling scalar, list, and dict threshold inputs.
Version 0.0.5a7
Summary
This version drops support for Python 3.7 (the previous floor was 3.7.4) and sets the minimum required Python version to 3.8. Python 3.7 reached end-of-life in June 2023 and is no longer supported by the library. Users on Python 3.7.x must upgrade before installing this version.
This version also delivers four major workstreams across the library: a ground-up hardening of `ModelCalculator`, full multi-model support for `plot_threshold_metrics`, a new `image_filename` parameter across all eight plotting functions, and Python 3.8 compatibility restoration.
1. ModelCalculator
_extract_final_model
The original branching logic was ambiguous and failed silently for several real-world model wrapper patterns. The method now resolves wrappers in a strict, documented priority order:
1. Plain `dict` with a `"model"` key (e.g. the model_tuner pkl format `{"model": <Model>}`), unwrapped first before any other check.
2. `sklearn.pipeline.Pipeline`, detected via `hasattr(model, "steps")`, extracting the last step.
3. Objects with an `estimator` attribute (e.g. model_tuner `Model` objects wrapping a `CalibratedClassifierCV`).
4. Objects with a `model` attribute (e.g. custom wrapper classes).
5. Standalone sklearn-compatible objects with `predict`, `predict_proba`, or `decision_function`.
The dict unwrap path was entirely absent before this change, causing `AttributeError: 'dict' object has no attribute 'predict'` when loading pkl files saved in model_tuner format.
generate_predictions
The prediction block previously called `model.predict()` and `model.predict_proba()` directly on the raw object retrieved from `model_dict`, bypassing `_extract_final_model`. This caused the same `AttributeError` on dict-wrapped models. The block now routes all prediction calls through the already-resolved `estimator` variable, while retaining the `model.threshold` check on the original object, since that attribute lives on the model_tuner wrapper, not the inner estimator.
_add_metrics
Replaced `type(y_test_m) == pd.DataFrame` with `isinstance(y_test_m, pd.DataFrame)` per Python best practices. Also corrected `squeeze(axis=0)` to `squeeze(axis=1)`, which is the correct axis for collapsing a single-column DataFrame into a Series.
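A quick illustration of why `axis=1` is the right choice for a one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"y": [0, 1, 1]})   # single-column frame, e.g. a y_test
as_series = df.squeeze(axis=1)        # collapses the lone column to a Series
still_frame = df.squeeze(axis=0)      # axis=0 only collapses a lone *row*,
                                      # so a 3-row frame comes back unchanged
```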
_get_shap_explainer (new helper)
Replaced the generic `shap.Explainer` auto-detection call, which fired internal probe warnings on every invocation, with an explicit helper that selects the correct explainer class based on model attributes:
- `shap.TreeExplainer` for tree models (`tree_` or `estimators_`).
- `shap.LinearExplainer` for linear models (`coef_`).
- `shap.KernelExplainer` via `_make_predict_proba_wrapper` for everything else.
A guard at the top raises `ValueError` immediately for models without `predict_proba`, so unsupported models fail with a clear message rather than crashing inside SHAP internals.
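The selection rules reduce to simple attribute checks. To keep this sketch runnable without shap installed, it returns the explainer name instead of constructing one; the real helper instantiates the corresponding shap class.

```python
def pick_explainer_name(model):
    # Guard first: SHAP explanation here requires probability output.
    if not hasattr(model, "predict_proba"):
        raise ValueError("Model must implement predict_proba for SHAP")
    if hasattr(model, "tree_") or hasattr(model, "estimators_"):
        return "TreeExplainer"        # tree-based models
    if hasattr(model, "coef_"):
        return "LinearExplainer"      # linear models
    return "KernelExplainer"          # everything else, via the proba wrapper

class _TreeModel:                     # stand-in with a tree_ attribute
    tree_ = object()
    def predict_proba(self, X):
        return X

class _LinearModel:                   # stand-in with a coef_ attribute
    coef_ = [1.0]
    def predict_proba(self, X):
        return X

tree_choice = pick_explainer_name(_TreeModel())
linear_choice = pick_explainer_name(_LinearModel())
```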
_make_predict_proba_wrapper (new helper)
`KernelExplainer` internally converts DataFrames to numpy arrays before calling the model function, which caused `StandardScaler` (fitted with named columns) to emit `UserWarning: X does not have valid feature names` on every SHAP call. The new wrapper re-attaches the original column names before passing the array to `predict_proba`, eliminating the warning at the source rather than suppressing it.
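The idea can be sketched in a few lines; `make_predict_proba_wrapper` here is illustrative, and the dummy model stands in for a pipeline whose scaler was fitted on named columns.

```python
import numpy as np
import pandas as pd

def make_predict_proba_wrapper(model, columns):
    # KernelExplainer hands the model a bare numpy array; rebuild the
    # DataFrame with the original column names so transformers fitted
    # on named columns do not warn about missing feature names.
    def _predict(X):
        X_named = pd.DataFrame(np.asarray(X), columns=columns)
        return model.predict_proba(X_named)
    return _predict

class _NameSensitiveModel:
    def predict_proba(self, X):
        assert list(X.columns) == ["age", "bmi"]   # names survived the round trip
        return np.tile([0.5, 0.5], (len(X), 1))

wrapped = make_predict_proba_wrapper(_NameSensitiveModel(), ["age", "bmi"])
proba = wrapped(np.zeros((4, 2)))
```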
_calculate_shap_values
Global SHAP previously iterated row-by-row via `itertuples`, calling the explainer once per sample. This has been replaced with a single batched `explainer(X_transformed)` call, which is orders of magnitude faster on datasets of any meaningful size.
Fixed the multi-class SHAP averaging from `.mean(axis=0).mean(axis=1)` to `.mean(axis=2).mean(axis=0)`, which is the correct reduction order for a `(n_samples, n_features, n_classes)` tensor.
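A shape check makes the corrected reduction concrete: collapsing the class axis first and then the sample axis leaves one value per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
# Multi-class SHAP output: (n_samples, n_features, n_classes)
shap_vals = rng.normal(size=(100, 5, 3))
# mean(axis=2) averages over classes -> (100, 5);
# mean(axis=0) then averages over samples -> one value per feature.
per_feature = shap_vals.mean(axis=2).mean(axis=0)
```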
Added pipeline unwrapping at the top of the method so it handles dict- or Pipeline-wrapped models passed directly, rather than only pre-unwrapped estimators.
The `include_contributions` row-wise path now consistently returns top-N `{feature: shap_value}` dicts sorted by absolute value, matching the coefficient path. Previously it returned a flat dict over all features regardless of `top_n`.
_calculate_coefficients
Added a secondary Pipeline unwrap after `_extract_final_model` for cases where the extracted model is itself a Pipeline (e.g. model_tuner objects where `.estimator` is a `CalibratedClassifierCV` wrapping a Pipeline). Without this, the `coef_` lookup failed on the Pipeline object rather than its final step.
The `include_contributions=False` path now returns top-N feature-name lists rather than dicts, making it symmetric with the SHAP path and ensuring `subset_results` column content is consistent regardless of which explainability method is used.
Note
This is a behavior change for any code consuming the default output of `_calculate_coefficients` as dicts. Pass `include_contributions=True` to restore the previous dict output.
2. plot_threshold_metrics
Multi-model support
The function previously accepted only a single model or `y_prob` array. It now accepts lists of models, `y_prob` arrays, and thresholds, and supports three display modes:
Single: one plot per model, unchanged from previous behavior.
Overlay: all models on a single shared axes, with curve labels prefixed by model name for disambiguation.
Subplots: one subplot per model in an auto-sized grid.
New parameters: `model_title`, `overlay`, `subplots`, `n_cols`, `n_rows`, `suptitle`, `suptitle_y`, and `model_threshold` (now accepts a list).
Model title defaulting
When `model_title=None` and model objects are provided, titles now default to the model class name via `extract_model_name()` rather than generic “Model 1”, “Model 2” index labels. When only `y_prob` arrays are provided, the index fallback is retained since there is nothing to extract a name from.
n_rows / n_cols auto-derivation
When `n_rows` is explicitly provided but `n_cols` is left at its default of 2, `n_cols` is now automatically derived as `ceil(num_models / n_rows)`. Previously, specifying `n_rows=1` with 3 models still produced a 2-column grid because `n_cols` was never recalculated.
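The derivation is just a ceiling division (`derive_n_cols` is an illustrative name, not the library's internal function):

```python
import math

def derive_n_cols(num_models, n_rows):
    # Smallest column count such that n_rows rows can hold every model.
    return math.ceil(num_models / n_rows)

one_row = derive_n_cols(3, 1)    # 3 models in 1 row  -> 3 columns
two_rows = derive_n_cols(3, 2)   # 3 models in 2 rows -> 2 columns
```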
y_test flattening
`precision_recall_curve` and `roc_curve` inside `_plot_single` now receive `np.asarray(y_test).ravel()` to prevent `ValueError: Found input variables with inconsistent numbers of samples` when `y_test` is a single-column DataFrame loaded from parquet.
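Concretely, a one-column DataFrame has shape `(n, 1)`, and the flatten yields the `(n,)` vector sklearn's curve functions expect:

```python
import numpy as np
import pandas as pd

y_test = pd.DataFrame({"target": [0, 1, 0, 1]})  # e.g. read from parquet
flat = np.asarray(y_test).ravel()                # (4, 1) -> (4,)
```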
suptitle / title separation
`suptitle` controls the overall figure heading above all subplots; `title` controls per-subplot headings. The two can be set independently: passing `title=""` suppresses per-subplot titles while still showing the suptitle, and vice versa. Previously the two levels of titling could not be controlled independently.
3. image_filename Save Integration
Updated save_plot_images
Added three new parameters: `image_filename`, `fig`, and `dpi`. Saving is now triggered when either `save_plot=True` or `image_filename` is provided, so callers no longer need to set `save_plot=True` just to use a custom filename. When `image_filename` is provided, it takes precedence over the auto-generated filename. The function now calls `fig.savefig()`, targeting the correct figure object, rather than `plt.savefig()`, which targeted whatever the current active figure happened to be at call time.
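The trigger and precedence rules distil to two tiny functions. This is an illustrative sketch of the logic described above, not the library's actual code:

```python
def should_save(save_plot, image_filename):
    # Save when explicitly requested OR when a custom filename implies intent.
    return bool(save_plot) or image_filename is not None

def resolve_filename(image_filename, auto_generated):
    # An explicit image_filename always wins over the auto-generated name.
    return image_filename if image_filename is not None else auto_generated
```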
All eight plotting functions updated
`image_filename=None` added to the signature immediately after `image_path_svg` and threaded through every `save_plot_images` call site across `show_confusion_matrix`, `show_roc_curve`, `show_pr_curve`, `show_lift_chart`, `show_gain_chart`, `show_calibration_curve`, `plot_threshold_metrics`, and `show_residual_diagnostics` (22 call sites in total).
The `if save_plot:` guards that previously wrapped `save_plot_images` calls in `show_calibration_curve` (group path) and `show_residual_diagnostics` (both save sites) have been removed, since `save_plot_images` now handles the trigger logic internally. All eight function docstrings have been updated to document `image_filename` immediately after `image_path_svg`.
4. Python 3.8 Compatibility and Version Floor Change
The minimum supported Python version has been raised from 3.7.4 to
3.8. This aligns with the broader Python ecosystem where 3.7 has been
end-of-life since June 2023 and several upstream dependencies no longer
ship 3.7-compatible wheels.
fastparquet replaced with pyarrow
`fastparquet` fails to build on Python 3.8 due to a Cython ndarray type identifier incompatibility that affects all released versions. The dependency has been replaced with `pyarrow>=11.0.0,<=14.0.2`, the last release with official Python 3.8 wheels. `pd.read_parquet()` uses pyarrow automatically, so no code changes were required.
pip upgrade required for Python 3.8 environment
The `venv_3_8` environment shipped with pip 19.2.3 (2019), which cannot parse the modern `pyproject.toml` files used by packages like `ninja` and `scipy`. This caused cascading build failures for `scikit-learn` and `pyarrow`. Running `pip install --upgrade pip` before `pip install -r requirements.txt` unblocks the full install.
typing imports added to plot_utils.py
`Optional`, `List`, `Dict`, `Union`, and `Tuple` were used in type annotations but not imported. Python 3.10+ allows the `X | None` union syntax without importing from `typing`, but Python 3.8 requires the explicit import. Added `from typing import Optional, List, Dict, Union, Tuple` to resolve `NameError: name 'Optional' is not defined` on import.
5. Test Suite
pytest collection conflict resolved
Both `py_scripts/test_model_calculator.py` and `unittests/test_model_calculator.py` share the same basename, causing pytest to fail at collection with an import file mismatch error. Fixed by adding `__init__.py` to both `py_scripts/` and `unittests/` and clearing stale `__pycache__` artifacts.
test_model_calculator fixes
- `test_calculate_coefficients`: updated assertion from `isinstance(row, dict)` to `isinstance(row, list)` since the default path now returns feature-name lists, not dicts.
- `test_extract_final_model_wrapped_model`: resolved by adding the `hasattr(model, "model")` branch to `_extract_final_model`. No test change needed.
- `test_calculate_shap_unsupported_model`: resolved by adding the `predict_proba` guard at the top of `_get_shap_explainer` so unsupported models raise `ValueError` before reaching `KernelExplainer`.
- `test_calculate_shap_unexpected_shape` and `test_rowwise_shap_output_unexpected_type`: both monkeypatched `shap.Explainer`, which is no longer called. Updated to monkeypatch `ModelCalculator._get_shap_explainer` instead.
test_model_evaluator fixes
`test_plot_threshold_metrics_with_lookup` and `test_plot_threshold_metrics_all_lookup_metrics`: both asserted `"Best threshold" in captured.out`. The print format now prefixes the model name (e.g. `"LogisticRegression -- best threshold for..."`). Updated assertions to `"best threshold" in captured.out`, which matches both the old and new formats.
Final result: 204 collected, 204 passed. Coverage: `metrics_utils` 79%, `model_calculator` 87%, `model_evaluator` 83%, `partial_dependence` 89%, `plot_utils` 70%.
6. Test and Usage Scripts
test_model_calculator.py (py_scripts)
Standalone `.py` equivalent of the test notebook with section headers, coloured PASS/FAIL labels, and `.to_string()` DataFrame output for clean terminal readability. Paths are anchored to `__file__` via `SCRIPT_DIR` and `PROJECT_ROOT` so the script runs correctly regardless of which directory `python` is invoked from.
test_model_calculator.ipynb
Updated to use `load_breast_cancer()` (30 real named features) instead of `make_classification()` (generic `feature_0..9` labels) so SHAP and coefficient outputs show meaningful feature names. Added `max_iter=10000` to `LogisticRegression` to suppress the convergence warning that cluttered notebook output.
Version 0.0.5a6
Bug Fixes
- Fixed interactive Plotly plot not rendering in Jupyter notebooks unless `save_plots` was set. Display and saving are now fully decoupled; the plot always renders regardless of `save_plots`.
- Removed duplicate HTML save block that existed inside the static plot section.
New Features
- Added `x_label_map` and `y_label_map` parameters for mapping raw axis values to human-readable tick labels; useful for encoded or numeric categorical features.
- Added `modebar_image_format` parameter (`"png"`, `"svg"`, `"jpeg"`, `"webp"`) to control the download format of the Plotly modebar camera button. Defaults to `"png"`.
Improvements
- Docstring updated to document `x_label_map`, `y_label_map`, and `modebar_image_format`.
- `Raises` section expanded to cover all `ValueError` conditions, including invalid `save_plots`, missing image paths, missing HTML paths, invalid `plot_type`, and invalid `modebar_image_format`.
- Update to the `plot_3d_pdp` docstring: adds full categorical feature support to `plot_3d_pdp` while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.
Version 0.0.5a5
Update to the `plot_3d_pdp` docstring: adds full categorical feature support to `plot_3d_pdp` while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.
Version 0.0.5a4
Adds full categorical feature support to `plot_3d_pdp` while preserving backward compatibility with numeric grids. The function now renders categorical axes correctly in both Matplotlib and Plotly by mapping categories to numeric surface positions and overlaying labels. Custom label mapping is supported for cleaner presentation. Interactive hover and axis ticks now display true category names. HTML export logic was refactored to prevent duplicate writes and ensure reliable saving across plot modes.
Version 0.0.5a3
Major Features Added
- Axis Limits: Added `xlim` and `ylim` parameters for standardizing axis ranges across multiple model comparisons
- Bottom Legend Support: Automatic figure height adjustment when `legend_loc="bottom"` to prevent legend overlap with x-axis labels
- Multi-Model Layout: Improved default layout for multiple models with `plot_type="all"`; now arranges as one row per model (6 columns × N rows) instead of a mixed layout
Bug Fixes
- Heteroskedasticity Tests: Fixed categorical variable handling in all tests (Breusch-Pagan, White, Goldfeld-Quandt) by encoding categorical columns before running tests
- Legend Formatting:
  - Fixed legend duplication bug in Scale-Location plot (removed test name prepending since interpretations already contain test names)
  - Fixed histogram legend to use `apply_legend()` for consistent formatting
  - Fixed legend kwargs not being properly passed to Scale-Location plot
- Group Category Handling: Fixed `KeyError` when `group_category` column not in `X` DataFrame by properly checking column existence before filtering predictor columns
- Index Alignment: Fixed `AssertionError` when using external `group_category` array by ensuring Series index matches `X.index`
- Python 3.8 Compatibility: Fixed LaTeX rendering error in Scale-Location y-axis label by replacing `\text{}` with `\mathrm{}` for matplotlib <3.3 compatibility
Enhancements
- Scale-Location Y-Axis: Changed to LaTeX notation `r"$\sqrt{|\text{Std. Residuals}|}$"` for better readability
- Histogram Overlay: Removed normal distribution overlay from `histogram_type="frequency"` for cleaner, simpler visualization (overlay still present for `histogram_type="density"`)
- Text Wrapping: Added `text_wrap` parameter support to all subplot titles (previously only worked for suptitles)
- Helper Functions:
  - Created `apply_axis_limits()` helper in `plot_utils.py`
  - Enhanced `apply_legend()` to handle bottom legend resizing with flag-based prevention of multiple resizes
- Refactoring: Refactored `plot_threshold_metrics()` to use `apply_plot_title()` and `apply_legend()` helpers for consistency
Version 0.0.5a2
Testing Improvements
Added 150+ comprehensive unit tests covering:
Edge cases and error handling
Parameter validation
Different input pathways (model vs y_prob vs y_pred)
Group category functionality
Styling and customization options
Integration scenarios
Code Quality
- Refactored monolithic 3866-line `model_metrics.py` into modular components:
  - `model_evaluator.py` - Main plotting and evaluation functions
  - `metrics_utils.py` - Utility functions and calculations
  - `plot_utils.py` - Plotting helper functions
- Improved code maintainability and organization
- Enhanced error messages and validation
- Operating point visualization with two methods:
  - `operating_point_method='youden'` - Youden’s J statistic
  - `operating_point_method='closest_topleft'` - Closest to top-left corner
- DeLong test support for AUC comparison between models via the `delong` parameter
- Legend ordering - Proper organization: AUC curves → Random Guess → Operating Points
- Custom operating point styling via `operating_point_kwgs`
Residual Diagnostics Expansion
- New plot types:
  - `'influence'` - Influence plot with Cook’s distance bubbles
  - `'predictors'` - Individual residual plots for each predictor
- Heteroskedasticity testing with multiple methods:
  - `'breusch_pagan'` - Breusch-Pagan test
  - `'white'` - White’s test
  - `'goldfeld_quandt'` - Goldfeld-Quandt test
  - `'spearman'` - Spearman rank correlation
  - `'all'` - Run all tests
- LOWESS smoothing via `show_lowess` parameter
- Centroid visualization with two modes:
  - User-defined groups via `group_category`
  - Automatic K-means clustering via `n_clusters`
- Histogram types:
  - `histogram_type='frequency'` - Raw counts (default)
  - `histogram_type='density'` - Probability density with normal overlay
- Diagnostics table - Comprehensive model diagnostics via `show_diagnostics_table`
- Return diagnostics - Programmatic access via `return_diagnostics=True`
Group Category Support
All classification plots now support the `group_category` parameter:
- ROC curves with per-group AUC and counts
- PR curves with per-group metrics
- Calibration curves with per-group calibration
- Residual diagnostics support group visualization with centroids
- Summary performance supports grouped classification metrics
Gain Chart Enhancement
- Gini coefficient calculation and display via the `show_gini` parameter
- Custom decimal places for Gini via the `decimal_places` parameter
Legend Customization
Legend location now supports:
- Standard matplotlib locations (‘best’, ‘upper right’, etc.)
- `'bottom'` - Places legend below plot (perfect for group categories)
- Automatic legend ordering for better readability
summarize_model_performance
- Added `include_adjusted_r2` for regression models
- Added `group_category` for grouped classification metrics
- Added `overall_only` for regression to show only aggregate metrics
- Improved coefficient ordering (intercept first)
- Better handling of feature importances for tree-based models
show_confusion_matrix
- Added `show_colorbar` parameter (default: False)
- Added `labels` parameter to toggle TN/FP/FN/TP labels
- Improved font size controls (`inner_fontsize`, `label_fontsize`, `tick_fontsize`)
show_roc_curve
- Added `show_operating_point` and `operating_point_method`
- Added `operating_point_kwgs` for custom styling
- Added `delong` parameter for AUC comparison
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_pr_curve
- Added `legend_metric` parameter (‘ap’ or ‘aucpr’)
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_calibration_curve
- Added `show_brier_score` parameter (default: True)
- Added `brier_decimals` for formatting
- Added `group_category` for stratified analysis
- Added `legend_loc` parameter
show_gain_chart
- Added `show_gini` parameter (default: False)
- Added `decimal_places` for Gini formatting
plot_threshold_metrics
- Added `lookup_metric` and `lookup_value` for threshold optimization
- Added `model_threshold` to highlight specific thresholds
- Added `baseline_thresh` to toggle baseline line
- Added custom styling: `curve_kwgs`, `baseline_kwgs`, `threshold_kwgs`, `lookup_kwgs`
show_residual_diagnostics
- Added `plot_type` options: ‘all’, ‘fitted’, ‘qq’, ‘scale_location’, ‘leverage’, ‘influence’, ‘histogram’, ‘predictors’
- Added `heteroskedasticity_test` with multiple test options
- Added `show_lowess` for trend lines
- Added `lowess_kwgs` for LOWESS styling
- Added `group_category` for stratified analysis
- Added `group_kwgs` for custom group styling
- Added `show_centroids` and `centroid_kwgs`
- Added `centroid_type` (‘clusters’ or ‘groups’)
- Added `n_clusters` for automatic clustering
- Added `histogram_type` (‘frequency’ or ‘density’)
- Added `show_diagnostics_table` and `return_diagnostics`
- Added `show_plots` to disable plotting
- Added `show_outliers` and `n_outliers` for labeling
- Added `legend_loc` parameter
- Added `legend_kwgs` to control legend display for groups, centroids, clusters, and het_tests
- Added `kmeans_rstate` for reproducible clustering
- Added `n_cols` and `n_rows` for custom subplot layouts
- Added `point_kwgs` for scatter point styling (supports `edgecolor`, `linewidth`, etc.)
Bug Fixes
- Fixed confusion matrix colorbar removal when `show_colorbar=False`
- Fixed duplicate text handling in confusion matrix displays
- Fixed legend placement for grouped visualizations
- Fixed text wrapping for long titles
- Fixed LOWESS exception handling (now fails gracefully)
- Fixed feature importance display for tree-based models
- Fixed coefficient ordering in regression output
- Fixed empty metric columns in regression feature importance rows
Documentation Improvements
Comprehensive docstrings for all major functions
Parameter descriptions with examples
Error message improvements for better debugging
Type hints and validation error messages
Usage examples in docstrings
Testing
Test suite expanded from ~50 tests to 152 tests
Coverage increased from 50% to 86% on core modules
All edge cases and error conditions tested
Integration tests for real-world workflows
Parametrized tests for systematic coverage
Performance
No performance regressions
Modular code structure improves maintainability
Efficient calculation caching where applicable
Migration Guide
From 0.0.5a1 to 0.0.5a2:
No changes required; all existing code will work as before. New features are opt-in.
Version 0.0.5a1
Added
- Operating Point Visualization for ROC Curves: added `show_operating_point` parameter to display optimal classification thresholds on ROC curves with two methods:
  - `youden`: Youden’s J statistic (maximizes TPR - FPR)
  - `closest_topleft`: point closest to the top-left corner (minimizes distance to a perfect classifier)
  - Configurable via `operating_point_method` and `operating_point_kwgs` parameters
  - Operating points display threshold values in legends and appear as markers on curves
- Gini Coefficient for Gain Charts: added automatic calculation and display of the Gini coefficient in `show_gain_chart()`
  - Prints the Gini coefficient for each model (default: 3 decimal places)
  - Displays in legend labels across all plot modes (overlay, subplots, single)
  - Configurable via `show_gini` and `decimal_places` parameters
- Legend Location Control: added `legend_loc` parameter to all plotting functions for flexible legend positioning
  - Supports standard matplotlib locations (`'lower right'`, `'upper left'`, `'best'`, etc.)
  - Special `'bottom'` option places the legend below the plot with proper spacing
  - Available in: `show_roc_curve()`, `show_pr_curve()`, `show_calibration_curve()`, `show_lift_chart()`, `show_gain_chart()`
Improved
- Legend Ordering for ROC Curves: standardized legend entry order across all plot modes
  - Order: model curves with AUC → Random Guess baseline → operating points
  - Ensures consistent, intuitive legend presentation
- Overlay Mode for ROC Curves: enhanced operating point display in overlay plots
  - Combined AUC and operating point threshold in a single legend entry
  - Format: “Model Name (AUC = 0.XX, Op = 0.XX)”
  - Operating point markers appear on curves without duplicate legend entries
Technical Details
- Operating points calculated post-ROC curve generation using optimal threshold selection
- Gini coefficient derived from area under gain curve: `Gini = 2 × AUGC - 1`
- Legend positioning uses `bbox_to_anchor` for `'bottom'` placement with dynamic spacing
- All changes maintain backward compatibility with existing code
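A quick numeric check of `Gini = 2 × AUGC - 1`: for a perfectly random model the cumulative gain curve is the diagonal, so the area under it is 0.5 and the Gini coefficient is 0.

```python
import numpy as np

frac_samples = np.linspace(0.0, 1.0, 101)
random_gain = frac_samples                       # diagonal baseline gain curve
# Trapezoidal area under the gain curve (AUGC).
augc = float(np.sum((random_gain[1:] + random_gain[:-1]) / 2.0
                    * np.diff(frac_samples)))
gini = 2.0 * augc - 1.0                          # 0.0 for a random model
```

A model that concentrates positives early pushes the curve above the diagonal, so AUGC exceeds 0.5 and Gini becomes positive.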
Version 0.0.4a10
Refactored and stabilized the summarize_model_performance function to improve consistency across classification and regression workflows while preserving the exact formatting logic for printed outputs and regression coefficient display.
Changes
- Consolidated redundant metric computation into dedicated helper functions for classification and regression metrics.
- Ensured regression coefficients, intercepts, and feature importances are retained and ordered correctly in the final DataFrame output.
- Fixed grouped classification output so Model Threshold always appears last and group headers correctly reflect category names.
- Added conditional handling for grouped classification to prevent `KeyError` when the `"Model"` column is absent.
- Preserved the original manual formatting block to maintain Leon’s custom printing logic for both classification and regression:
  - Right-aligned all table columns for readability.
  - Retained separator-based visual formatting and model-wise breaks.
  - Preserved coefficient and intercept reporting behavior exactly as before, ensuring regression results remain interpretable and consistent.
Impact
Classification and regression now produce stable, well-ordered, and readable summaries.
Grouped and non-grouped runs behave consistently without disrupting regression coefficient output.
Backward compatibility with previous console and DataFrame output formats maintained.
Version 0.0.4a9
This release introduces a new parameter, `brier_decimals`, to the `show_calibration_curve()` function, allowing users to control the number of decimal places displayed for the Brier score.
Changes Made
- Added `brier_decimals` parameter (default: `3`) next to `show_brier_score`.
- Updated Brier score display logic to format using `round(brier_score, brier_decimals)`.
- Improved readability and precision consistency across calibration plots.
Impact
No breaking changes.
Users now have finer control over Brier score precision in calibration curve visualizations.
Quick Example
```python
from model_metrics import show_calibration_curve

show_calibration_curve(model, X, y, show_brier_score=True, brier_decimals=4)
```
Version 0.0.4a8
Summary:
Updated the `hanley_mcneil_auc_test()` function to perform a large-sample z-test for comparing correlated AUCs, based on Hanley & McNeil (1982), an analytical approximation of DeLong's test.
Key Changes:
- Implemented `hanley_mcneil_auc_test()` with parameters `y_true`, `y_scores_1`, and `y_scores_2` for AUC comparison.
- Optional `model_names`, `verbose`, and `return_values` arguments for flexible use.
- Added formatted, human-readable print output (when `verbose=True`).
- Enabled optional programmatic access with `return_values=True`.
- Adopted NumPy-style docstrings for clarity and consistency.
- Integrated the helper into `show_roc_curve()` to enable AUC significance testing when the `delong` argument is provided.
Notes: This helper can also be used as a standalone function for independent AUC comparison between two models, outside of visualization workflows.
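For intuition only, the core of a large-sample z-test for an AUC difference can be sketched as follows. This simplified version omits the correlation adjustment that the Hanley & McNeil method applies to correlated AUCs, and every name in it is illustrative rather than the library's API:

```python
import math

def auc_z_test(auc_1, se_1, auc_2, se_2):
    """Two-sided z-test for a difference in AUCs (correlation term omitted)."""
    z = (auc_1 - auc_2) / math.sqrt(se_1**2 + se_2**2)
    # Two-sided p-value from the standard normal CDF, via math.erf
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z, p = auc_z_test(0.85, 0.02, 0.80, 0.02)
```

In the full Hanley & McNeil procedure, the denominator also subtracts a covariance term that accounts for the two score vectors being computed on the same cases.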
Version 0.0.4a7
DeLong’s test (Hanley & McNeil approximation)
- Implemented a new helper function `hanley_mcneil_auc_test()` for an approximate DeLong AUC comparison.
- Integrated the helper inside `show_roc_curve()` to optionally print AUC differences and p-values between two models.
- Added corresponding pytest coverage under `test_show_roc_curve_with_delong()`.
Group category support
- Added the `group_category` input to `summarize_model_performance()` to generate subgroup-level performance summaries.
- Enables stratified metric reporting for fairness or demographic analysis.
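The idea behind subgroup-level reporting can be sketched with plain pandas; the column names below are hypothetical and not part of the library:

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],   # e.g., a demographic category
    "y_true": [1, 0, 1, 1],
    "y_pred": [1, 1, 1, 1],
})

# Per-group accuracy: stratified metric reporting in miniature
per_group = (
    df.assign(correct=df["y_true"].eq(df["y_pred"]))
      .groupby("group")["correct"]
      .mean()
)
```

Each metric in the summary can be computed the same way, once per level of the grouping column.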
Version 0.0.4a6
Reworded the print message inside `plot_threshold_metrics()` for clarity.
Old:

```python
print(
    f"Best threshold for {lookup_metric} = "
    f"{round(lookup_value, decimal_places)} is: "
    f"{round(best_threshold, decimal_places)}"
)
```

New:

```python
print(
    f"Best threshold for target {lookup_metric} of "
    f"{round(lookup_value, decimal_places)} is "
    f"{round(best_threshold, decimal_places)}"
)
```
This removes the equals sign and colon, and adds “target” for a smoother, more descriptive sentence.
Version 0.0.4a5
Added a minimal type check to ensure `y_prob` is always a list at the start of each affected function:

- `summarize_model_performance`
- `show_calibration_curve`
- `show_confusion_matrix`
- `show_lift_chart`
- `show_gain_chart`
- `show_roc_curve`
- `show_pr_curve`
```python
# Ensure y_prob is always a list of NumPy arrays
if isinstance(y_prob, np.ndarray):
    y_prob = [y_prob]
```
This allows `y_prob[0]` indexing to work whether the caller provides a single NumPy array or a list of arrays.
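With that guard in place, both call styles index identically. A minimal standalone check, with the helper name invented for illustration:

```python
import numpy as np

def normalize_y_prob(y_prob):
    # Mirror of the guard above: wrap a bare array in a list
    if isinstance(y_prob, np.ndarray):
        y_prob = [y_prob]
    return y_prob

single = normalize_y_prob(np.array([0.2, 0.8]))                       # one model
multi = normalize_y_prob([np.array([0.2, 0.8]), np.array([0.6, 0.4])])  # several models
```

In both cases `y_prob[0]` is the first model's probability array, so downstream code needs no branching.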
Updated unit tests.
Version 0.0.4a4
Corrected README to reflect the current version.
Previous release did not update the README properly because the file was not saved before publishing.
No functional changes to the library.
Version 0.0.4a3
Added missing `scipy (>=1.8, <=1.14.0)` requirement to the README.
Version 0.0.4a2
This version updates `pyproject.toml` and `requirements.txt` to restrict SciPy to `>=1.8,<=1.14.0`.
- Prevents installation of `scipy==1.14.1` and later, which removes `_lazywhere` and breaks `statsmodels`.
- Keeps compatibility with `model_tuner` and Colab environments.
- Bumps the package version for release.
- Updated the `scipy` dependency to `>=1.8,<=1.14.0`.
- Synced `requirements.txt` with the updated constraints.
Version 0.0.4a1
- Replaced the old `grid` parameter with `subplots` across plotting functions for consistency.
- Standardized gridline handling by replacing unconditional `plt.grid()` calls with `plt.grid(visible=gridlines)`.
Why
- Aligns function signatures to use `subplots` consistently instead of `grid`.
- Makes gridline visibility configurable through a single `gridlines` flag.
- Cleaner charts when `gridlines=False`; no visual change when `gridlines=True`.
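A sketch of the new pattern in plain matplotlib (not the library's code), assuming a boolean `gridlines` flag:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for headless use
import matplotlib.pyplot as plt

def demo(gridlines: bool):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    # New pattern: gridline visibility driven by a single flag
    plt.grid(visible=gridlines)
    return ax

ax = demo(gridlines=True)
```

Because `visible` is an explicit keyword, the same call site handles both the on and off cases with no conditional around it.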
Version 0.0.4a
Summary
Added the ability to pass predicted probabilities (`y_prob`) directly into the functions in `model_evaluator.py` as an alternative to supplying a fitted model and feature matrix. This flexibility lets end users evaluate results in two ways:
- Using a model object with `X` (current behavior)
- Passing `y_prob` directly (new option)
Details
- Updated all relevant evaluator functions (`summarize_model_performance`, `plot_threshold_metrics`, etc.) to accept `y_prob` as input.
- Added input validation: functions now check that either `(model and X)` or `y_prob` is provided, not both missing.
- Preserved existing model-based workflows for backward compatibility.
- Extended unit tests in `unittests/` to cover the new probability-based path, including edge cases and validation errors.
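The validation rule can be sketched as follows; `resolve_probabilities` is a hypothetical helper name (assuming a scikit-learn-style classifier), not the library's actual function:

```python
import numpy as np

def resolve_probabilities(model=None, X=None, y_prob=None):
    """Return probabilities from y_prob, or from model.predict_proba(X)."""
    if y_prob is not None:
        return np.asarray(y_prob)
    if model is not None and X is not None:
        # Positive-class column of a scikit-learn-style classifier
        return model.predict_proba(X)[:, 1]
    raise ValueError("Provide either y_prob, or both model and X.")

probs = resolve_probabilities(y_prob=[0.1, 0.9])
```

Raising early keeps every evaluator function's downstream code working on a single, already-resolved probability array.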
Why
End users sometimes already have predicted probabilities from external pipelines or pre-computed experiments. This change avoids forcing them to re-supply the model, streamlining the evaluation process.
Version 0.0.3a
Added `"plotly>=5.18.0, <=5.24.1"` in `pyproject.toml`, `setup.py`, and `README_min.md` for the `partial_dependence.py` functions.
Version 0.0.2a
- Add `show_ks_curve` function and enhance `summarize_model_performance` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/1
- Add `plot_threshold_metrics` Function by @lshpaner in https://github.com/lshpaner/model_metrics/pull/2
- Add `pr_feature_plot` and Update `roc_feature_plot` for Enhanced Visualization by @lshpaner in https://github.com/lshpaner/model_metrics/pull/3
- Reg table enhance by @lshpaner in https://github.com/lshpaner/model_metrics/pull/4
- Rmvd (%) from MAPE header by @lshpaner in https://github.com/lshpaner/model_metrics/pull/5
- Moved roc legend to lower right default by @lshpaner in https://github.com/lshpaner/model_metrics/pull/6
- Allow Flexible Inputs and Save Behavior for `show_roc_curve()` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/7
- Prcurve calc tests by @lshpaner in https://github.com/lshpaner/model_metrics/pull/8
- Removed unused imports and functions by @lshpaner in https://github.com/lshpaner/model_metrics/pull/9
- changed saving nomenclature in `show_confusion_matrix` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/10
- Fix Calibration Curve Grid Plot Behavior and Update Model Nomenclature by @lshpaner in https://github.com/lshpaner/model_metrics/pull/11
- Improved support for multiple models and group categories in calibration curve by @lshpaner in https://github.com/lshpaner/model_metrics/pull/13
- Upd. `plot_threshold_metrics` w/ new `lookup_kwgs` and legend logic by @lshpaner in https://github.com/lshpaner/model_metrics/pull/14
- Rmv. unused arguments by @lshpaner in https://github.com/lshpaner/model_metrics/pull/15
- Move PDF-related Functions from `eda_toolkit` to `model_metrics` by @lshpaner in https://github.com/lshpaner/model_metrics/pull/16
Full Changelog: https://github.com/lshpaner/model_metrics/compare/0.0.1a...0.0.2a
Version 0.0.1a
- Updated unit tests and `README`
- Added `statsmodels` to library imports
- Added coefficients and p-values to regression summary
- Added regression capabilities to `summarize_model_performance`
- Added lift and gains charts
- Updated versions for earlier Python compatibility