Welcome to the Model Metrics Python Library Documentation!
Note
This documentation is for model_metrics version 0.0.5a3.
Welcome to Model Metrics! Model Metrics is a versatile Python library designed to streamline the evaluation and interpretation of machine learning models. It provides a robust framework for generating predictions, computing model metrics, analyzing feature importance, and visualizing results, whether you're working with SHAP values, model coefficients, confusion matrices, ROC curves, precision-recall plots, or other key performance indicators.
Project Links
What is Model Evaluation?
Model evaluation is a fundamental aspect of the machine learning lifecycle. It involves assessing the performance of predictive models using various metrics to ensure accuracy, reliability, and fairness. Proper evaluation helps in understanding how well a model generalizes to unseen data, detects potential biases, and optimizes performance. This step is critical before deploying any model into production.
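The generalization check described above can be sketched with a generic scikit-learn workflow (this is illustrative only and does not use the model_metrics API; the dataset and model are arbitrary placeholders):

```python
# Illustrative only: a plain scikit-learn workflow showing why held-out
# evaluation matters before deployment. Not model_metrics API.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# Training accuracy alone can overstate performance; the held-out
# test score is the better estimate of generalization to unseen data.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train={train_acc:.3f}  test={test_acc:.3f}")
```

A large gap between the two scores is one of the signals a thorough evaluation step is meant to surface.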
Purpose of Model Metrics Library
The model_metrics library is a comprehensive framework designed to simplify and
standardize the evaluation of machine learning models. It provides an extensive
set of tools to assess model performance, diagnose issues, compare different approaches,
and validate results with statistical rigor. Key functionalities include:
Performance Metrics: A comprehensive suite of functions to compute essential metrics for classification (accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC, log loss, Brier score) and regression (RMSE, MAE, R², adjusted R²) tasks.
Advanced Diagnostics: Extensive residual diagnostics with heteroskedasticity testing, LOWESS smoothing, influence plots, and leverage analysis for regression models.
Visualization Tools: Rich plotting capabilities including confusion matrices, ROC curves with operating points and DeLong tests, precision-recall curves, calibration plots, gain charts with Gini coefficients, threshold optimization plots, and comprehensive residual diagnostic plots.
Stratified Analysis: Group-based evaluation with group_category support across all major plotting functions, enabling demographic fairness analysis and subgroup performance assessment.
Statistical Testing: Built-in statistical tests, including DeLong's test for ROC curve comparison, heteroskedasticity tests (Breusch-Pagan, White, Goldfeld-Quandt, Spearman), and automated diagnostic table generation.
Model Comparison: Frameworks to compare multiple models with statistical significance tests, automated performance summaries, and side-by-side visualizations.
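The classification metrics listed above can be computed directly with scikit-learn; model_metrics wraps and summarizes metrics of this kind. The toy labels and probabilities below are made up for illustration:

```python
# A sketch of the classification metrics named above, computed with
# scikit-learn on toy data (illustrative; not the model_metrics API).
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, average_precision_score, log_loss, brier_score_loss,
)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2])
y_pred = (y_prob >= 0.5).astype(int)  # default 0.5 decision threshold

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),   # ranking quality
    "pr_auc": average_precision_score(y_true, y_prob),
    "log_loss": log_loss(y_true, y_prob),       # probability quality
    "brier": brier_score_loss(y_true, y_prob),
}
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, while ROC-AUC, PR-AUC, log loss, and the Brier score evaluate the probabilities themselves.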
Key Features
Comprehensive Evaluation: Supports both classification and regression tasks with extensive diagnostic capabilities, including residual analysis, influence detection, and model assumption validation.
User-Friendly: Designed for ease of use with intuitive functions, sensible defaults, and well-documented workflows with practical examples.
Highly Customizable: Extensive styling options via keyword argument dictionaries (point_kwgs, line_kwgs, group_kwgs, centroid_kwgs, etc.) allow precise control over plot appearance and legend behavior.
Modular Architecture: Clean separation between calculation utilities (metrics_utils.py), plotting helpers (plot_utils.py), and main evaluation functions (model_evaluator.py) ensures maintainability and extensibility.
Seamless Integration: Works with popular libraries such as Scikit-Learn, XGBoost, LightGBM, and TensorFlow, provided that model objects follow standard prediction interfaces like predict(), predict_proba(), or decision_function(). Special considerations may be required for deep learning models, time-series models, or custom transformers that return non-standard outputs.
Flexible Input Methods: Most functions accept either fitted model objects with feature matrices, or direct predictions (y_pred, y_prob), accommodating various workflows and use cases.
Publication-Ready Output: Generates high-quality visualizations with customizable figure sizes, font sizes, grid lines, color schemes, and save options (PNG/SVG) suitable for reports and presentations.
Detailed Reports: Provides automated summaries, diagnostic tables with customizable decimal precision, and visual insights to aid in model selection, debugging, and decision-making.
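The "flexible input" pattern described above can be sketched as follows. The helper resolve_scores below is hypothetical, written here only to illustrate the idea of accepting either a fitted model with features or precomputed scores; it is not the library's implementation:

```python
# Hypothetical helper illustrating the flexible-input pattern:
# accept either a fitted model plus a feature matrix, or precomputed
# probability scores (y_prob). Not model_metrics source code.
import numpy as np

def resolve_scores(model=None, X=None, y_prob=None):
    """Return probability-like scores from whichever input was given."""
    if y_prob is not None:
        return np.asarray(y_prob)
    if model is None or X is None:
        raise ValueError("Provide either y_prob or both model and X.")
    # Prefer the standard scikit-learn style interfaces, in order.
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict(X)

scores = resolve_scores(y_prob=[0.2, 0.7, 0.9])
print(scores)
```

Checking for predict_proba, then decision_function, then predict mirrors the standard prediction interfaces named under Seamless Integration above.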
Prerequisites
Before you install model_metrics, ensure your system meets the following requirements:
Python: version 3.7.4 or higher is required to run model_metrics.
Additionally, model_metrics depends on the following packages, which will be automatically installed when you install model_metrics:
matplotlib: version 3.5.3 or higher, but capped at 3.9.2
numpy: version 1.21.6 or higher, but capped at 2.1.0
pandas: version 1.3.5 or higher, but capped at 2.2.3
plotly: version 5.18.0 or higher, but capped at 5.24.0
scikit-learn: version 1.0.2 or higher, but capped at 1.5.2
shap: version 0.41.0 or higher, but capped below 0.46.0
statsmodels: version 0.12.2 or higher, but capped below 0.14.4
tqdm: version 4.66.4 or higher, but capped below 4.67.1
Installation
You can install model_metrics directly from PyPI:
pip install model_metrics
Description
This guide provides detailed instructions and examples for the functions in the
model_metrics library and shows how to use them effectively in your projects.
For most of the ensuing examples, we will leverage the Census Income Data (1994) from
the UCI Machine Learning Repository [1]. This dataset provides a rich source of
information for demonstrating the functionalities of the model_metrics library.
Table of Contents
Getting Started
Performance Assessment
- Model Performance Summaries
- Lift Charts
- Gain Charts
- ROC AUC Curves
- Precision-Recall Curves
- Confusion Matrix Evaluation
- Calibration Curves
- Threshold Metric Curves
- Residual Diagnostics
show_residual_diagnostics()
- Residual Diagnostics Example 1: All Residual Diagnostics Plots
- Residual Diagnostics Example 2: Single Plot with LOWESS Smoothing
- Residual Diagnostics Example 3: Diagnostics Table Only
- Residual Diagnostics Example 4: Diagnostics to DataFrame
- Residual Diagnostics Example 5: Grouped Analysis with Customization
- Residual Diagnostics Example 6: Multiple Models with Shared Axes
Partial Dependence
Conceptual Notes
- Interpretive Context
- Binary Classification Outputs
- Threshold Selection Logic
- Calibration Trade-offs
- Lift: Mathematical Definition
- Gain: Mathematical Definition
- Partial Dependence Foundations
- Regression Outputs
Model Training Overview
About Model Metrics
- Acknowledgements
- Contributors/Maintainers
- Citing Model Metrics
- Changelog
- Version 0.0.5a3
- Version 0.0.5a2
- Testing Improvements
- Code Quality
- Residual Diagnostics Expansion
- Group Category Support
- Gain Chart Enhancement
- Legend Customization
summarize_model_performance
show_confusion_matrix
show_roc_curve
show_pr_curve
show_calibration_curve
show_gain_chart
plot_threshold_metrics
show_residual_diagnostics
- Bug Fixes
- Documentation Improvements
- Testing
- Performance
- Migration Guide
- Version 0.0.5a1
- Version 0.0.4a10
- Version 0.0.4a9
- Version 0.0.4a8
- Version 0.0.4a7
- Version 0.0.4a6
- Version 0.0.4a5
- Version 0.0.4a4
- Version 0.0.4a3
- Version 0.0.4a2
- Version 0.0.4a1
- Version 0.0.4a
- Version 0.0.3a
- Version 0.0.2a
- Version 0.0.1a
- References