Welcome to the Model Metrics Python Library Documentation!

Note

This documentation is for model_metrics version 0.0.5a3.

Welcome to Model Metrics! Model Metrics is a versatile Python library designed to streamline the evaluation and interpretation of machine learning models. It provides a robust framework for generating predictions, computing model metrics, analyzing feature importance, and visualizing results, whether you’re working with SHAP values, model coefficients, confusion matrices, ROC curves, precision-recall plots, or other key performance indicators.

What is Model Evaluation?

Model evaluation is a fundamental aspect of the machine learning lifecycle. It involves assessing the performance of predictive models using various metrics to ensure accuracy, reliability, and fairness. Proper evaluation helps in understanding how well a model generalizes to unseen data, detects potential biases, and optimizes performance. This step is critical before deploying any model into production.

Purpose of Model Metrics Library

The model_metrics library is a comprehensive framework designed to simplify and standardize the evaluation of machine learning models. It provides an extensive set of tools to assess model performance, diagnose issues, compare different approaches, and validate results with statistical rigor. Key functionalities include:

  • Performance Metrics: A comprehensive suite of functions to compute essential metrics for classification (accuracy, precision, recall, F1-score, ROC-AUC, PR-AUC, log loss, Brier score) and regression (RMSE, MAE, R², adjusted R²) tasks; see the first sketch after this list.

  • Advanced Diagnostics: Extensive residual diagnostics with heteroskedasticity testing, LOWESS smoothing, influence plots, and leverage analysis for regression models.

  • Visualization Tools: Rich plotting capabilities including confusion matrices, ROC curves with operating points and DeLong tests, precision-recall curves, calibration plots, gain charts with Gini coefficients, threshold optimization plots, and comprehensive residual diagnostic plots.

  • Stratified Analysis: Group-based evaluation with group_category support across all major plotting functions, enabling demographic fairness analysis and subgroup performance assessment.

  • Statistical Testing: Built-in statistical tests including DeLong’s test for ROC curve comparison, heteroskedasticity tests (Breusch-Pagan, White, Goldfeld-Quandt, Spearman), and automated diagnostic table generation; a Breusch-Pagan sketch follows this list.

  • Model Comparison: Frameworks to compare multiple models with statistical significance tests, automated performance summaries, and side-by-side visualizations.
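
As a quick orientation to what the listed performance metrics measure, the sketch below computes several of them directly with scikit-learn. This is illustrative only; model_metrics wraps such computations behind its own functions, whose exact signatures are documented in later sections.

    import numpy as np
    from sklearn.metrics import (
        accuracy_score, f1_score, roc_auc_score, brier_score_loss,
        mean_absolute_error, mean_squared_error, r2_score,
    )

    # Classification: true labels, hard predictions, and positive-class probabilities
    y_true, y_pred = [0, 1, 1, 0, 1], [0, 1, 0, 0, 1]
    y_prob = [0.2, 0.8, 0.4, 0.1, 0.9]
    print(accuracy_score(y_true, y_pred))    # fraction of correct predictions
    print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
    print(roc_auc_score(y_true, y_prob))     # ranking quality of the probabilities
    print(brier_score_loss(y_true, y_prob))  # calibration of the probabilities

    # Regression: RMSE, MAE, and R² from true vs. predicted values
    y_true_r, y_pred_r = [3.0, 5.0, 2.5], [2.8, 5.4, 2.1]
    print(np.sqrt(mean_squared_error(y_true_r, y_pred_r)))  # RMSE
    print(mean_absolute_error(y_true_r, y_pred_r))          # MAE
    print(r2_score(y_true_r, y_pred_r))                     # R²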
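
The heteroskedasticity tests named above mirror what statsmodels provides. As a minimal sketch (synthetic data, not a model_metrics call), a Breusch-Pagan test on OLS residuals looks like this:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    # Error variance grows with the first feature, so the errors are heteroskedastic
    y = X @ np.array([1.5, -2.0]) + rng.normal(scale=1 + np.abs(X[:, 0]))

    exog = sm.add_constant(X)
    resid = sm.OLS(y, exog).fit().resid
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, exog)
    print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")  # small p suggests heteroskedasticity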

Key Features

  • Comprehensive Evaluation: Supports both classification and regression tasks with extensive diagnostic capabilities, including residual analysis, influence detection, and model assumption validation.

  • User-Friendly: Designed for ease of use with intuitive functions, sensible defaults, and well-documented workflows with practical examples.

  • Highly Customizable: Extensive styling options via keyword argument dictionaries (point_kwgs, line_kwgs, group_kwgs, centroid_kwgs, etc.) allow precise control over plot appearance and legend behavior.

  • Modular Architecture: Clean separation between calculation utilities (metrics_utils.py), plotting helpers (plot_utils.py), and main evaluation functions (model_evaluator.py) ensures maintainability and extensibility.

  • Seamless Integration: Works with popular libraries such as Scikit-Learn, XGBoost, LightGBM, and TensorFlow, provided that model objects follow standard prediction interfaces like predict(), predict_proba(), or decision_function(). Special considerations may be required for deep learning models, time-series models, or custom transformers that return non-standard outputs.

  • Flexible Input Methods: Most functions accept either fitted model objects with feature matrices, or direct predictions (y_pred, y_prob), accommodating various workflows and use cases; see the sketch after this list.

  • Publication-Ready Output: Generates high-quality visualizations with customizable figure sizes, font sizes, grid lines, color schemes, and save options (PNG/SVG) suitable for reports and presentations.

  • Detailed Reports: Provides automated summaries, diagnostic tables with customizable decimal precision, and visual insights to aid in model selection, debugging, and decision-making.
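
To illustrate the duck-typing behind the integration and flexible-input points, here is a hypothetical helper (not part of the model_metrics API) that resolves positive-class scores from either a fitted model or precomputed probabilities:

    import numpy as np

    def resolve_scores(model=None, X=None, y_prob=None):
        """Illustrative only: return positive-class scores from a fitted
        model or from precomputed probabilities."""
        if y_prob is not None:                   # direct-prediction workflow
            return np.asarray(y_prob)
        if hasattr(model, "predict_proba"):      # probabilistic classifiers
            return model.predict_proba(X)[:, 1]
        if hasattr(model, "decision_function"):  # margin-based models (e.g., linear SVM)
            return model.decision_function(X)
        return model.predict(X)                  # fall back to raw predictions

Checking predict_proba before decision_function prefers probabilities when a model exposes both, which is the usual scikit-learn convention for score-based metrics.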

Prerequisites

Before you install model_metrics, ensure your system meets the following requirements:

  • Python: version 3.7.4 or higher is required to run model_metrics.

Additionally, model_metrics depends on the following packages, which will be automatically installed when you install model_metrics:

  • matplotlib: version 3.5.3 or higher, but capped at 3.9.2

  • numpy: version 1.21.6 or higher, but capped at 2.1.0

  • pandas: version 1.3.5 or higher, but capped at 2.2.3

  • plotly: version 5.18.0 or higher, but capped at 5.24.0

  • scikit-learn: version 1.0.2 or higher, but capped at 1.5.2

  • shap: version 0.41.0 or higher, but capped below 0.46.0

  • statsmodels: version 0.12.2 or higher, but capped below 0.14.4

  • tqdm: version 4.66.4 or higher, but capped below 4.67.1

Installation

You can install model_metrics directly from PyPI:

pip install model_metrics
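
To confirm the installed version, one option is the standard library's importlib.metadata (note this module requires Python 3.8 or higher):

    # Print the installed model_metrics distribution version (Python 3.8+)
    from importlib.metadata import version
    print(version("model_metrics"))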

Description

This guide provides detailed instructions and examples for the functions in the model_metrics library, showing how to use them effectively in your projects.

For most of the ensuing examples, we will leverage the Census Income Data (1994) from the UCI Machine Learning Repository [1]. This dataset provides a rich source of information for demonstrating the functionalities of the model_metrics library.
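
One way to pull this dataset into a DataFrame is scikit-learn's OpenML fetcher, sketched below; the loading step used in later examples may differ.

    from sklearn.datasets import fetch_openml

    # Fetch the 1994 Census Income ("adult") dataset from OpenML as a DataFrame
    adult = fetch_openml("adult", version=2, as_frame=True)
    X, y = adult.data, adult.target  # target is the income bracket: "<=50K" vs ">50K"
    print(X.shape)
    print(y.value_counts())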