Welcome to the EDA Toolkit Python Library Documentation!
Note
This documentation is for eda_toolkit version 0.0.7.
The eda_toolkit library is designed to streamline and enhance Exploratory Data Analysis (EDA) for data scientists, analysts, and researchers. It provides a suite of functions and utilities for the initial investigation of datasets, enabling users to quickly gain insights, identify patterns, and uncover underlying structures in their data.
What is EDA?
Exploratory Data Analysis (EDA) is a crucial step in the data science workflow. It involves a range of techniques, often visual, for summarizing the main characteristics of a dataset. EDA helps in understanding the data, identifying anomalies, discovering patterns, and forming hypotheses. Performing it before applying machine learning models helps verify that the data are of adequate quality and relevant to the problem at hand.
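As a rough illustration of what a first EDA pass involves, the sketch below uses plain pandas rather than eda_toolkit itself. The helper names quick_eda and iqr_outliers are hypothetical, and the 1.5 × IQR rule (Tukey's fences) used to flag anomalies is just one common heuristic among several.

```python
import numpy as np
import pandas as pd


def quick_eda(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with dtype, missing counts, and basic stats."""
    numeric = df.select_dtypes(include="number")
    report = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "pct_missing": (df.isna().mean() * 100).round(2),
            "n_unique": df.nunique(),
        }
    )
    # Mean/median/std only apply to numeric columns; other rows stay NaN
    # because assignment aligns on the column-name index.
    report["mean"] = numeric.mean()
    report["median"] = numeric.median()
    report["std"] = numeric.std()
    return report


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile([0.25, 0.75])  # NaNs are skipped by quantile
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "age": [23, 35, 31, np.nan, 29, 120],
            "income": [48_000, 61_000, 55_000, 52_000, np.nan, 58_000],
            "segment": ["a", "b", "a", "c", "b", "a"],
        }
    )
    print(quick_eda(df))
    # Only the implausible age of 120 falls outside the fences here.
    print(df.loc[iqr_outliers(df["age"]), "age"])
```

Summaries like this are a starting point: they surface missing values, suspicious ranges, and column types that then guide which plots and tests are worth running.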
Purpose of EDA Toolkit
The eda_toolkit library is a comprehensive suite of tools designed to streamline and automate many of the tasks associated with Exploratory Data Analysis (EDA). It offers a broad range of functionalities, including:
Data Management: Tools for managing directories, generating unique IDs, standardizing dates, and handling common DataFrame manipulations.
Data Cleaning: Functions to address missing values, remove outliers, and correct formatting issues, ensuring data is ready for analysis.
Data Visualization: A variety of plotting functions, including KDE distribution plots, stacked bar plots, scatter plots with optional best-fit lines, and box/violin plots, to visually explore data distributions, relationships, and trends.
Descriptive and Summary Statistics: Methods to generate comprehensive reports on data types, summary statistics (mean, median, standard deviation, etc.), and to summarize all possible combinations of specified variables.
Reporting and Export: Features to save DataFrames to Excel with customizable formatting, create contingency tables, and export generated plots in multiple formats.
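To make two of the items above more concrete ("summarize all possible combinations of specified variables" and "create contingency tables"), here is a plain-pandas sketch. It does not show eda_toolkit's own functions; summarize_combinations is a hypothetical helper, and it assumes "combinations" means every subset of two or more of the given columns.

```python
from itertools import combinations

import pandas as pd


def summarize_combinations(df: pd.DataFrame, variables: list) -> dict:
    """Count rows for every combination of two or more of the given variables.

    Returns a dict mapping each tuple of column names to a DataFrame of
    group counts and within-table percentages.
    """
    summaries = {}
    for r in range(2, len(variables) + 1):
        for combo in combinations(variables, r):
            # groupby sorts the group keys and drops NaN keys by default.
            counts = df.groupby(list(combo)).size().reset_index(name="count")
            counts["pct"] = (100 * counts["count"] / counts["count"].sum()).round(2)
            summaries[combo] = counts
    return summaries


if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "sex": ["F", "M", "F", "F", "M", "M"],
            "smoker": ["yes", "no", "no", "yes", "no", "yes"],
            "region": ["N", "S", "S", "N", "N", "S"],
        }
    )
    for combo, table in summarize_combinations(df, ["sex", "smoker", "region"]).items():
        print(combo)
        print(table, end="\n\n")

    # A two-way contingency table with row/column totals via pandas.crosstab.
    print(pd.crosstab(df["sex"], df["smoker"], margins=True, margins_name="Total"))
```

With three variables this yields four tables (three pairs plus the full triple); the number grows quickly with more variables, which is one reason to restrict the input to columns of genuine interest.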
Key Features
Ease of Use: The toolkit is designed with simplicity in mind, offering intuitive and easy-to-use functions.
Customizable: Users can customize various aspects of the toolkit to fit their specific needs.
Integration: Seamlessly integrates with popular data science libraries such as Pandas, NumPy, Matplotlib, and Seaborn.
Documentation and Examples: Comprehensive documentation and examples to help users get started quickly and effectively.
Prerequisites
Before you install eda_toolkit, ensure your system meets the following requirements:
Python: version 3.7.4 or higher is required to run eda_toolkit.
Additionally, eda_toolkit depends on the following packages, which will be automatically installed when you install eda_toolkit:
numpy: version 1.21.6 or higher
pandas: version 1.3.5 or higher
matplotlib: version 3.5.3 or higher
seaborn: version 0.12.2 or higher
jinja2: version 3.1.4 or higher
xlsxwriter: version 3.2.0 or higher
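If you want to check an existing environment against these minimums before installing, a small standard-library script can do it. This is a sketch, not an eda_toolkit feature: it assumes Python 3.8+ for importlib.metadata (on 3.7 the importlib_metadata backport from PyPI offers the same calls), and its simplified version parser ignores pre-release tags, so the third-party packaging library is the more robust choice if it is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above, keyed by PyPI distribution name.
MINIMUMS = {
    "numpy": "1.21.6",
    "pandas": "1.3.5",
    "matplotlib": "3.5.3",
    "seaborn": "0.12.2",
    "Jinja2": "3.1.4",
    "XlsxWriter": "3.2.0",
}


def parse_version(text: str) -> tuple:
    """Turn '1.26.4' or '2.0.0rc1' into a 3-tuple of ints, e.g. (2, 0, 0).

    Simplified: only the leading digits of each of the first three
    dot-separated parts are used; pre-release tags are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '3.1' compares as (3, 1, 0).
    return tuple(parts + [0] * (3 - len(parts)))


def check_requirements(minimums: dict = MINIMUMS) -> dict:
    """Map each package name to 'ok', 'too old (<found>)', or 'missing'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = parse_version(found) >= parse_version(minimum)
        status[name] = "ok" if ok else f"too old ({found})"
    return status


if __name__ == "__main__":
    for name, state in check_requirements().items():
        print(f"{name:<12} {state}")
```

Missing or outdated packages do not need manual attention before installation, since pip resolves them automatically; the check is mainly useful for spotting conflicts with versions pinned by other projects in the same environment.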
Installation
You can install eda_toolkit directly from PyPI:
pip install eda_toolkit