How to Do EDA: Manually vs. SweetViz vs. DataPrep
EDA (Exploratory Data Analysis) is a critical step in data analysis and the first thing to do when you get your hands on a dataset. It helps to understand the underlying patterns and relationships in the data, get a general sense and intuition of the data, and identify potential outliers and errors. EDA is typically performed before data preprocessing to guide the development of more complex statistical models.
Although it’s common and understandable to perform EDA and data preprocessing in conjunction with each other, this article focuses primarily on EDA.
Let’s dive in~!
The Dataset
The Gist
Here’s a simple cheat sheet of syntax that might be useful in general:
For a more detailed EDA with my own personal comments, do view it at this Kaggle link!
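As a minimal sketch of the kind of calls a manual EDA usually starts with, assuming pandas and a small toy DataFrame (the column names below are hypothetical and are not the article's dataset):

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for a real dataset.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 512.33],
    "sex": ["male", "female", "female", "female", "male", "male", "male", "female"],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1],
})

# Structure: size, column types, and a first look at the rows.
shape = df.shape
dtypes = df.dtypes
first_rows = df.head()

# Summary statistics for numeric columns, frequencies for a categorical one.
numeric_summary = df.describe()
sex_counts = df["sex"].value_counts()

# Data quality: missing values per column and fully duplicated rows.
missing_counts = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Relationships: pairwise correlation and a grouped mean of the target.
correlations = df.select_dtypes("number").corr()
survival_by_sex = df.groupby("sex")["survived"].mean()

# Potential outliers in one column using the 1.5 * IQR rule of thumb.
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)]

print(shape, duplicate_rows)
print(missing_counts)
print(outliers)
```

The 1.5 * IQR rule is only one common heuristic for flagging outliers; whether a flagged row is really an error depends on the data.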
Now, I will share two of the most commonly used automated EDA tools:
- SweetViz
- DataPrep
SweetViz and DataPrep are two separate Python libraries that help users explore and analyze their data; DataPrep additionally helps with preparing and cleaning it. Both let users get a dataset ready for analysis and gain insights from it with just a couple of lines of code.
SweetViz
Sweetviz is an open-source Python library for automated exploratory data analysis (EDA) that generates rich and interactive visualizations to help users understand their data better.
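A minimal sketch of generating a Sweetviz report, assuming the `sweetviz` package is installed (`pip install sweetviz`). The toy DataFrame, the helper name `make_sweetviz_report`, and the output file name are all hypothetical:

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for a real dataset.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 512.33],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1],
})


def make_sweetviz_report(data, target=None, path="sweetviz_report.html"):
    """Analyze one DataFrame and write an HTML report (hypothetical helper)."""
    # Imported lazily so this file still loads where sweetviz is not installed.
    import sweetviz as sv

    # target_feat is optional; Sweetviz expects a numeric or boolean target.
    report = sv.analyze(data, target_feat=target)
    report.show_html(path, open_browser=False)
    return path


# Usage (requires sweetviz):
# make_sweetviz_report(df, target="survived")
```

In a notebook, `report.show_notebook()` can be used instead of `show_html` to render the report inline.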
View the full EDA at this Kaggle link!
Check their documentation here.
Here’s a Medium Article that further elaborates on its use cases.
SweetViz is an appropriate choice for users who want to visualize target values and compare two datasets. Unlike DataPrep, SweetViz can compare at most two dataframes at a time.
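The two-dataset comparison can be sketched as follows, again assuming `sweetviz` is installed. The split helper, the labels "Train" and "Test", and the toy data are illustrative assumptions:

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for a real dataset.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 41, 54, 2, 27],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 512.33],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1],
})


def split_frame(data, train_fraction=0.75, seed=0):
    """Randomly split a DataFrame into two non-overlapping parts."""
    train = data.sample(frac=train_fraction, random_state=seed)
    test = data.drop(train.index)
    return train, test


def compare_with_sweetviz(train, test, target=None, path="sweetviz_compare.html"):
    """Compare two DataFrames side by side (hypothetical helper)."""
    # Imported lazily so this file still loads where sweetviz is not installed.
    import sweetviz as sv

    # Each source is passed as [dataframe, display name].
    report = sv.compare([train, "Train"], [test, "Test"], target_feat=target)
    report.show_html(path, open_browser=False)
    return path


train_df, test_df = split_frame(df)
print(len(train_df), len(test_df))

# Usage (requires sweetviz):
# compare_with_sweetviz(train_df, test_df, target="survived")
```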
DataPrep
DataPrep is a Python library similar to SweetViz but focuses more on data cleaning and preparation. It provides tools to help with common data cleaning tasks such as data type conversion, missing value imputation, and duplicate record removal.
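For the EDA side of DataPrep, a minimal sketch using its `dataprep.eda` module might look like this, assuming the `dataprep` package is installed (`pip install dataprep`). The helper names and the toy data are hypothetical:

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for a real dataset.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, None, 54, 2, 27],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 512.33],
    "survived": [0, 1, 1, 1, 0, 0, 1, 1],
})


def dataprep_overview(data):
    """Build DataPrep's individual EDA views (hypothetical helper)."""
    # Imported lazily so this file still loads where dataprep is not installed.
    from dataprep.eda import plot, plot_correlation, plot_missing

    return {
        "overview": plot(data),                  # per-column distributions
        "correlation": plot_correlation(data),   # correlation matrices
        "missing": plot_missing(data),           # missing-value patterns
    }


def make_dataprep_report(data, title="EDA Report"):
    """Build a single full report object (hypothetical helper)."""
    from dataprep.eda import create_report

    return create_report(data, title=title)


# Usage (requires dataprep):
# report = make_dataprep_report(df)
# report.show_browser()
```

Passing a column name as a second argument, for example `plot(data, "age")`, narrows the view to that single column.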
View the full EDA at this Kaggle link!
Check their documentation here.
Visualizations in DataPrep use Bokeh, which makes them interactive. Another notable feature of DataPrep visualizations is insight notes displayed along with the visualizations. These insights provide a summary of the distribution and eliminate the need for the user to perform extra calculations. DataPrep would be the ideal go-to tool for users analyzing and comparing more than two data frames.
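Comparing more than two dataframes can be sketched with `plot_diff` from `dataprep.eda`, assuming a DataPrep version that ships it. Splitting by a `pclass` column is purely an illustrative assumption, as are the helper names:

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for a real dataset.
df = pd.DataFrame({
    "pclass": [1, 3, 2, 1, 3, 2, 3, 1],
    "age": [22, 38, 26, 35, 41, 54, 2, 27],
    "fare": [71.28, 7.25, 21.08, 53.10, 8.05, 26.00, 7.92, 512.33],
})


def split_by_column(data, column):
    """Return one DataFrame per distinct value of `column`."""
    return {
        key: group.drop(columns=column)
        for key, group in data.groupby(column)
    }


def diff_with_dataprep(frames):
    """Compare several DataFrames in one view (hypothetical helper)."""
    # Imported lazily so this file still loads where dataprep is not installed.
    from dataprep.eda import plot_diff

    return plot_diff(list(frames))


frames_by_class = split_by_column(df, "pclass")
print({key: len(frame) for key, frame in frames_by_class.items()})

# Usage (requires dataprep):
# diff_with_dataprep(frames_by_class.values())
```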
The End
Yep, that’s it! I intended to keep this really short and pretty intuitive.
Thanks so much for reading my article!!! Feel free to connect with me on LinkedIn. Cheers!