Goodbye to “EDA”, Welcome PPR!

Şeyda Arı
Mar 16, 2021

Hello hello, here we are again. In last week’s article, I talked about EDA (Exploratory Data Analysis). We all know how important it is to understand our data. So what if I told you there was an easier way to do it? In our artificial intelligence course, by the middle of the training we knew EDA as well as our own names. Then our teacher told us about the Pandas Profiling Report (PPR). I remember how surprised we all were by how easy PPR is and how much detail it offers. So let’s get started.

Of course, our first job is to install PPR, which comes as the pandas-profiling package (for example, `pip install pandas-profiling`).

Next, import the pandas library and read the file you want into a DataFrame. This is the data we will profile.
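A minimal sketch of this step, assuming a CSV file. The article does not name its data file, so an in-memory CSV stands in for it here; the column names are ones that appear later in this article, and the values are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for the article's (unnamed) data file.
# Column names follow the article; the values are invented for illustration.
CSV_TEXT = """Price,Mileage,Make,Liter
18000,8200,Buick,3.1
17500,9100,Buick,3.1
23000,12500,Cadillac,4.6
15500,20300,Chevrolet,2.2
"""

# With a real file, pass its path instead, e.g. pd.read_csv("your_file.csv")
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few rows of the DataFrame
```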

Now it’s time to import pandas profiling and apply it to our data.

Get ready for what you are about to see! Step by step, I will show and explain what pandas profiling does with the data. PPR examines the data and reports its findings in 6 sections:

  • Overview

The Overview is itself divided into 3 parts: Overview, Warnings and Reproduction.

Okay, what have we learned about the data so far? Things like column properties, missing cells, duplicate rows, and which columns are numeric or categorical. We have also seen which columns are highly correlated with each other.
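To show what these Overview figures mean, here is roughly how the same numbers can be computed by hand with plain pandas. This is a sketch of the idea, not PPR’s actual implementation, and the toy data (with one missing cell and one duplicated row planted on purpose) is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: Liter has one missing value, and the last row
# repeats the one before it, so both counts below are non-zero.
df = pd.DataFrame({
    "Price":   [18000, 17500, 23000, 15500, 15500],
    "Mileage": [8200, 9100, 12500, 20300, 20300],
    "Make":    ["Buick", "Buick", "Cadillac", "Chevrolet", "Chevrolet"],
    "Liter":   [3.1, None, 4.6, 2.2, 2.2],
})

n_variables = df.shape[1]                      # number of columns
n_observations = df.shape[0]                   # number of rows
missing_cells = int(df.isna().sum().sum())     # total missing cells
missing_pct = 100 * missing_cells / df.size    # share of all cells
duplicate_rows = int(df.duplicated().sum())    # rows identical to an earlier row
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(n_variables, n_observations, missing_cells, missing_pct, duplicate_rows)
print("numeric:", numeric_cols, "categorical:", categorical_cols)
```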

  • Variables

The Variables section examines each column separately. Here I will only show the output for 2 columns.

And if you click “Toggle details”, you get more detailed statistics and plots for that column.

Even at this point, we already know a lot about the data. Pretty cool, right?
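For comparison, a few of the per-column statistics from the Variables section can be approximated in plain pandas like this. The data and the choice of statistics are mine; PPR reports more than this and may compute some values differently.

```python
import pandas as pd

# Hypothetical toy data with one numeric and one categorical column.
df = pd.DataFrame({
    "Price": [18000, 17500, 23000, 15500],
    "Make":  ["Buick", "Buick", "Cadillac", "Chevrolet"],
})

# Numeric column: a handful of the summary figures shown per variable.
price = df["Price"]
price_stats = {
    "distinct": price.nunique(),
    "missing": int(price.isna().sum()),
    "mean": price.mean(),
    "min": price.min(),
    "max": price.max(),
}
print(price_stats)
print(price.describe())  # count, mean, std, quartiles, min and max

# Categorical column: how often each category occurs, most frequent first.
make_counts = df["Make"].value_counts()
print(make_counts)
```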

  • Interactions

This section shows scatter plots for pairs of numeric columns, and you can choose which pair of columns you want to see.

For example, here we see a scatter plot of the Price and Mileage columns.

And here we can see a scatter plot of the Mileage and Liter columns.
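A static version of these two scatter plots can be drawn with pandas and matplotlib. This only approximates PPR’s interactive view; the toy data is hypothetical, and putting Mileage on the x-axis is my own choice.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data using the article's column names.
df = pd.DataFrame({
    "Price":   [18000, 17500, 23000, 15500, 16200, 21000],
    "Mileage": [8200, 9100, 12500, 20300, 15800, 11000],
    "Liter":   [3.1, 3.1, 4.6, 2.2, 2.2, 4.6],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# One scatter plot per pair of numeric columns, as in the Interactions section.
df.plot.scatter(x="Mileage", y="Price", ax=ax1, title="Price vs Mileage")
df.plot.scatter(x="Mileage", y="Liter", ax=ax2, title="Liter vs Mileage")

fig.tight_layout()
fig.savefig("interactions.png")  # hypothetical output file name
```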

  • Correlations

In this section, we can see the correlations between all the columns, visualized as a heatmap. And the good thing is that we can see this for 5 different types of correlation: Pearson’s r, Spearman’s ρ, Kendall’s τ, Phik (φk) and Cramér’s V (φc).

And if you click “Toggle correlation descriptions”, you can see explanations of these 5 different correlation types.
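Two of these correlation types can be reproduced directly with pandas’ `DataFrame.corr`, as sketched below on hypothetical data. pandas also accepts `method="kendall"` (which relies on SciPy), while Phik and Cramér’s V are not built into pandas, so this is a partial sketch rather than what PPR does internally.

```python
import pandas as pd

# Hypothetical numeric toy data using the article's column names.
df = pd.DataFrame({
    "Price":   [18000, 17500, 23000, 15500],
    "Mileage": [8200, 9100, 12500, 20300],
    "Liter":   [3.1, 3.1, 4.6, 2.2],
})

# Pearson measures linear association; Spearman measures monotonic
# association by correlating the ranks of the values instead.
correlations = {
    method: df.corr(method=method)
    for method in ("pearson", "spearman")
}

for method, matrix in correlations.items():
    print(f"{method}:\n{matrix.round(2)}\n")
```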

  • Missing Values

This section lets us see the missing values in the data visually. And as we can see, there are no missing values in this data.
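The per-column missing counts behind this section can be checked with `isna()`. On the article’s data every count would be zero, so the hypothetical toy data below has a few gaps planted on purpose.

```python
import pandas as pd

# Hypothetical toy data with one missing value in three of the four columns.
df = pd.DataFrame({
    "Price":   [18000, 17500, None, 15500],
    "Mileage": [8200, 9100, 12500, 20300],
    "Make":    ["Buick", None, "Cadillac", "Chevrolet"],
    "Liter":   [3.1, 3.1, 4.6, None],
})

missing_per_column = df.isna().sum()                  # missing count per column
total_missing = int(missing_per_column.sum())         # all missing cells
rows_with_missing = int(df.isna().any(axis=1).sum())  # rows with at least one gap

print(missing_per_column)
print(df.isna())  # True marks a missing cell, a text version of a missing-value matrix
```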

  • Sample

And we have come to the last part. This section shows sample rows from the beginning and the end of the data.
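The same kind of sample can be pulled out with `head` and `tail`. The 25-row toy DataFrame is hypothetical, and showing 10 rows from each end is my choice for this sketch.

```python
import pandas as pd

# Hypothetical 25-row toy DataFrame, so the first and last rows don't overlap.
df = pd.DataFrame({
    "Price":   [15000 + 500 * i for i in range(25)],
    "Mileage": [5000 + 1000 * i for i in range(25)],
})

first_rows = df.head(10)  # rows from the beginning of the data
last_rows = df.tail(10)   # rows from the end of the data

print(first_rows)
print(last_rows)
```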

In this article, I wanted to walk through every part of the output with visual examples. We have seen that PPR is a great convenience for data scientists, because it gave us a lot of information that we could not see with EDA. I hope this has been eye-opening. Thank you for reading. See you in my next article.
