PyCaret, what a great convenience…

Şeyda Arı
Mar 1, 2021

Hello again, everyone. In this article, I will talk about a machine learning library that makes my work much easier.

In machine learning, analyzing the data and preparing it before feeding it to a model is very important. By preparing data, I mean filling in missing values, getting rid of outliers, doing feature engineering, and so on. As data grows and becomes more complex, you spend more and more time on these steps. With PyCaret, it’s quite the opposite.
PyCaret is a low-code machine learning library. Whether it’s imputing missing values, transforming categorical data, feature engineering, or even tuning model hyperparameters, PyCaret automates it. You don’t need to split the data yourself either, since PyCaret holds out a test set for you, and normalizing it is a single option (normalize=True). All of this takes only a few lines of code.
Now I will show you how to use PyCaret with some simple examples. Let’s start!

To begin, let me show you how to install the pycaret package.
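PyCaret is published on PyPI, so it is normally installed with `pip install pycaret` (prefix the command with `!` inside a Jupyter notebook). As a small sketch, the snippet below checks whether the package is installed in the current environment and reports its version; the helper name `installed_version` is my own, not part of PyCaret.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="pycaret"):
    """Return the installed version string of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper, not part of PyCaret.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"pycaret {v}" if v else "pycaret is not installed; run: pip install pycaret")
```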

  • REGRESSION

First, we import pandas and PyCaret’s regression module.

The setup() function prepares the data for modeling and deployment, and it must be called before any other function in PyCaret. It takes two mandatory parameters: the DataFrame and the name of the target column.
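A minimal sketch of the regression workflow, assuming a recent PyCaret release (2.x or 3.x) and a pandas DataFrame `df`; the wrapper function `run_regression_experiment` and its default seed are hypothetical. The PyCaret import sits inside the function so the file can be loaded even where PyCaret is not installed.

```python
def run_regression_experiment(df, target, session_id=123):
    """Prepare `df` (a pandas DataFrame) with setup() and return the best model.

    `run_regression_experiment` is a hypothetical helper, not part of PyCaret.
    """
    from pycaret.regression import compare_models, setup

    # setup() must run first: it infers column types, imputes missing values,
    # encodes categorical columns and holds out a test set (70 % train by default).
    setup(data=df, target=target, session_id=session_id)
    # Cross-validates every model in the regression library, prints the score
    # grid and returns the top model (ranked by R2 by default).
    return compare_models()
```

For example, with PyCaret’s bundled `insurance` sample dataset (my choice for illustration, not necessarily the data used in this article): `best = run_regression_experiment(get_data("insurance"), target="charges")`, where `get_data` comes from `pycaret.datasets`.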

The compare_models() function trains and cross-validates all the models available in the model library. It prints a score grid with MAE, MSE, RMSE, R2, RMSLE, and MAPE (averaged across folds); the number of folds is set by the fold parameter.
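To make the score grid less of a black box, here is a plain-Python sketch of the textbook definitions behind its columns and of averaging per-fold scores. It does not call PyCaret, and PyCaret’s own implementation may differ in edge cases (for example negative values in RMSLE or zeros in MAPE).

```python
import math


def regression_scores(y_true, y_pred):
    """Textbook definitions of the metrics in PyCaret's regression score grid."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # R2 = 1 - residual sum of squares / total sum of squares
        "R2": 1 - (mse * n) / ss_total,
        # RMSLE: RMSE computed on log(1 + y); only defined for values > -1
        "RMSLE": math.sqrt(
            sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
        ),
        # MAPE as a fraction (0.1 means 10 %); undefined if a true value is 0
        "MAPE": sum(abs(e / t) for t, e in zip(y_true, errors)) / n,
    }


def average_across_folds(fold_scores):
    """Mean of each metric over a list of per-fold score dicts."""
    return {m: sum(s[m] for s in fold_scores) / len(fold_scores) for m in fold_scores[0]}
```

For `y_true = [3, 5, 2, 8]` and `y_pred = [2.5, 5, 3, 7]`, this gives MAE 0.625, MSE 0.5625 and RMSE 0.75.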

And that’s it. Easy, right? Let’s do the same for classification.

  • CLASSIFICATION

You can sort the score grid by any metric you like with the sort parameter of compare_models().
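A sketch of the classification setup followed by a sorted comparison, assuming PyCaret 2.x or 3.x; `compare_classifiers` and its defaults are hypothetical. `sort` accepts a column name of the score grid such as `"AUC"`, `"F1"` or `"Recall"`; the default is `"Accuracy"`.

```python
def compare_classifiers(df, target, sort_by="AUC", session_id=123):
    """Run classification setup() on `df`, then rank all models by `sort_by`.

    `compare_classifiers` is a hypothetical helper, not part of PyCaret.
    """
    from pycaret.classification import compare_models, setup

    setup(data=df, target=target, session_id=session_id)
    # sort= picks the score-grid column used to order the table.
    return compare_models(sort=sort_by)
```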

Creating a model in any module is as simple as calling create_model(). Its only mandatory parameter is the model ID, passed as a string.
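A sketch assuming classification `setup()` has already been run in the same session (PyCaret keeps the experiment state globally). `models()` lists the available estimators together with their IDs, which is where the string for `create_model()` comes from; the wrapper name is hypothetical.

```python
def train_decision_tree():
    """Assumes pycaret.classification.setup() has already been called.

    `train_decision_tree` is a hypothetical helper, not part of PyCaret.
    """
    from pycaret.classification import create_model, models

    print(models())            # table of available estimators, indexed by model ID
    return create_model("dt")  # "dt" is the ID of the Decision Tree Classifier
```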

You can also limit the comparison to only the models you choose.
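Assuming this refers to the `include` parameter of `compare_models()` (my reading, since the original code is not shown), here is a sketch that compares only logistic regression, decision tree and random forest; the choice of IDs and the helper name are arbitrary, and classification `setup()` is assumed to have run already.

```python
def compare_selected(model_ids=("lr", "dt", "rf")):
    """Compare only the listed model IDs; assumes classification setup() has run.

    `compare_selected` is a hypothetical helper, not part of PyCaret.
    """
    from pycaret.classification import compare_models

    # include= restricts the comparison to these models instead of the whole library.
    return compare_models(include=list(model_ids))
```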

The number of folds can be set with the fold parameter of create_model(); by default, it is 10. All metrics are rounded to 4 decimals by default, but this can be changed with the round parameter of create_model().
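A sketch of both parameters together, again assuming classification `setup()` has already been called; 5 folds and 2 decimals are arbitrary example values and the helper name is hypothetical.

```python
def train_with_custom_cv(model_id="dt", folds=5, decimals=2):
    """Hypothetical helper: create_model() with a custom fold count and rounding."""
    from pycaret.classification import create_model

    # fold= overrides the default 10-fold cross-validation;
    # round= controls the decimals shown in the score grid (default 4).
    return create_model(model_id, fold=folds, round=decimals)
```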

When blending classification models with blend_models(), the method parameter can be set to ‘soft’ or ‘hard’: soft voting uses the predicted probabilities, while hard voting uses the predicted labels.
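The difference between the two voting schemes can be sketched in plain Python, independent of PyCaret, with made-up probabilities for a single sample: hard voting picks the label most models predict, while soft voting averages the probabilities and can overrule the majority when one model is very confident. The last helper shows how this might look in PyCaret, assuming classification `setup()` has already run; all helper names are hypothetical. Soft voting needs every blended model to provide predicted probabilities.

```python
from collections import Counter


def hard_vote(predicted_labels):
    """Hard voting: the class predicted by the most models wins."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(predicted_probas):
    """Soft voting: average each class's probability across models, take the largest."""
    classes = predicted_probas[0].keys()
    averaged = {c: sum(p[c] for p in predicted_probas) / len(predicted_probas) for c in classes}
    return max(averaged, key=averaged.get)


def blend_top_models(n=3, method="soft"):
    """Hypothetical helper; assumes pycaret.classification.setup() has been called."""
    from pycaret.classification import blend_models, compare_models

    top_models = compare_models(n_select=n)  # list of the n best models
    return blend_models(estimator_list=top_models, method=method)


if __name__ == "__main__":
    # Made-up probabilities from three models for one sample.
    probas = [{"A": 0.90, "B": 0.10}, {"A": 0.40, "B": 0.60}, {"A": 0.45, "B": 0.55}]
    labels = [max(p, key=p.get) for p in probas]  # ['A', 'B', 'B']
    print(hard_vote(labels), soft_vote(probas))   # B A
```

Here hard voting returns `B` (two of the three models) and soft voting returns `A`, because A’s averaged probability (about 0.58) beats B’s (about 0.42).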

  • CLUSTERING

It’s time for clustering. I will show you how to make amazing graphics with a single line of PyCaret code. For this, I will use code from my own clustering project.

The clustering setup() function also accepts a few parameters worth knowing:

  • session_id: int, default = None. If None, a random seed is generated and shown in the information grid. This number is then used as the seed in every function called during the experiment, so the whole experiment can be reproduced later.
  • log_experiment: bool, default = False. When set to True, all metrics and parameters are logged to MLflow.
  • experiment_name: str, default = None. Name of the experiment used for logging. When set to None, PyCaret uses a default name.
  • log_plots: bool, default = False. When set to True, specific plots are logged to MLflow as PNG files.
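A sketch of a clustering `setup()` call using the parameters listed above, assuming PyCaret 2.x or 3.x. The helper name and the experiment name are hypothetical placeholders, and logging to MLflow requires the `mlflow` package to be installed.

```python
def setup_clustering(df, session_id=123, log_to_mlflow=False):
    """Hypothetical helper: initialise a PyCaret clustering experiment on `df`."""
    from pycaret.clustering import setup

    return setup(
        data=df,
        session_id=session_id,         # fixed seed so the experiment is reproducible
        log_experiment=log_to_mlflow,  # log metrics and parameters to MLflow
        # hypothetical experiment name; None lets PyCaret choose a default
        experiment_name="my-clustering-project" if log_to_mlflow else None,
        log_plots=log_to_mlflow,       # also log plots to MLflow as PNG files
    )
```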

To create a model, we pass its abbreviated ID string to create_model(); “kmeans” stands for K-Means clustering.
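A sketch of that step, assuming clustering `setup()` has already run. `num_clusters` defaults to 4 in PyCaret; passing it explicitly just makes the choice visible. The wrapper name is hypothetical.

```python
def train_kmeans(n_clusters=4):
    """Hypothetical helper: fit K-Means through PyCaret's clustering module."""
    from pycaret.clustering import create_model

    return create_model("kmeans", num_clusters=n_clusters)
```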

And let’s add some visuals to it.

The plot_model() function takes a trained model object and draws a plot based on the dataset passed to setup(). Internally, it calls assign_model() before generating the plot.
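A sketch of the plotting calls, assuming clustering `setup()` has run and `model` is a trained clustering model (for example the result of `create_model("kmeans")`). The default plot type is `"cluster"`, a 2D PCA view; `"tsne"` produces the 3D t-SNE view. The helper name is hypothetical.

```python
def visualise_clusters(model):
    """Hypothetical helper; assumes pycaret.clustering.setup() has been called."""
    from pycaret.clustering import assign_model, plot_model

    plot_model(model)               # default plot="cluster": 2D PCA view coloured by cluster
    plot_model(model, plot="tsne")  # 3D t-SNE view of the same clusters
    return assign_model(model)      # the setup() data with an added "Cluster" column
```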

Take a look at the chart we created with a single line of code. I think that’s pretty amazing.

t-SNE (3D) Dimension Plot

And my favorite graphic:

With the “elbow” plot, I can see how many clusters to split the data into for the best result. I could get the same insight without PyCaret by drawing the chart myself, but I would still have to choose the number of clusters by eye, looking for the point where the curve bends like an elbow: past that point, adding more clusters barely improves the fit. Here I don’t even need to do that, because, as you can see, the suggested number of clusters is marked with a vertical line.
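In PyCaret the elbow chart is a one-liner, `plot_model(model, plot="elbow")`; as far as I know it is drawn with Yellowbrick’s `KElbowVisualizer`, which places the vertical line itself. To show the idea behind an automatic elbow pick, the sketch below uses a simple heuristic: the point farthest from the straight line joining the first and last points of the normalized curve. This is a simplified stand-in, not Yellowbrick’s exact algorithm, and the helper names are hypothetical.

```python
import math


def elbow_k(ks, scores):
    """Return the k at the 'elbow' of a decreasing score curve.

    Simple max-distance heuristic (not Yellowbrick's algorithm): scale both
    axes to [0, 1], then pick the point farthest from the straight line that
    joins the first and last points. `ks` must be in increasing order.
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs")
    k_lo, s_lo = min(ks), min(scores)
    k_span, s_span = max(ks) - k_lo, max(scores) - s_lo
    if k_span == 0 or s_span == 0:
        raise ValueError("flat curve: no elbow to find")
    xs = [(k - k_lo) / k_span for k in ks]
    ys = [(s - s_lo) / s_span for s in scores]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    # Perpendicular distance from every point to the first-to-last chord.
    dists = [abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord
             for x, y in zip(xs, ys)]
    return ks[dists.index(max(dists))]


def plot_elbow(model):
    """Hypothetical helper; assumes pycaret.clustering.setup() has been called."""
    from pycaret.clustering import plot_model

    plot_model(model, plot="elbow")
```

With `ks = [1, 2, 3, 4, 5, 6, 7, 8]` and made-up distortion scores `[1000, 400, 200, 150, 120, 100, 90, 85]`, `elbow_k` returns 3.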

And here we come to the end. We got a lot done with very little code, right? That’s exactly what I wanted to explain and show. I hope I was able to surprise you a little. Thank you for your time and for reading my article. Please feel free to share your ideas with me, and if there is something you would like me to write about, let me know in the comments. See you in my next article.
