Regularization!

Hello reader,

This blog post will help you understand why regularization is important when training machine learning models, and why it is one of the most talked-about topics in the ML domain.

So, let's look at this plot. What can we decipher from it?
In this graph the x-axis is the number of training iterations and the y-axis is the loss on the training and test data. Can you notice anything wrong here?

Yeah, the loss values trend nicely downward on the training data but shoot upwards at some point on the test data. This is not good: the model is overfitting, fitting the training data so closely that it stops generalizing to the test data. How can we address this?

One thing we can do is stop the iterations early to avoid overfitting. This is called early stopping. Early stopping is a technique that automatically halts training when a chosen metric (usually the loss on held-out validation data) stops improving. But there is a problem with early stopping: part of the data must be held out for validation, and training halts before convergence, so the model never makes full use of all the available training data.
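
To make this concrete, here is a minimal sketch of early stopping using scikit-learn's SGDRegressor, whose built-in early_stopping option holds out a validation fraction and halts training once the validation score stops improving. The data is synthetic and the parameter values are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (for illustration only)
rng = np.random.RandomState(0)
X = rng.randn(500, 10)
y = X @ rng.randn(10) + 0.5 * rng.randn(500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# early_stopping=True holds out validation_fraction of the training data
# and stops once the validation score fails to improve for
# n_iter_no_change consecutive epochs.
model = SGDRegressor(
    max_iter=1000,
    early_stopping=True,
    validation_fraction=0.2,
    n_iter_no_change=5,
    random_state=0,
)
model.fit(X_train, y_train)
print("epochs actually run:", model.n_iter_)
print("test R^2:", model.score(X_test, y_test))
```

Note how the validation fraction is carved out of the training set: that held-out slice is exactly the data the model never gets to learn from, which is the limitation described above.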

So what should be done now? This is where regularization comes to the rescue.
Regularization is a technique that shrinks the coefficient estimates towards zero. This discourages the model from learning the noise and incidental complexities of the training data, and thereby avoids overfitting.

There are two common regularization techniques: ridge regression and lasso regression.

Ridge Regression is a regularization technique in which we introduce a small amount of bias, known as the ridge regression penalty, into the fit. This is also known as L2 regularization.

In simple linear regression, the fitting procedure minimizes a loss function called the residual sum of squares (RSS). In the equations below, Y is the fitted line, built from the features and their slopes.
If we have one feature it would be y = βx + c
If we have two features it would be y = β1x1 + β2x2 + c
And this generalizes to p features as
Y ≈ β0 + β1X1 + β2X2 + … + βpXp

Residual sum of squares: RSS = Σ (yᵢ − ŷᵢ)², the sum of the squared differences between each observed value yᵢ and its fitted value ŷᵢ.
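
To see the formula in action, here is a tiny NumPy sketch that computes the RSS of a candidate fitted line on made-up toy data:

```python
import numpy as np

# Toy data: y roughly follows 2x with some noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# A candidate fitted line: y_hat = beta * x + c
beta, c = 2.0, 0.1
y_hat = beta * x + c

# Residual sum of squares: sum of the squared residuals
rss = np.sum((y - y_hat) ** 2)
print("RSS:", rss)
```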

So RSS is the cost function of linear regression, and the cost function of ridge regression adds a penalty on the slopes:

Cost of ridge regression = RSS + λ · Σ (slope)²

where the sum runs over the squared slope coefficients β1, …, βp.

We reduce overfitting by penalizing the fitted line in this way, i.e. by adding a small amount of bias. Here λ is a non-negative tuning parameter: λ = 0 recovers plain linear regression, and the larger λ gets, the more strongly the slopes are shrunk towards zero.
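
Here is a small sketch using scikit-learn's Ridge on synthetic data; its alpha argument plays the role of λ. Notice how the slopes shrink towards zero as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with known slopes (for illustration only)
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
true_slopes = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_slopes + 0.1 * rng.randn(100)

# Increasing alpha (the lambda penalty) shrinks every slope towards zero
for alpha in [0.01, 1.0, 10.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: {np.round(coefs, 3)}")
```

The slopes get steadily smaller but never become exactly zero, which is the key contrast with lasso below.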

Lasso regression is also called L1 regularization. It works in the same manner as ridge; the only difference is the penalty term. In ridge we take the squares of the slopes, whereas in lasso we take the magnitudes (absolute values) of the slopes, giving the cost RSS + λ · Σ |slope|.

We use the magnitudes of the slopes because lasso not only avoids overfitting but can also be used for feature selection.

In ridge regression, as we increase the penalty term the slopes slowly tend towards zero but never quite reach it, whereas in lasso regression they can become exactly zero. That is why lasso regression is used as a feature selection method: after increasing the penalty, some coefficients become exactly zero, and we can conclude that those features are not important for predicting the best-fit line.
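
The following sketch uses scikit-learn's Lasso on synthetic data where only the first two of six features actually matter; as alpha (the λ penalty) grows, the coefficients of the irrelevant features are driven exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(200, 6)
# Only the first two features influence y; the rest are noise
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.randn(200)

for alpha in [0.01, 0.1, 1.0]:
    coefs = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: {np.round(coefs, 3)}")

# At the larger alphas the four noise features have coefficient 0.0,
# so the surviving nonzero slopes identify the "selected" features.
```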

Thanks for reading!

If you liked this post, please upvote it and suggest more topics you would like to read about.

Feel free to comment below with anything you feel I have missed. Thanks!
