Least Square Regression Method for AI and ML

As Machine Learning and Artificial Intelligence become the backbone of today's tech world, it is important to learn popular methods like "Least Square Regression": the math behind regression analysis along with its implementation in Python.

In this blog, we provide in-depth knowledge of the Least Square Regression Method. Enroll in our AI Training in Chennai to learn the best practices.

What is the Least Square Regression Method?

Least-square regression is a technique used in the regression analysis stage of ML model building and AI implementation.

It is one of the popular mathematical methods for finding the best possible fit line, the line that describes the relationship between dependent and independent variables.

What is the Best Fit Line?

The best fit line is the line drawn across a scatter plot of data points that best represents the relationship between two or more variables.

It is used in regression analysis for obtaining a definite relationship between the predictor variable and the target variable.

The least-square regression method is an effective way of drawing the line of best fit. Because it minimizes the sum of the squared vertical distances (residuals) between the data points and the line, it is named the "Least Square Regression" method.

Steps for calculating the line of best fit

To construct the line that best depicts the relationship between the variables in the data, we start from the following equation.

y=mx+c

Here, y denotes the dependent variable, m denotes the slope of the line, x denotes the independent variable, and c denotes the y-intercept.

Step 1: Calculating the slope ‘m’ with the following formula

m = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²)

Step 2: Computing the y-intercept, i.e., the value of y at the point where the line crosses the y-axis, using the following formula

c = ȳ - m·x̄ (where x̄ and ȳ are the means of the x and y values)

Step 3: Replacing the values in the following final equation

y=mx+c

Example for Least Square Regression Method

Price of Cars in Rupees (x)    Number of Cars Sold (y)
100000                         12
154748                         9
184587                         7
214785                         5
245870                         2
Price of Cars in Rupees (x)    Number of Cars Sold (y)    Y = mx + c    Error
100000                         12                         8.2           0.54
154748                         9                          6.7           0.89
184587                         7                          5.3           -0.97
214785                         5                          4.3           -0.68
245870                         2                          3.1           -0.41

Now it is easy to estimate how many cars you can expect to sell at a given price, which helps in maximizing sales.
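The three steps above can be sketched in Python for the car-sales table. This is an illustrative sketch using NumPy for the sums; it is not part of the original worked example, and the fitted values need not match the table exactly.

```python
import numpy as np

# Car prices (x) and number of cars sold (y) from the table above
x = np.array([100000, 154748, 184587, 214785, 245870], dtype=float)
y = np.array([12, 9, 7, 5, 2], dtype=float)

n = len(x)

# Step 1: slope m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)

# Step 2: intercept c = mean(y) - m * mean(x)
c = np.mean(y) - m * np.mean(x)

# Step 3: predict with y = m*x + c
y_pred = m * x + c
print(m, c)
```

As expected, the slope comes out negative: the higher the price, the fewer cars are sold.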

Important things to keep in mind before implementing the Least Square Regression method

• The data must be free of outliers, as they distort the fit
• The line of best fit can be refined iteratively
• The method does not work well for non-linear data
• The difference between an observed value and the value predicted by the line is the residual, which denotes the error
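The first point is easy to demonstrate. The sketch below, on made-up data, uses np.polyfit (which performs the same least-squares fit) to show how a single outlier drags the fitted slope away from the true trend:

```python
import numpy as np

# Clean data lying close to y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

m_clean, c_clean = np.polyfit(x, y, 1)  # slope close to 2

# Add a single outlier far above the trend and refit
x_out = np.append(x, 6.0)
y_out = np.append(y, 40.0)
m_out, c_out = np.polyfit(x_out, y_out, 1)

print(m_clean, m_out)  # the outlier pulls the slope well above 2
```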

Least Square Regression Implementation in Python

Here, we build a model to understand the relationship between the head size and the brain weight of an individual, through this Python implementation of the Least Square Regression method.

We require a data set that contains gender (in binary values), age range, head size in cubic centimeters, and brain weight in grams.

Logic to be applied: implement linear regression to develop a model that shows the relationship between an independent variable and a dependent variable.

Step 1: Import the libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

Step 2: Import the required data set

# Reading the CSV file; the file name used here is an assumption
data = pd.read_csv('headbrain.csv')

print(data.shape)

(237, 4)

   Gender  Age Range  Head Size(cm^3)  Brain Weight(grams)
0       1          1             4512                 1530
1       1          1             3738                 1297
2       1          1             4261                 1335
3       1          1             3777                 1282
4       1          1             4177                 1590

Step 3: Assigning 'X' as the independent variable and 'Y' as the dependent variable

# Computing X and Y
X = data['Head Size(cm^3)'].values

Y = data['Brain Weight(grams)'].values

Then proceed with the code below.

# Mean X and Y

mean_x = np.mean(X)

mean_y = np.mean(Y)

# Total number of values

n = len(X)

Step 4: Calculating the values of the slope and y-intercept

# Using the formula to calculate ‘m’ and ‘c’

numer = 0

denom = 0

for i in range(n):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2

m = numer / denom

c = mean_y - (m * mean_x)

# Printing coefficients

print("Coefficients")

print(m, c)

Coefficients

0.26342933948939945 325.57342104944223
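As a sanity check, the loop-based formula can be compared against NumPy's built-in least-squares fit. The sketch below uses synthetic data, since the headbrain dataset may not be at hand; the X and Y here are stand-ins with a similar scale:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(2500, 4500, size=200)         # stand-in for head sizes
Y = 0.26 * X + 325 + rng.normal(0, 70, 200)   # stand-in for brain weights

mean_x, mean_y = np.mean(X), np.mean(Y)

# The Step 4 loop, written as vectorized expressions
m = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
c = mean_y - m * mean_x

# np.polyfit with degree 1 solves the same least-squares problem
m_np, c_np = np.polyfit(X, Y, 1)
print(m, m_np)  # the two slopes agree
```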

Step 5: Plotting the line of best fit

# Plotting Values and Regression Line

max_x = np.max(X) + 100

min_x = np.min(X) - 100

# Calculating line values x and y

x = np.linspace(min_x, max_x, 1000)

y = c + m * x

# Plotting Line

plt.plot(x, y, color='#58b970', label='Regression Line')

# Plotting Scatter Points

plt.scatter(X, Y, c='#ef5423', label='Scatter Plot')

plt.ylabel('Brain Weight in grams')

plt.legend()

plt.show()

Step 6: Evaluating the Model

# Calculating Root Mean Squared Error

rmse = 0

for i in range(n):
    y_pred = c + m * X[i]
    rmse += (Y[i] - y_pred) ** 2

rmse = np.sqrt(rmse/n)

print("RMSE")

print(rmse)

RMSE

72.1206213783709
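The RMSE loop above can also be written as a single vectorized expression. The sketch below uses small stand-in arrays; in the article's flow, X, Y, m, and c are already defined from the dataset:

```python
import numpy as np

# Stand-in data and coefficients (the real X, Y, m, c come from the dataset)
X = np.array([3000.0, 3500.0, 4000.0, 4500.0])
Y = np.array([1100.0, 1250.0, 1400.0, 1500.0])
m, c = np.polyfit(X, Y, 1)

# RMSE in one line: square root of the mean squared residual
y_pred = c + m * X
rmse = np.sqrt(np.mean((Y - y_pred) ** 2))
print(rmse)
```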

The model can also be evaluated with another metric, the R-squared (R²) score, computed in Python as follows.

# Calculating R2 Score

ss_tot = 0

ss_res = 0

for i in range(n):
    y_pred = c + m * X[i]
    ss_tot += (Y[i] - mean_y) ** 2
    ss_res += (Y[i] - y_pred) ** 2

r2 = 1 – (ss_res/ss_tot)

print("R2 Score")

print(r2)

R2 Score

0.6393117199570003
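For a simple one-variable least-squares fit, the R² score equals the square of the Pearson correlation coefficient between X and Y, which gives a quick cross-check. A sketch on stand-in data (the real X and Y come from the dataset):

```python
import numpy as np

X = np.array([2700.0, 3100.0, 3500.0, 3900.0, 4300.0])
Y = np.array([1050.0, 1180.0, 1300.0, 1460.0, 1500.0])

m, c = np.polyfit(X, Y, 1)
y_pred = c + m * X

# R² = 1 - SS_res / SS_tot, as in the loop above
ss_res = np.sum((Y - y_pred) ** 2)
ss_tot = np.sum((Y - np.mean(Y)) ** 2)
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(X, Y)[0, 1]
print(r2, r ** 2)  # equal for a simple least-squares fit with intercept
```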

Conclusion

Finding the line of best fit with the Least Square Regression method has been explained here, along with a working Python implementation. We hope this gives you a clear theoretical understanding.

Join Softlogic Systems to gain hands-on exposure to popular methods and learn the best practices for implementing them in your preferred programming languages.

Enroll today for the best Machine Learning Training in Chennai for your bright future.