Linear Models Battle: Who Wins Disaster Tweet Prediction?

On day three of my ML journey, I'm exploring different linear models to see how they work and which ones are most accurate. Before I start coding, I want to understand the two main learning approaches in machine learning:

Supervised Learning

1.1 Linear Model

Let's start by understanding what a linear model is. It's like a special tool that helps us make guesses (or predictions) when one thing seems to be related to another.

Imagine this:

Your favorite basketball player, Steph Curry, is practicing his 3-pointers. You notice a pattern:

  • Scenario 1: He takes 100 shots and makes 10 of them.
  • Scenario 2: He takes 200 shots and makes 40 of them.

Now, you want to guess how many 3-pointers he might make if he takes 250 shots. A linear model can help us with that!

Think of it like this:

  1. The Dots: Each scenario (100 shots, 10 makes; 200 shots, 40 makes) is like a dot on a piece of graph paper.
  2. The Line: A linear model tries to draw the best straight line that goes through those dots.
  3. The Prediction: Now, you can find 250 on the "shots" part of the graph, follow it up to the line, and see where it matches up on the "3-pointers" part. That's your guess!

The Math Behind It:

The equation ŷ(w, x) = w₀ + w₁x₁ + ... + wₚxₚ looks complicated, but it's just a way of writing down the recipe for that prediction line.

  • ŷ (y-hat): This is our guess for how many 3-pointers.
  • w₀: This is like the starting point of the line.
  • w₁x₁: This part shows how the number of shots (x₁) changes the number of 3-pointers. The "w₁" tells us how steep the line should be.
  • ... + wₚxₚ: This just means we can add more things to our recipe if we want to consider other factors that might affect Curry's 3-pointers.
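The Curry example can be worked out directly: two dots are enough to pin down w₀ and w₁. A minimal sketch in plain Python (no libraries needed):

```python
# The two scenarios as (shots, makes) dots
x1, y1 = 100, 10
x2, y2 = 200, 40

# The slope (w1) is "rise over run" between the two dots
w1 = (y2 - y1) / (x2 - x1)   # (40 - 10) / (200 - 100) = 0.3

# The starting point (w0) makes the line pass through the first dot
w0 = y1 - w1 * x1            # 10 - 0.3 * 100 = -20.0

# The prediction: follow 250 shots up to the line
y_hat = w0 + w1 * 250
print(y_hat)                 # 55.0
```

So the model guesses about 55 three-pointers from 250 shots. With only two dots the line passes through them exactly; with more dots, it usually can't, which is where the next section comes in.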

1.1.1 Ordinary Least Squares

There are a few things we need to know before we start with Ordinary Least Squares. Unlike our example above, things tend to be a little different in practice: the line we draw through Curry's shot data will not always pass perfectly through every point.
 
1. Data Points
2. Finding the Best-Fit Line using OLS:

To find the best-fit line, we'll use OLS to determine the values of w₀ (the y-intercept) and w₁ (the slope) in our linear regression model:

ŷ = w₀ + w₁x

We can use the following formulas (derived from OLS) to calculate these coefficients:

w₁ = Σ((x - x̄)(y - ȳ)) / Σ(x - x̄)²
w₀ = ȳ - w₁ * x̄

Where:

  • x̄ is the mean of x (shots taken)
  • ȳ is the mean of y (3-pointers made)

Calculating the means and applying the formulas, we get:

x̄ = 10.7
ȳ = 9.6
w₁ ≈ 0.8286
w₀ ≈ 0.8143

So our best-fit line equation is:

ŷ = 0.8143 + 0.8286x

3. Calculating the Residuals and SSE:

Now, let's calculate the predicted values (ŷ) for each game using our equation, then find the residuals (y - ŷ) and their squares:

Finally, we add up all the squared errors to find the Sum of Squared Errors (SSE):

SSE ≈ 31.7655
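The same recipe can be sketched in NumPy. The per-game data table isn't reproduced above, so the numbers below are hypothetical stand-ins (x = shots per game, y = 3-pointers made); the formulas are exactly the ones from this section:

```python
import numpy as np

# Hypothetical per-game data (stand-ins, not the original table)
x = np.array([8.0, 10.0, 12.0, 9.0, 11.0, 14.0, 10.0, 13.0, 9.0, 11.0])
y = np.array([6.0, 9.0, 11.0, 7.0, 10.0, 13.0, 8.0, 12.0, 8.0, 10.0])

x_bar, y_bar = x.mean(), y.mean()

# OLS closed-form coefficients
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
w0 = y_bar - w1 * x_bar

# Residuals and the Sum of Squared Errors
y_hat = w0 + w1 * x
sse = np.sum((y - y_hat) ** 2)
print(w0, w1, sse)
```

Whatever data you plug in, these two formulas give the line with the smallest possible SSE, which is the whole point of "least squares".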
Graph with the Best-Fit Line:
Code:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Model Initialization and Fitting
ols_model = LinearRegression()
ols_model.fit(tf_train, y_train)

# Prediction
predict_ols = ols_model.predict(tf_test)

# Evaluation (note: this is MSE being treated as a score, not a true accuracy)
mse = mean_squared_error(y_test, predict_ols)
print(f"Accuracy: {round(mse * 100, 2)}%")

This gives me an "accuracy" of 15.76%. Now the question arises: why am I getting such a low score?

The answer is simple: I had no idea where to use the Ordinary Least Squares method. The dataset I used is a classification dataset.

Why isn't OLS ideal for classification?
  1. We already know OLS is a regression technique, used to estimate the relationship between a dependent variable and one or more independent variables.
  2. Classification problems, on the other hand, involve predicting categorical outcomes (e.g. "spam" or "not spam"), which do not have a linear relationship with the input features. In these cases, the assumptions of OLS are violated. For example, the probability of an email being spam might not increase linearly with the frequency of certain keywords.
  3. OLS regression produces continuous output values, which are not suitable for categorical outcomes. For example, a regression model might predict a value of 0.7 for spam, but that value does not mean anything in binary classification.
Where to use OLS?
  1. Linear Relationship: the features and the target are roughly linearly related.
  2. Continuous Target Variable: house prices, stock prices, temperature, product sales, etc.
  3. Normal Distribution: the residuals are approximately normally distributed.

Logistic Regression

Unlike OLS, Logistic Regression is a linear model for classification. So, how does a linear model work for classification?
It does so by transforming the output of a linear equation using a special function called the logistic function (also known as the sigmoid function).
  • Linear Equation

Just like linear regression, logistic regression starts with a linear equation:

z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ

Where:

  • z is the linear predictor.
  • β₀ is the intercept.
  • β₁, β₂, ..., βₙ are the coefficients for each feature.
  • x₁, x₂, ..., xₙ are the feature values of the input.
  • Logistic Function (Sigmoid Function)

The linear predictor z is then passed through the logistic function (sigmoid function) to get a probability:

p = 1 / (1 + e⁻ᶻ)

This function squashes the output z into a value between 0 and 1, representing the probability of the input belonging to the positive class (e.g. "disaster").
  • Decision Boundary
You choose a threshold (often 0.5). If the probability p is above the threshold, the input is classified as the positive class; otherwise, it's classified as the negative class. The decision boundary is the line (or hyperplane) that separates the input space into regions where the model predicts one class or the other.
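The squash-then-threshold idea fits in a few lines of plain Python (the 0.5 threshold below is just the common default, not a requirement):

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """1 = positive class (e.g. "disaster"), 0 = negative class."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))    # 0.5 -- exactly on the decision boundary
print(classify(3))   # 1
print(classify(-3))  # 0
```

Large positive z pushes the probability toward 1, large negative z toward 0, and z = 0 lands exactly on the decision boundary.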

Training the Model:

  • Finding the Best Coefficients (β): The training process involves finding the optimal values of the coefficients (β₀, β₁, ...) that maximize the likelihood of the observed data (the training set). This is usually done using an optimization algorithm like gradient descent.

  • Likelihood: The likelihood function measures how well the model's parameters explain the observed data. In logistic regression, it's the product of the probabilities of each observation being correctly classified by the model.

  • Maximum Likelihood Estimation (MLE): The goal is to find the values of the coefficients that maximize the likelihood function. This gives us the model that best explains the observed data.
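The training loop above can be sketched on tiny made-up data. Real libraries use fancier optimizers, but the idea is the same: repeatedly nudge the coefficients in the direction that increases the likelihood, i.e. follow the gradient of the log-likelihood:

```python
import numpy as np

# Tiny made-up dataset: one feature, binary labels
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])

# Prepend a column of 1s so beta[0] acts as the intercept β₀
Xb = np.hstack([np.ones((len(X), 1)), X])
beta = np.zeros(Xb.shape[1])

learning_rate = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xb @ beta))  # predicted probabilities
    gradient = Xb.T @ (y - p)             # gradient of the log-likelihood
    beta += learning_rate * gradient      # step uphill (maximize likelihood)

preds = (1.0 / (1.0 + np.exp(-Xb @ beta)) >= 0.5).astype(int)
print(preds)
```

After enough steps the learned coefficients separate the two classes, which is exactly what MLE is aiming for.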

Example: Disaster Tweet Classification

Let's say a simplified model uses two features:

  • x₁: Frequency of the word "help" in the tweet.
  • x₂: Frequency of the word "earthquake" in the tweet.

The trained model might have coefficients:

  • β₀ = -3
  • β₁ = 0.8
  • β₂ = 1.5

For a tweet with x₁ = 2 and x₂ = 3, the model would calculate:

  • z = -3 + (0.8 * 2) + (1.5 * 3) = 3.1
  • p = 1 / (1 + e⁻³·¹) ≈ 0.96

Since p is greater than 0.5, the model would classify this tweet as a "disaster" tweet.
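The same arithmetic in Python, using the coefficients of the simplified model above:

```python
import math

# Coefficients of the trained (simplified) model
b0, b1, b2 = -3.0, 0.8, 1.5
x1, x2 = 2, 3  # "help" twice, "earthquake" three times

z = b0 + b1 * x1 + b2 * x2       # linear predictor
p = 1.0 / (1.0 + math.exp(-z))   # sigmoid

print(round(z, 1))  # 3.1
print(round(p, 2))  # 0.96 -> above 0.5, so: "disaster"
```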

Code:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Model Initialization and Fitting
LR_Model = LogisticRegression(max_iter=100000)
LR_Model.fit(tf_train, y_train)

# Prediction
predict_LR = LR_Model.predict(tf_test)

# Evaluation
score_LR = accuracy_score(y_test, predict_LR)
print(f"Accuracy: {round(score_LR * 100, 2)}%")

This gives me an accuracy of 83.06%.

Passive Aggressive Classifier




