Linear Models Battle: Who Wins Disaster Tweet Prediction?
Supervised Learning
1.1 Linear Model
Let's start by understanding what a linear model is. It's like a special tool that helps us make guesses (or predictions) when one thing seems to be related to another.
Imagine this:
Your favorite basketball player, Steph Curry, is practicing his 3-pointers. You notice a pattern:
- Scenario 1: He takes 100 shots and makes 10 of them.
- Scenario 2: He takes 200 shots and makes 40 of them.
Now, you want to guess how many 3-pointers he might make if he takes 250 shots. A linear model can help us with that!
Think of it like this:
- The Dots: Each scenario (100 shots, 10 makes; 200 shots, 40 makes) is like a dot on a piece of graph paper.
- The Line: A linear model tries to draw the best straight line that goes through those dots.
- The Prediction: Now, you can find 250 on the "shots" part of the graph, follow it up to the line, and see where it matches up on the "3-pointers" part. That's your guess!
The Math Behind It:
The equation ŷ(w, x) = w₀ + w₁x₁ + ... + wₚxₚ looks complicated, but it's just a way of writing down the recipe for that prediction line.
- ŷ (y-hat): This is our guess for how many 3-pointers.
- w₀: This is like the starting point of the line.
- w₁x₁: This part shows how the number of shots (x₁) changes the number of 3-pointers. The "w₁" tells us how steep the line should be.
- ... + wₚxₚ: This just means we can add more terms to our recipe if we want to consider other factors that might affect Curry's 3-pointers.
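To see this in code, here's a minimal sketch using scikit-learn's LinearRegression, fitting a line through the two scenarios above and predicting 250 shots (the numbers are just our toy example):

import numpy as np
from sklearn.linear_model import LinearRegression

# The two scenarios as (shots taken, 3-pointers made) data points
X = np.array([[100], [200]])
y = np.array([10, 40])

model = LinearRegression()
model.fit(X, y)

print(model.intercept_, model.coef_[0])   # w0 = -20.0, w1 = 0.3
print(model.predict(np.array([[250]])))   # [55.] predicted makes at 250 shots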
To find the best-fit line, we'll use Ordinary Least Squares (OLS) to determine the values of w₀ (the y-intercept) and w₁ (the slope) in our linear regression model:
ŷ = w₀ + w₁x
We can use the following formulas (derived from OLS) to calculate these coefficients:
w₁ = Σ((x - x̄)(y - ȳ)) / Σ(x - x̄)²
w₀ = ȳ - w₁ * x̄
Where:
- x̄ is the mean of x (shots taken)
- ȳ is the mean of y (3-pointers made)
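These formulas are only a few lines of NumPy. Below is a small sketch; since the per-game table isn't reproduced here, the arrays are hypothetical placeholders you would swap for the real game log:

import numpy as np

def ols_fit(x, y):
    x_bar, y_bar = x.mean(), y.mean()
    # w1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    # w0 = y_bar - w1 * x_bar
    w0 = y_bar - w1 * x_bar
    return w0, w1

# Hypothetical per-game data (shots attempted, 3-pointers made) -- replace
# with the actual game log to reproduce the coefficients quoted below
x = np.array([8, 12, 10, 15, 9, 11, 7, 14, 10, 11])
y = np.array([7, 10, 8, 13, 7, 9, 6, 12, 9, 15])

w0, w1 = ols_fit(x, y)
print(w0, w1)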
Using a per-game dataset of Curry's shots taken and 3-pointers made, calculating the means and applying the formulas, we get:
x̄ = 10.7
ȳ = 9.6
w₁ ≈ 0.8286
w₀ ≈ 0.8143
So our best-fit line equation is:
ŷ = 0.8143 + 0.8286x

Calculating the Residuals and SSE:
Now we calculate the predicted value (ŷ) for each game using our equation, find the residuals (y − ŷ), and square them. Adding up all the squared errors gives the Sum of Squared Errors (SSE).
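In code this step is direct. A minimal sketch reusing the coefficients above (and the hypothetical per-game arrays from the earlier snippet):

import numpy as np

# Hypothetical per-game arrays from the previous snippet
x = np.array([8, 12, 10, 15, 9, 11, 7, 14, 10, 11])
y = np.array([7, 10, 8, 13, 7, 9, 6, 12, 9, 15])

y_hat = 0.8143 + 0.8286 * x   # predicted makes per game
residuals = y - y_hat         # actual minus predicted
sse = (residuals ** 2).sum()  # Sum of Squared Errors
print(f"SSE = {sse:.4f}")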
Applying this linear regression approach to the disaster tweet classification task gives an accuracy of only 15.76%. The natural question is why the score is so low:
- We already know OLS is a regression technique, used to estimate the relationship between a dependent variable and one or more independent variables.
- Classification problems, by contrast, involve predicting categorical outcomes (e.g., "spam" or "not spam"), which do not have a linear relationship with the input features. In these cases the assumptions of OLS are violated. For example, the probability of an email being spam might not increase linearly with the frequency of certain keywords.
- OLS regression produces continuous output values, which are not suitable for categorical outcomes. For example, a regression model might predict a value of 0.7 for spam, but that value has no direct meaning in binary classification.
In short, linear regression is the right tool when its assumptions hold:
- Linear Relationship
- Continuous Target Variable: House Prices, Stock Prices, Temperature, Product Sales, etc.
- Normal Distribution
1.2 Logistic Regression
- Linear Equation
Just like linear regression, logistic regression starts with a linear equation:
z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ

- β₀ is the intercept.
- β₁, β₂, ..., βₙ are the coefficients for each feature.
- x₁, x₂, ..., xₙ are the feature values of the input.
- Logistic Function (Sigmoid Function)

The sigmoid function converts z into a probability between 0 and 1:

p = 1 / (1 + e⁻ᶻ)
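As a quick sketch in plain Python:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) interval
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the natural decision threshold
print(sigmoid(3.1))  # ~0.957 -- large positive z means a confident positive class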
- Decision Boundary
If p is above the threshold (commonly 0.5), the input is classified as the positive class; otherwise, it's classified as the negative class. The decision boundary is the line (or hyperplane) that separates the input space into regions where the model predicts one class or the other.

Training the Model:
- Finding the Best Coefficients (β): The training process involves finding the optimal values of the coefficients (β₀, β₁, ...) that maximize the likelihood of the observed data (the training set). This is usually done with an optimization algorithm such as gradient descent.
- Likelihood: The likelihood function measures how well the model's parameters explain the observed data. In logistic regression, it's the product of the probabilities of each observation being correctly classified by the model.
- Maximum Likelihood Estimation (MLE): The goal is to find the values of the coefficients that maximize the likelihood function. This gives us the model that best explains the observed data.
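To make MLE concrete, here is a minimal gradient-ascent sketch in NumPy. This is a toy illustration, not scikit-learn's actual solver, and the four-tweet dataset is hypothetical:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical toy data: columns are word-frequency features, 1 = disaster tweet
X = np.array([[2.0, 3.0], [0.0, 0.0], [1.0, 2.0], [0.0, 1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

# Prepend a column of ones so beta[0] acts as the intercept β₀
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
beta = np.zeros(Xb.shape[1])

learning_rate = 0.1
for _ in range(5000):
    p = sigmoid(Xb @ beta)
    # The gradient of the log-likelihood is X^T (y - p); stepping along it
    # (gradient ascent) increases the likelihood of the observed labels
    beta += learning_rate * Xb.T @ (y - p) / len(y)

print(beta)                # learned [β₀, β₁, β₂]
print(sigmoid(Xb @ beta))  # fitted probability of "disaster" per tweet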
Example: Disaster Tweet Classification
Let's say a simplified model uses two features:
- x₁: Frequency of the word "help" in the tweet.
- x₂: Frequency of the word "earthquake" in the tweet.
The trained model might have coefficients:
- β₀ = -3
- β₁ = 0.8
- β₂ = 1.5
For a tweet with x₁ = 2 and x₂ = 3, the model would calculate:
z = -3 + (0.8 × 2) + (1.5 × 3) = 3.1
p = 1 / (1 + e⁻³·¹) ≈ 0.96
Since p is greater than 0.5, the model would classify this tweet as a "disaster" tweet.
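You can verify the arithmetic in a couple of lines:

import math

z = -3 + 0.8 * 2 + 1.5 * 3    # linear combination of the two features
p = 1 / (1 + math.exp(-z))    # sigmoid
print(z, round(p, 3))         # 3.1 0.957 -> "disaster"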
Code:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Model initializing and fitting
# (tf_train/tf_test are the vectorized tweet features and y_train/y_test
# the labels, prepared earlier in the pipeline)
LR_Model = LogisticRegression(max_iter=100000)
LR_Model.fit(tf_train, y_train)

# Prediction
predict_LR = LR_Model.predict(tf_test)

# Evaluation
score_LR = accuracy_score(y_test, predict_LR)
print(f"Accuracy: {round(score_LR * 100, 2)}%")

This gives an accuracy of 83.06%.

Passive Aggressive Classifier

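The same fit/predict/score pattern carries over. A minimal sketch with scikit-learn's PassiveAggressiveClassifier, assuming the same tf_train/tf_test features as above:

from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score

# Same pipeline as the Logistic Regression model above
PA_Model = PassiveAggressiveClassifier(max_iter=1000)
PA_Model.fit(tf_train, y_train)

predict_PA = PA_Model.predict(tf_test)
score_PA = accuracy_score(y_test, predict_PA)
print(f"Accuracy: {round(score_PA * 100, 2)}%")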