Posts

I am starting a new course.

Course objective
In this course, we are going to focus on the following learning objectives:
1. Understand the theory and intuition behind Multiple Linear Regression.
2. Import key Python libraries and the dataset, and perform data visualization.
3. Perform exploratory data analysis and standardize the training and testing data.
4. Train and evaluate a linear regression model using the scikit-learn library.
5. Understand the differences between regression model KPIs such as MSE, RMSE, MAE, R2, and adjusted R2.
6. Assess the performance of regression models and visualize the performance of the best model using various KPIs.
7. Understand the distribution of the data and the relationships within it.

Project Structure
The hands-on project on Life Expectancy Prediction Using Machine Learning is divided into the following tasks:
Task #1: Understand the Problem Statement and Business Case
Task #2: Import Datasets and Libraries
Task #3: Perform Data Visualization and Exploratory Data Analysis
T...
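Objective 5 lists several regression KPIs. The sketch below computes them by hand with NumPy on a few made-up values, so the formulas are visible. It assumes `n_features` (a name introduced here) is the number of predictors the model uses, which only adjusted R2 needs. In scikit-learn, `mean_squared_error`, `mean_absolute_error` and `r2_score` from `sklearn.metrics` should give the same MSE, MAE and R2; adjusted R2 is derived from R2 here.

```python
import numpy as np

def regression_kpis(y_true, y_pred, n_features):
    """Common regression KPIs for one set of predictions.

    n_features is the number of predictors (k) used by the model;
    it is only needed for adjusted R2.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)

    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)           # mean squared error
    rmse = np.sqrt(mse)                     # back in the units of the target
    mae = np.mean(np.abs(residuals))        # mean absolute error

    ss_res = np.sum(residuals ** 2)                   # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalizes extra predictors: 1 - (1 - R2)(n - 1)/(n - k - 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2, "Adjusted R2": adj_r2}

# Made-up values, purely for illustration.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.0, 7.5, 9.0, 12.0]
print(regression_kpis(y_true, y_pred, n_features=1))
```

For these values MSE is 0.3, MAE is 0.4, R2 is 0.9625 and adjusted R2 is 0.95. Adjusted R2 is never higher than R2 once the model has at least one predictor, because it penalizes every extra feature.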

Data cleaning

Data cleaning (also known as data preprocessing or data wrangling) is a critical step in data analysis and machine learning. The quality of your data has a direct impact on the quality of your analysis and on model performance. Here's a comprehensive list of techniques you need to learn for effective data cleaning:
1. Handling Missing Data
Identify Missing Data: Understand how to detect missing values (NaN, None, Null).
Imputation:
Mean/Median/Mode Imputation: Replace missing values with the mean, median, or mode of the column.
Forward/Backward Fill: Fill missing values with the previous/next value in the column.
Interpolation: Use methods like linear interpolation to fill in missing values.
K-Nearest Neighbors (KNN) Imputation: Estimate missing values based on similar observations.
Dropping Missing Values: Remove rows or columns with missing data when filling them in would add too much noise.
2. Handling Outliers
Detecting Outliers:
Statistical Methods: Use Z-scores, IQR (Interquartile Ra...
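Most of the missing-data and outlier techniques above map directly onto pandas methods. The following is a minimal sketch on a small made-up Series; the 1.5 × IQR fence and the |z| > 3 cutoff are common conventions assumed here, not values from the post. KNN imputation is left out to keep the example to pandas; scikit-learn provides it as `sklearn.impute.KNNImputer`.

```python
import numpy as np
import pandas as pd

# Made-up example column: two gaps (NaN) and one suspiciously large value.
s = pd.Series([10.0, np.nan, 12.0, 11.0, np.nan, 13.0, 95.0])

# Identify missing data: isna() is True for NaN/None entries.
print("missing count:", int(s.isna().sum()))

# Imputation strategies.
mean_filled = s.fillna(s.mean())      # mean imputation (pulled up by the 95)
median_filled = s.fillna(s.median())  # median imputation, more robust to outliers
ffilled = s.ffill()                   # forward fill: copy the previous valid value
bfilled = s.bfill()                   # backward fill: copy the next valid value
interpolated = s.interpolate()        # linear interpolation between neighbours
observed = s.dropna()                 # dropping missing values; reused below

# IQR rule: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = observed.quantile(0.25), observed.quantile(0.75)
iqr = q3 - q1
iqr_outliers = observed[(observed < q1 - 1.5 * iqr) | (observed > q3 + 1.5 * iqr)]

# Z-score rule: flag values with |z| > 3 (pandas std() is the sample std).
# With only 5 points |z| can never exceed (n - 1) / sqrt(n), about 1.79,
# so this rule flags nothing here.
z = (observed - observed.mean()) / observed.std()
z_outliers = observed[z.abs() > 3]

print("IQR outliers:", iqr_outliers.tolist())
print("Z-score outliers:", z_outliers.tolist())
```

The IQR rule flags 95 while the Z-score rule flags nothing. With the sample standard deviation, no value in a sample of n points can have |z| above (n - 1)/√n, so Z-scores only become a useful outlier test on larger samples.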

Day 3

Let's start with Random Forest:
1. It combines the output of multiple decision trees to reach a single result.
2. It handles both regression and classification problems, so we won't run into the limitations we encountered with the Ordinary Least Squares (OLS) method.
3. It is made of many decision trees, but I have yet to learn decision trees.

Let's step back and learn decision trees first.
1. Like Random Forest, a decision tree can handle both regression and classification.

Let's dive into some math before we start:
1. Entropy: Measures the impurity or disorder of a set of data. High entropy means the data is more mixed up (e.g., equal numbers of different classes), while low entropy means it's more pure (mostly one class).
2. Information Gain: The decrease in entropy achieved by splitting the data on a particular attribute. A decision tree chooses the split that gives the highest information gain, as this leads to the most informative splits.
Formula for Entropy: Entropy(S) = - Σ ...
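Here is a small sketch of both ideas using the standard definition Entropy(S) = -Σ p_i log2(p_i), where p_i is the share of class i in S; the base-2 logarithm (entropy measured in bits) is an assumption. Information gain is computed as the parent's entropy minus the size-weighted entropy of the child groups. The function names and toy labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4

# Split A separates the classes perfectly; split B leaves both groups as mixed as the parent.
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(entropy(parent))                    # 1.0: maximally mixed for two classes
print(information_gain(parent, split_a))  # 1.0: both children are pure
print(information_gain(parent, split_b))  # 0.0: no reduction in entropy
```

Split A recovers the parent's full 1 bit of entropy while split B gains nothing, so a decision tree using information gain would choose split A.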

Linear Models Battle

Linear Models Battle: Who Wins Disaster Tweet Prediction?
On day three of my ML journey, I'm exploring different linear models to see how they work and which ones are most accurate. Before I start coding, I want to understand the two main learning approaches in machine learning:
Supervised Learning
1.1 Linear model
Let's start by understanding what a linear model is. It's a tool that helps us make guesses (predictions) when one quantity seems to change in step with another.
Imagine this: your favorite basketball player, Steph Curry, is practicing his 3-pointers. You notice a pattern:
Scenario 1: He takes 100 shots and makes 10 of them.
Scenario 2: He takes 200 shots and makes 40 of them.
Now, you want to guess how many 3-pointers he might make if he takes 250 shots. A linear model can help us with that!
Think of it like this:
The Dots: Each scenario (100 shots, 10 makes; 200 shots, 40 makes) is like a dot on a piece of graph paper.
The Line: A linear model trie...
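To make the dots-and-line picture concrete, here is a minimal sketch that fits a straight line through the two scenarios with NumPy's `np.polyfit` (degree 1) and uses it to predict makes at 250 shots. It assumes the linear model is an ordinary least-squares line.

```python
import numpy as np

# The two scenarios: shots taken and shots made.
shots = np.array([100, 200])
makes = np.array([10, 40])

# Fit makes = slope * shots + intercept (a degree-1 polynomial).
# polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(shots, makes, deg=1)

# Use the fitted line to predict makes for 250 shots.
predicted = slope * 250 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted makes={predicted:.1f}")
```

With only two points, the line passes through both exactly: the slope is 0.3 makes per shot and the intercept is -20, so the prediction at 250 shots is 0.3 × 250 - 20 = 55 makes. The negative intercept (the line predicts -20 makes at 0 shots) is a reminder that a line fitted on a narrow range can give nonsense far outside it.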