Posts

Showing posts from August, 2024

Learning to hold breath day 1

Let's start with my baseline time before applying any new technique. For now, my maximum is 20 seconds. Terrible, right?
I am starting a new course

Course objective

In this course, we are going to focus on the following learning objectives:
1. Understand the theory and intuition behind Multiple Linear Regression.
2. Import key Python libraries and the dataset, and perform data visualization.
3. Perform exploratory data analysis and standardize the training and testing data.
4. Train and evaluate a linear regression model using the scikit-learn library.
5. Understand the difference between various regression model KPIs such as MSE, RMSE, MAE, R2, and adjusted R2.
6. Assess the performance of regression models and visualize the performance of the best model using various KPIs.
7. Understand the distribution and relationship of data.

Project Structure

The hands-on project on Life Expectancy Prediction Using Machine Learning is divided into the following tasks:
Task #1: Understand the Problem Statement and Business Case
Task #2: Import Datasets and Libraries
Task #3: Perform Data Visualization and Exploratory Data Analysis
T...
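To make the KPIs from objective 5 concrete, here is a minimal sketch that computes them by hand using their standard definitions. The function name `regression_kpis` and the sample numbers are made up for illustration; they are not from the course or its dataset.

```python
import math


def regression_kpis(y_true, y_pred, n_features):
    """Return MSE, RMSE, MAE, R2 and adjusted R2 for paired observations.

    n_features is the number of predictors the model used; it is only
    needed for the adjusted R2 penalty.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": r2,
        # Adjusted R2 penalizes R2 for every extra predictor.
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - n_features - 1),
    }


# Hypothetical values, invented only to exercise the formulas.
y_true = [70.0, 65.0, 80.0, 75.0, 60.0]
y_pred = [68.0, 66.0, 79.0, 77.0, 61.0]
kpis = regression_kpis(y_true, y_pred, n_features=2)
print(kpis)
```

RMSE and MAE are in the same units as the target, which makes them easier to read than MSE. Adjusted R2 is never higher than R2 and only matters when comparing models with different numbers of features.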

Data cleaning

Data cleaning (also known as data preprocessing or data wrangling) is a critical step in data analysis and machine learning. The quality of your data has a direct impact on the quality of your analysis or model performance. Here’s a comprehensive list of techniques you need to learn for effective data cleaning:

1. Handling Missing Data
Identify Missing Data: Understand how to detect missing values (NaN, None, Null).
Imputation:
  Mean/Median/Mode Imputation: Replace missing values with the mean, median, or mode of the column.
  Forward/Backward Fill: Fill missing values with the previous/next value in the column.
  Interpolation: Use methods like linear interpolation to fill in missing values.
  K-Nearest Neighbors (KNN) Imputation: Estimate missing values based on similar observations.
Dropping Missing Values: Remove rows or columns with missing data if they represent too much noise.

2. Handling Outliers
Detecting Outliers:
  Statistical Methods: Use Z-scores, IQR (Interquartile Ra...
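As a rough illustration of the missing-data techniques in section 1, the sketch below applies detection, mean/median imputation, forward fill, linear interpolation, and dropping to one small column. It uses plain Python with `None` as the missing marker so it runs without extra libraries; the helper names and the sample column are hypothetical. In practice you would more likely use pandas, which has built-in methods such as `isna`, `fillna`, `ffill`, `interpolate`, and `dropna`.

```python
from statistics import mean, median


def find_missing(values):
    """Return the positions of missing (None) entries."""
    return [i for i, v in enumerate(values) if v is None]


def impute_constant(values, fill):
    """Replace every missing entry with one value (e.g. the column mean)."""
    return [fill if v is None else v for v in values]


def forward_fill(values):
    """Replace each missing entry with the last observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing was observed yet
    return filled


def interpolate_linear(values):
    """Fill gaps between two observed values along a straight line."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result  # leading/trailing gaps are left as None


# Made-up column with three missing entries.
column = [10.0, None, 14.0, None, None, 20.0]
observed = [v for v in column if v is not None]  # "dropping" missing values

missing_idx = find_missing(column)
mean_filled = impute_constant(column, mean(observed))
median_filled = impute_constant(column, median(observed))
ffilled = forward_fill(column)
interpolated = interpolate_linear(column)

print(missing_idx)
print(ffilled)
print(interpolated)
```

Which option fits depends on the data: forward fill and interpolation assume the rows are ordered (for example, a time series), while mean or median imputation ignores order entirely.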