Day 3
Let's start with Random Forest.
1. It combines the output of multiple decision trees to reach a single result.
2. It handles both regression and classification problems, so we won't run into the issues we encountered with the Ordinary Least Squares method.
3. It is made of many decision trees, but I am yet to learn decision trees.
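To see the two points above in action, here's a minimal sketch using scikit-learn (my assumption — the post doesn't name a library) on its built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small classification dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 decision trees each vote; their combined output is the single result
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # accuracy on the held-out data
```

For the regression case, `RandomForestRegressor` offers the same API.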
Let's step back and learn decision trees first.
1. Similar to Random Forest, a decision tree can handle both regression and classification.
Let's dive into some math before we start:
1. Entropy:
Measures the impurity or disorder of a set of data. High entropy means the data is more mixed up (e.g., equal numbers of different classes), while low entropy means it's more pure (mostly one class).
2. Information Gain
It is the decrease in entropy achieved by splitting the data on a particular attribute. A decision tree splits on the attribute that gives the highest information gain, as this leads to the most informative splits.
Formula for Entropy:
Entropy(S) = - Σ (p_i * log2(p_i))
where:
- S is the set of examples
- p_i is the proportion of examples in S that belong to class i
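The entropy formula can be computed straight from a list of class labels; a small sketch (the function name `entropy` is my own choice):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -Σ p_i * log2(p_i), where p_i is the proportion of class i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A 50/50 mix is maximally impure for two classes: entropy = 1.0
print(entropy(["yes", "yes", "no", "no"]))  # → 1.0
# A pure set has entropy 0
print(entropy(["yes", "yes", "yes", "yes"]))  # → 0.0
```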
Formula for Information Gain:
InformationGain(S, A) = Entropy(S) - Σ ((|S_v| / |S|) * Entropy(S_v))
where:
- S is the set of examples
- A is the attribute we're considering splitting on
- S_v is the subset of examples in S that have value v for attribute A
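The information-gain formula follows the same pattern: partition S by the values of attribute A, then subtract the weighted entropies of the subsets. A sketch (function names are my own):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - Σ (|S_v| / |S|) * Entropy(S_v).

    `values` holds each example's value for attribute A; `labels` its class.
    """
    subsets = defaultdict(list)
    for v, y in zip(values, labels):
        subsets[v].append(y)  # group labels into the subsets S_v
    n = len(labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Splitting on an attribute that perfectly separates the classes
# recovers all of the entropy:
print(information_gain(["sunny", "sunny", "rain", "rain"],
                       ["no", "no", "yes", "yes"]))  # → 1.0
```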
3. Gini Impurity
The lower the Gini impurity, the purer the node.
Formula for Gini Impurity:
Gini(S) = 1 - Σ (p_i)^2
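Gini impurity is even simpler to compute than entropy, since there's no logarithm. A sketch of the formula above:

```python
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - Σ (p_i)^2; 0 for a pure node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))  # pure node → 0.0
print(gini(["a", "a", "b", "b"]))  # 50/50 mix → 0.5
```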
4. Attribute Selection
For every node, a decision tree uses information gain or Gini impurity to choose the best attribute to split on. The goal is to maximize the purity of the resulting child nodes.
5. Recursive Partitioning
Decision trees build their structure by recursively splitting the data based on the chosen
attributes. This process continues until a stopping criterion is met (e.g., maximum depth,
minimum number of samples per leaf).
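Attribute selection and recursive partitioning can be combined into one small from-scratch tree builder. This is a toy sketch under my own assumptions (Gini as the criterion, numeric features, thresholds of the form `feature <= t`), not a production implementation:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold) pair; keep the one with lowest weighted Gini."""
    best, best_score, n = None, gini(labels), len(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3, min_samples=2):
    # Stopping criteria: pure node, maximum depth, or too few samples
    if gini(labels) == 0 or depth == max_depth or len(labels) < min_samples:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Recursive partitioning: build a subtree for each side of the split
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       depth + 1, max_depth, min_samples),
            build_tree([r for r, _ in right], [y for _, y in right],
                       depth + 1, max_depth, min_samples))

def predict(tree, row):
    while isinstance(tree, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree  # leaf: a class label

tree = build_tree([[1], [2], [8], [9]], ["a", "a", "b", "b"])
print(predict(tree, [1.5]), predict(tree, [8.0]))
```

A Random Forest would then train many such trees on random subsets of the rows and features, and combine their predictions.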