Gradient Boosted Models
— data-science, ML
Snippet
What is Boosting?
- A type of ensemble model. For a graphical view, check out Tree-Based Methods
- Basically, you start from a base stump (a single yes/no split on one feature)
- A boosted ensemble's prediction is the first tree's prediction + learning_rate * the second tree's prediction, and so on for each later tree (see the sketch after this list)
- Each new tree is fit to reduce a loss function (XGBoost defaults to squared error for regression and log loss for classification)
- By the way, XGBoost is just a library implementation of gradient boosted models.
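A minimal sketch of the additive idea above (not from the note's original code): two decision stumps under squared-error loss, where the second stump is fit to the residuals of the first and scaled by the learning rate. The data and variable names here are illustrative.

# Sketch: boosting two stumps by hand under squared-error loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1

# First stump fits the raw targets.
tree1 = DecisionTreeRegressor(max_depth=1).fit(X, y)
pred = tree1.predict(X)

# Second stump fits the residuals (the negative gradient of squared error).
residuals = y - pred
tree2 = DecisionTreeRegressor(max_depth=1).fit(X, residuals)

# Boosted ensemble = first tree + learning_rate * second tree.
boosted_pred = pred + learning_rate * tree2.predict(X)
print("MSE after 1 stump :", np.mean((y - pred) ** 2))
print("MSE after 2 stumps:", np.mean((y - boosted_pred) ** 2))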
Pros and Cons
- Pros
- Powerful and accurate; often more accurate than a random forest
- Good at handling complex, non-linear relationships
- Good at dealing with imbalanced data
- Cons
- Slower to train, since trees must be built sequentially
- Prone to overfitting if data is noisy
- Harder to tune, since there are more interacting hyperparameters (see the tuning sketch below)
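The tuning pain point can be illustrated with a plain scikit-learn grid search over a few XGBoost parameters; the grid values and toy dataset below are illustrative placeholders, not recommended settings.

# Sketch: grid-searching a few key XGBoost hyperparameters.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Toy data just to make the sketch runnable end to end.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Illustrative grid; real grids depend on the dataset and compute budget.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [1, 3, 5],
    "n_estimators": [100, 300],
}
search = GridSearchCV(XGBClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)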
Notes:
- Gradient boosting with linear base learners adds nothing over ordinary linear regression: a sum of linear models is still linear, and after the first least-squares fit the residuals hold nothing more for a linear learner to pick up (see the sketch after these notes)
- Boosting shines when there is no terse functional form around. Boosting decision trees lets the functional form of the regressor/classifier evolve slowly to fit the data, often resulting in complex shapes one could not have dreamed up by hand and eye. When a simple functional form is desired, boosting is not going to help you find it (or at least is probably a rather inefficient way to find it).
- Different kinds of models have different advantages. The boosted trees model is very good at handling tabular data with numerical features, or categorical features with up to a few hundred categories. Unlike linear models, boosted trees are able to capture non-linear interactions between the features and the target.
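A quick illustration of the linear-regression note above (toy data, illustrative names): after an ordinary least-squares fit, the residuals are orthogonal to the features, so a second linear model "boosted" on those residuals learns essentially nothing.

# Sketch: boosting a linear model on its own residuals is pointless.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=500)

stage1 = LinearRegression().fit(X, y)
residuals = y - stage1.predict(X)

stage2 = LinearRegression().fit(X, residuals)
print(stage2.coef_)  # ~[0, 0, 0]: the "boosting" step has nothing left to fit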
Example
import pandas as pd
import numpy as np
import sklearn
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import xgboost as xgb
from xgboost import XGBClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn.tree import export_graphviz

# Load MNIST; the label column is named "6" in this CSV
mnist = pd.read_csv("sample_data/mnist_train_small.csv")
x = mnist[mnist.columns.difference(["6"])]
y = mnist["6"]
x_train_val, x_test, y_train_val, y_test = train_test_split(x, y, test_size=0.2)
x_train, x_val, y_train, y_val = train_test_split(x_train_val, y_train_val, test_size=0.2)

# Random forest baseline (bagging)
model = RandomForestRegressor(50, max_depth=15, max_features=15)
model.fit(x_train, y_train)
print(model.score(x_val, y_val))

# Boosted stumps (max_depth=1); note the objective name is 'multi:softmax'
model2 = XGBClassifier(objective='multi:softmax', learning_rate=0.1, max_depth=1, n_estimators=330)
model2.fit(x_train, y_train)
preds = model2.predict(x_test)
print(sum(preds == y_test) / len(y_test))  # test accuracy

[Image: boosting.pdf]
Gradient Boosted Tree-Based Methods:
- Each tree is typically a weak learner, meaning it performs relatively poorly on its own but contributes to the overall performance of the ensemble.
- The trees are usually shallow and consist of only a few splits.
- Sequentially built
- Bagging methods like Random Forest build each tree independently
- In boosting, trees are built sequentially, with each tree correcting the mistakes of its predecessors.
- Each subsequent tree focuses on the misclassified instances, giving them more weight so the ensemble learns to classify them correctly.
- Gradient Boosting
- A specific type of boosting where each tree is fit to the residual errors of the previous trees, i.e. to the negative gradient of the loss function being optimized (see the loop sketched after this list).
- This generalizes boosting by allowing optimization of an arbitrary differentiable loss function.
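Sketch of that loop for squared-error loss, where the negative gradient is just the residual. Data and settings are illustrative; a real implementation would also sum the stored trees' predictions when scoring new data.

# Sketch: the generic gradient-boosting loop for squared-error loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
n_rounds = 100
F = np.full_like(y, y.mean())   # start from a constant prediction
trees = []

for _ in range(n_rounds):
    neg_grad = y - F                      # -dL/dF for L = 0.5 * (y - F)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, neg_grad)
    F += learning_rate * tree.predict(X)  # each tree corrects its predecessors
    trees.append(tree)

print("training MSE:", np.mean((y - F) ** 2))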
==Overfitting is controlled by setting parameters like the subsample ratio, column subsampling, and regularization terms (see the sketch below).==
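These knobs map onto real XGBoost constructor arguments (subsample, colsample_bytree, reg_alpha, reg_lambda); the values below are placeholders, not recommendations.

# Sketch: regularization-style knobs on an XGBoost classifier.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,          # row subsample ratio per tree
    colsample_bytree=0.8,   # column subsample ratio per tree
    reg_lambda=1.0,         # L2 regularization on leaf weights
    reg_alpha=0.0,          # L1 regularization on leaf weights
)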
Limitations of Shapley (SHAP) scores