One of the best features of Random Forests is built-in feature selection. Explicability is one of the things we often lose when we go from traditional statistics to Machine Learning, but Random Forests lets us actually get some insight into our dataset instead of just having to treat our model as a black box.
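For a concrete look, here's a minimal sketch of pulling those importances out of a fitted forest. The dataset and model settings below are my own illustrative choices, not from the original post:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Toy dataset, illustrative only - swap in your own X and y
data = load_breast_cancer()
X, y = data.data, data.target

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# feature_importances_ has one score per column; they sum to 1
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```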
Tune the min_samples_leaf parameter for a Random Forests classifier in scikit-learn in Python
Another parameter, another set of quirks!
min_samples_leaf is sort of similar to max_depth. It helps us avoid overfitting. It's also non-obvious what you should use as your upper and lower limits to search between. Let's do what we did last week - build a forest and search for a good value, something like the sketch below.
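As a rough sketch of what that search can look like - the dataset and the grid values here are my assumptions, not the original notebook's:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in dataset; swap in your own X_train / y_train
X_train, y_train = make_classification(n_samples=1000, random_state=42)

# 1 is sklearn's default (a sensible lower limit); the upper values are
# guesses - leaves much bigger than this start to underfit on most datasets
param_grid = {"min_samples_leaf": [1, 2, 5, 10, 25, 50, 100]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=42),
    param_grid,
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```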
Code snippet corner is back! Tune the max_depth parameter for a Random Forests classifier in scikit-learn in Python
Continued from here
Notebook for this post is here
Binary search code itself is here
max_depth is an interesting parameter. While n_estimators has a tradeoff between speed & score, max_depth has the possibility of improving both. By limiting the depth of your trees, you can reduce overfitting. Unfortunately, deciding on upper & lower bounds is less than obvious.
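One way to ground those bounds - my own sketch, not necessarily what the linked binary search code does - is to grow an unconstrained forest first and see how deep its trees naturally go:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in dataset, illustrative only
X_train, y_train = make_classification(n_samples=1000, random_state=42)

# With max_depth=None (the default), trees grow until leaves are pure
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# The deepest any tree actually grew is a natural upper bound to search;
# 1 (a decision stump) is the lower bound
depths = [tree.get_depth() for tree in forest.estimators_]
print(f"search max_depth between 1 and {max(depths)}")
```

Once you have that ceiling, you can binary-search or grid-search within it.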
Tune the n_estimators parameter for a Random Forests classifier in scikit-learn in Python
Ah, hyperparameter tuning. Time & compute-intensive. Frequently containing weird non-linearities in how changing a parameter changes the score and/or the time it takes to train the model.
RandomizedSearchCV goes noticeably faster than a full GridSearchCV, but it still takes a while - which can be rough, because in my experience you do still need to be iterative with it.
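For reference, here's a minimal RandomizedSearchCV sketch over the parameters this series has covered. The distributions and n_iter are illustrative assumptions, not tuned values:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Stand-in dataset, illustrative only
X_train, y_train = make_classification(n_samples=1000, random_state=42)

# Distributions to sample from; these ranges are my guesses, not gospel
param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(2, 30),
    "min_samples_leaf": randint(1, 50),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=20,   # tries 20 sampled combinations instead of the full grid
    cv=3,
    random_state=42,
    n_jobs=-1,   # parallelize across cores
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```

Each re-run with a narrowed distribution is one of those iterations - RandomizedSearchCV just makes each pass cheaper.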