max_depth is an interesting parameter. While n_estimators trades speed for score, max_depth has the possibility of improving both: by limiting the depth of your trees, you reduce overfitting.
Unfortunately, deciding on upper & lower bounds is less than straightforward. It'll depend on your dataset. Luckily, I found a post on StackOverflow that had a link to a blog post that had a promising methodology.
First, we build a Random Forest with default arguments and fit it to our data.
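A minimal sketch of that step. The post's actual dataset isn't shown, so a synthetic `X, y` from `make_classification` stands in here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in dataset -- the original post uses its own X, y
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# All-default arguments: no max_depth, so every tree grows unrestricted
rf = RandomForestClassifier(random_state=42)
rf.fit(X, y)
```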
Now, let's see how deep the trees get when we don't impose any sort of max_depth. We'll use the code from that wonderful blog post to crawl our Random Forest, and get the height of every tree.
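The crawl boils down to walking `rf.estimators_`, the list of fitted `DecisionTreeClassifier`s inside the forest, and asking each for its depth. A standalone sketch (the fit from the previous step is repeated so this snippet runs on its own, again with a stand-in dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
rf = RandomForestClassifier(random_state=42).fit(X, y)

# Each sub-estimator is a fitted DecisionTreeClassifier;
# get_depth() returns the height of that individual tree
depths = [tree.get_depth() for tree in rf.estimators_]
print(min(depths), max(depths))
```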
Here's the output:
We'll be searching between 2 and 9!
Let's bring back our old helper function to easily return scores.
Now let's see it in action:
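The post's helper and dataset aren't shown, so here's a sketch of both pieces together: a hypothetical `get_score` that returns cross-validated accuracy plus wall-clock time, and the loop over the 2–9 depth range found above.

```python
from time import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in dataset -- the original post uses its own X, y
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def get_score(model):
    """Return mean cross-validated accuracy and elapsed seconds."""
    start = time()
    score = cross_val_score(model, X, y, cv=3).mean()
    return score, time() - start

# Try every candidate depth in the range we found above
for depth in range(2, 10):
    rf = RandomForestClassifier(max_depth=depth, random_state=42)
    score, elapsed = get_score(rf)
    print(f"max_depth={depth}: score={score:.4f}, time={elapsed:.2f}s")
```

Timing with `time()` is coarse, but it's enough to compare depths against each other on the same machine.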
Here's what we've got:
So, for our purposes, 9 will serve as our baseline, since that was the deepest tree the forest grew with default arguments.
Looks like a max_depth of 2 scores slightly higher than 9, and is slightly faster! Interestingly, though, 2 is slightly slower than 4 or 6; not sure why that is.