I'm using sklearn's RandomForestClassifier for a classification problem. I would like to train the trees of the forest individually, since I'm grabbing a different subset of a (VERY) large dataset for each tree. However, when I fit the trees manually, memory consumption balloons. Below are line-by-line memory profiles, generated with memory_profiler, of a custom fit versus RandomForestClassifier's own fit method. As far as I can tell, the library's fit performs the same steps as my custom fit. So what gives with all the extra memory?
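For reference, here's the complete script being profiled (run with python -m memory_profiler script.py). The imports reflect my setup: numpy's random and array, and RandomForestClassifier aliased as RFC:

from numpy import random, array
from sklearn.ensemble import RandomForestClassifier as RFC

# @profile is injected by memory_profiler at runtime
@profile
def normal_fit():
    # same data and (unfitted) forest setup as custom_fit below
    X = random.random((1000, 100))
    Y = random.random(1000) < 0.5
    rfc = RFC(n_estimators=100, n_jobs=1)
    rfc.n_classes_ = 2
    rfc.classes_ = array([False, True], dtype=bool)
    rfc.n_outputs_ = 1
    rfc.n_features_ = 100
    rfc.bootstrap = False
    # let the library build and fit all 100 trees itself
    rfc.fit(X, Y)

@profile
def custom_fit():
    X = random.random((1000, 100))
    Y = random.random(1000) < 0.5
    rfc = RFC(n_estimators=100, n_jobs=1)
    rfc.n_classes_ = 2
    rfc.classes_ = array([False, True], dtype=bool)
    rfc.n_outputs_ = 1
    rfc.n_features_ = 100
    # build and fit each tree by hand via the (private) _make_estimator helper
    for i in range(rfc.n_estimators):
        rfc._make_estimator()
        rfc.estimators_[-1].fit(X, Y, check_input=False)

if __name__ == '__main__':
    normal_fit()
    custom_fit()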
normal fit:
Line #    Mem usage    Increment   Line Contents
================================================
    17   28.004 MiB    0.000 MiB   @profile
    18                             def normal_fit():
    19   28.777 MiB    0.773 MiB       X = random.random((1000,100))
    20   28.781 MiB    0.004 MiB       Y = random.random(1000) < 0.5
    21   28.785 MiB    0.004 MiB       rfc = RFC(n_estimators=100,n_jobs=1)
    22   28.785 MiB    0.000 MiB       rfc.n_classes_ = 2
    23   28.785 MiB    0.000 MiB       rfc.classes_ = array([False, True],dtype=bool)
    24   28.785 MiB    0.000 MiB       rfc.n_outputs_ = 1
    25   28.785 MiB    0.000 MiB       rfc.n_features_ = 100
    26   28.785 MiB    0.000 MiB       rfc.bootstrap = False
    27   37.668 MiB    8.883 MiB       rfc.fit(X,Y)
custom fit:
Line #    Mem usage    Increment   Line Contents
================================================
     4   28.004 MiB    0.000 MiB   @profile
     5                             def custom_fit():
     6   28.777 MiB    0.773 MiB       X = random.random((1000,100))
     7   28.781 MiB    0.004 MiB       Y = random.random(1000) < 0.5
     8   28.785 MiB    0.004 MiB       rfc = RFC(n_estimators=100,n_jobs=1)
     9   28.785 MiB    0.000 MiB       rfc.n_classes_ = 2
    10   28.785 MiB    0.000 MiB       rfc.classes_ = array([False, True],dtype=bool)
    11   28.785 MiB    0.000 MiB       rfc.n_outputs_ = 1
    12   28.785 MiB    0.000 MiB       rfc.n_features_ = 100
    13   73.266 MiB   44.480 MiB       for i in range(rfc.n_estimators):
    14   72.820 MiB   -0.445 MiB           rfc._make_estimator()
    15   73.262 MiB    0.441 MiB           rfc.estimators_[-1].fit(X,Y,check_input=False)
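For context, the end goal looks roughly like the sketch below: a different subset of the large dataset per tree. load_subset is a hypothetical stand-in for however I pull each tree's chunk off disk, not real code:

# intended per-tree training loop (load_subset is a hypothetical helper)
rfc = RFC(n_estimators=100, n_jobs=1)
rfc.n_classes_ = 2
rfc.classes_ = array([False, True], dtype=bool)
rfc.n_outputs_ = 1
rfc.n_features_ = 100
for i in range(rfc.n_estimators):
    # grab a fresh subset of the (very) large dataset for this tree
    X_sub, Y_sub = load_subset(i)
    rfc._make_estimator()
    rfc.estimators_[-1].fit(X_sub, Y_sub, check_input=False)

That's why I can't just call rfc.fit on the whole dataset, and why the per-tree memory overhead above matters.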