I am writing a decision tree, and the following code is part of the complete program:
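import io
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import pydotplus
from sklearn.tree import export_graphviz

def show_tree(tree, features, path):
    f = io.StringIO()
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
    img = mpimg.imread(path)  # scipy.misc.imread was removed in SciPy 1.2
    plt.rcParams['figure.figsize'] = (20, 20)
    plt.imshow(img)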
Could anyone please tell me what the purpose of using StringIO is here?
Python is not my main language; however, I think the answer to your question is quite simple and does not require a lot of research.
StringIO is used here to provide an in-memory text stream for input/output. Your function show_tree displays the tree, but to do that it needs some way to pass the exported data along, a kind of data transport highway.
With f = io.StringIO() you initialize your data stream. After that you are free to use it however you want; in this particular case:
export_graphviz(tree, out_file=f, feature_names=features)
With out_file=f you export the data into the stream f that you initialized earlier with f = io.StringIO(). Because StringIO is an in-memory text file, you basically set your data aside in the stream object for later use. Thanks to that, you don't have to write your data to a .dot file; instead you hold it temporarily (for as long as the stream object stays open).
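To see the idea in isolation, here is a small sketch that uses only the standard library. write_dot is a hypothetical stand-in for export_graphviz (it is not scikit-learn code): like export_graphviz, it accepts a file-like out_file and writes DOT text into it. The StringIO object receives the text exactly as an open file would, but nothing touches the disk.

```python
import io


def write_dot(out_file):
    """Hypothetical stand-in for export_graphviz: writes a tiny DOT graph
    into any object that has a write() method."""
    out_file.write("digraph Tree {\n")
    out_file.write('0 [label="x <= 0.5"] ;\n')
    out_file.write("0 -> 1 ;\n")
    out_file.write("0 -> 2 ;\n")
    out_file.write("}\n")


# An in-memory text stream that behaves like a file opened for writing.
f = io.StringIO()
write_dot(f)

# The DOT text now lives in memory; no .dot file was created.
dot_text = f.getvalue()
print(dot_text.splitlines()[0])  # prints: digraph Tree {
```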
More about this particular case
pydotplus.graph_from_dot_data(f.getvalue()).write_png(path)
Here, f.getvalue() returns everything that was written to the stream as a single string, and graph_from_dot_data() builds your graph from that .dot data. In the most basic usage you would pass the path of a .dot file in which the previously generated data had been stored, but YOU DON'T HAVE TO! That is the trick: your data is still in the stream object that you created and filled beforehand. So all you have to do is pass it straight to this function, which generates the graph from that data, and write_png() then saves it as a .png image.
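One detail worth knowing is why the code calls f.getvalue() rather than f.read(). getvalue() returns the whole buffer regardless of the current position, whereas read() starts at the current position, which sits at the end of the buffer right after writing. The sketch below (standard library only; the DOT text is a made-up placeholder) shows the difference.

```python
import io

f = io.StringIO()
f.write("digraph Tree {\n0 -> 1 ;\n}\n")

# After writing, the stream position is at the end of the buffer,
# so read() has nothing left to return.
after_write = f.read()

# getvalue() ignores the position and returns the entire buffer.
whole_buffer = f.getvalue()

# To use read() instead, rewind to the start first.
f.seek(0)
after_seek = f.read()

print(repr(after_write))           # prints: ''
print(whole_buffer == after_seek)  # prints: True
```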
Communication between your program and files can be established in many ways, but usually you use streams: you initialize a stream at the very beginning, use it, and then close it. Every std::cout or std::cerr (a reference to my main language, C++; sorry for the non-Python example) is such a stream. A stream lets you exchange data between your program and a designated target (e.g. the console, or in this case a file), but you can also use it as temporary storage. In this particular case that can speed up image generation, because you don't have to write the data to a file and load it back. All you have to do is write the data to the stream in a format that the other function accepts, and then use the very same stream to read that data back for generating the image.
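In Python, the rough counterpart of std::cout is sys.stdout, and it shares the text-stream interface with io.StringIO. The sketch below uses two hypothetical functions, produce and consume, to show the pattern described above: one function writes data into a stream in a format the other expects, and the same stream is then read back. Passing sys.stdout sends the identical output to the console instead.

```python
import io
import sys


def produce(stream):
    """Hypothetical producer: writes one 'name value' pair per line."""
    for name, value in [("depth", 3), ("leaves", 5)]:
        print(name, value, file=stream)


def consume(stream):
    """Hypothetical consumer: parses the lines written by produce()."""
    result = {}
    for line in stream:
        name, value = line.split()
        result[name] = int(value)
    return result


produce(sys.stdout)      # same function, console as the target

buffer = io.StringIO()
produce(buffer)          # same function, memory as the target
buffer.seek(0)           # rewind before reading back
parsed = consume(buffer)
print(parsed)            # prints: {'depth': 3, 'leaves': 5}
```

The producer never needs to know whether it is writing to the console, a file, or memory; that is what makes swapping a disk file for a StringIO so cheap.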
More about StringIO
StringIO represents an in-memory text file. It can be used like any other text file, so you can write to it and read from it. Access is faster than with a regular file because the StringIO buffer is managed in memory, but on the other hand it is not persisted to disk.
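A short sketch of both points, using only the standard library: the usual text-file methods (write, writelines, seek, readline, readlines) work on a StringIO, and once it is closed its contents are gone, since nothing was ever written to disk.

```python
import io

with io.StringIO() as f:
    # The usual text-file methods work on the in-memory buffer.
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])
    f.seek(0)
    first = f.readline()
    rest = f.readlines()
    snapshot = f.getvalue()

# Leaving the with-block closes the buffer and discards its contents;
# any further access raises ValueError.
try:
    f.getvalue()
except ValueError:
    closed_error = True
else:
    closed_error = False

print(first.strip(), len(rest), closed_error)  # prints: first line 2 True
```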
In the example you're giving, you could also have used a regular text file.
Here is an example with a regular .dot text file:
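import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import pydotplus
from sklearn.tree import export_graphviz

def show_tree(tree, features, path):
    f = 'tree.dot'  # path of an intermediate .dot file on disk
    export_graphviz(tree, out_file=f, feature_names=features)
    pydotplus.graph_from_dot_file(f).write_png(path)
    img = mpimg.imread(path)
    plt.rcParams['figure.figsize'] = (20, 20)
    plt.imshow(img)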
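The only real difference between the two versions is where the DOT text lives. The sketch below (standard library only, with placeholder DOT text) writes the same content to a disk file and to a StringIO and reads both back. It uses a temporary directory rather than a fixed tree.dot in the working directory; that is an illustrative choice so no file is left behind, not something the version above does.

```python
import io
import os
import tempfile

DOT_TEXT = "digraph Tree {\n0 -> 1 ;\n0 -> 2 ;\n}\n"  # placeholder DOT data

# Version 1: a regular file on disk (like out_file='tree.dot').
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree.dot")
    with open(path, "w") as disk_file:
        disk_file.write(DOT_TEXT)
    with open(path) as disk_file:
        from_disk = disk_file.read()
    file_existed = os.path.exists(path)

# Version 2: an in-memory StringIO (like out_file=f in the question).
memory_file = io.StringIO()
memory_file.write(DOT_TEXT)
from_memory = memory_file.getvalue()

print(from_disk == from_memory, file_existed)  # prints: True True
```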
And here is another example that uses neither a file nor StringIO; it simply uses the DOT string that export_graphviz() returns when out_file=None:
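import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import pydotplus
from sklearn.tree import export_graphviz

def show_tree(tree, features, path):
    # With out_file=None, export_graphviz returns the DOT source as a string
    dot_data = export_graphviz(tree, out_file=None, feature_names=features)
    pydotplus.graph_from_dot_data(dot_data).write_png(path)
    img = mpimg.imread(path)
    plt.rcParams['figure.figsize'] = (20, 20)
    plt.imshow(img)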
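Why the last version needs no intermediate object: export_graphviz returns the DOT text as a string when out_file=None and writes it to out_file otherwise. The hypothetical export_dot below mimics that convention in a few lines (it is not scikit-learn's code, and unlike export_graphviz it does not accept a file name string) to show that both routes yield the same text.

```python
import io


def export_dot(edges, out_file=None):
    """Hypothetical helper mimicking export_graphviz's out_file convention:
    return the DOT text when out_file is None, otherwise write it to out_file."""
    lines = ["digraph Tree {"]
    lines += [f"{a} -> {b} ;" for a, b in edges]
    lines.append("}")
    dot_text = "\n".join(lines) + "\n"
    if out_file is None:
        return dot_text
    out_file.write(dot_text)
    return None


edges = [(0, 1), (0, 2)]

# Like the third version: ask for the string directly.
dot_data = export_dot(edges, out_file=None)

# Like the question's version: write into a StringIO and read it back.
f = io.StringIO()
export_dot(edges, out_file=f)

print(dot_data == f.getvalue())  # prints: True
```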