I am a new-grad SWE learning Go (and loving it).
I am building a parser for Wikipedia dump files - basically a huge bzip2-compressed XML file (~50GB uncompressed).
I want to do both streaming decompression and parsing, which sounds simple enough. For decompression, I do:
inputFilePath := flag.Arg(0)
inputFile, err := os.Open(inputFilePath) // bzip2.NewReader needs an io.Reader
if err != nil { log.Fatal(err) }
inputReader := bzip2.NewReader(inputFile)
And then pass the reader to the XML parser:
decoder := xml.NewDecoder(inputReader)
However, since both decompression and parsing are expensive operations, I would like to run them on separate goroutines to make use of additional cores. How would I go about doing this in Go?
The only thing I can think of is wrapping the file in a chan []byte and implementing the io.Reader interface on top of it, but I presume there might be a built-in (and cleaner) way of doing it.
Has anyone ever done something like this?
Thanks! Manuel
You can use io.Pipe, then use io.Copy to push the decompressed data into the pipe, and read it in another goroutine:
http://play.golang.org/p/fXLnfnaWYA