I'm trying to read a tar archive that is streamed to my program's stdin, but I somehow seem to be reading far more data from the pipe than tar is sending.
I run my command like this:
tar -cf - somefolder | ./my-go-binary
The source code is like this:
package main

import (
	"bufio"
	"io"
	"log"
	"os"
)

// Read from standard input
func main() {
	reader := bufio.NewReader(os.Stdin)
	// Read all data from stdin, processing subsequent reads as chunks.
	parts := 0
	for {
		parts++
		data := make([]byte, 4<<20) // Read 4MB at a time
		_, err := reader.Read(data)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatalf("Problems reading from input: %s", err)
		}
	}
	log.Printf("Total parts processed: %d\n", parts)
}
For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte array is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.
Is there something I need to do to read data from os.Stdin properly with Go?
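Read the documentation for Read: it returns the number of bytes read into the buffer, and that number may be less than the length of the buffer.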
You are not reading 4MB at a time. You are providing buffer space and discarding the integer that would have told you how much the Read actually read. The buffer size is only the maximum; in practice about 128k seems to get read per call, at least on my system. Try it out yourself:
You have to implement the logic for handling the varying read amounts.
Your code is inefficient. It's allocating and initializing data each time through the loop.
The code for your reader as an io.Reader is wrong. For example, you ignore the number of bytes read by _, err := reader.Read(data) and you don't handle err errors properly.
Here's a model file read program that conforms to the io.Reader interface: