Go - Decode JSON as it is still streaming in via n

Posted 2019-01-28 08:48

Question:

In the past I've used Go to decode JSON from an API endpoint in the manner shown below.

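client := &http.Client{}

req, err := http.NewRequest("GET", "https://some/api/endpoint", nil)
if err != nil {
    log.Fatal(err)
}
res, err := client.Do(req)
if err != nil {
    log.Fatal(err)
}
// Only defer Close once we know res is non-nil.
defer res.Body.Close()

// Reads the entire response body into memory before decoding.
buf, err := ioutil.ReadAll(res.Body)
if err != nil {
    log.Fatal(err)
}

err = json.Unmarshal(buf, &response)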

I am shortly going to be working on an endpoint that could send me several megabytes of JSON data in the following format.

{
    "somefield": "value",
    "items": [
        { LARGE OBJECT },
        { LARGE OBJECT },
        { LARGE OBJECT },
        { LARGE OBJECT },
        ...
    ]
}

The JSON will at some point contain an array of large, arbitrary length, objects. I want to take each one of these objects and place them, separately, into a message queue. I do not need to decode the objects themselves.

If I used my normal method, this would load the entire response into memory before decoding it.

Is there a good way to split out each of the LARGE OBJECT items while the response is still streaming in and dispatch it to the queue? I'm doing this to avoid holding so much data in memory.

Thanks!

Answer 1:

Decoding a JSON stream is possible with the json.Decoder.

With Decoder.Decode(), we may read (unmarshal) a single value without consuming and unmarshaling the complete stream. This is cool, but your input is a "single" JSON object, not a series of JSON objects, which means a call to Decoder.Decode() would attempt to unmarshal the complete JSON object with all items (large objects).
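For contrast, here is the case where Decoder.Decode() alone is enough: the input is a series of top-level JSON values, and each call consumes exactly one of them. This is only a minimal sketch; the item type, the decodeSeries helper, and the sample input are made up for illustration and are not part of the question.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// item is a hypothetical type used only for this illustration.
type item struct {
	ID string `json:"id"`
}

// decodeSeries decodes top-level JSON values one at a time until EOF.
func decodeSeries(input string) ([]item, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var out []item
	for {
		var it item
		err := dec.Decode(&it)
		if err == io.EOF {
			return out, nil // clean end of stream
		}
		if err != nil {
			return out, err
		}
		out = append(out, it)
	}
}

func main() {
	// Three separate top-level values, not one enclosing object.
	items, err := decodeSeries(`{"id":"1"} {"id":"2"} {"id":"3"}`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(len(items), "values decoded")
}
```

If the same three objects were wrapped in one enclosing object, a single Decode() call would consume all of them at once, which is exactly the situation in the question.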

What we want is partial, on-the-fly processing of a single JSON object. For this, we may use Decoder.Token(), which parses (advances over) only the next token in the JSON input stream and returns it. This is called event-driven parsing.
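To get a feel for what Decoder.Token() returns, the following sketch walks a small input token by token and records the Go type and value of each one. The tokens helper and the sample input are assumptions made for this illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// tokens returns a printable "type:value" form of every token, in order.
func tokens(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var out []string
	for {
		t, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		// Delimiters arrive as json.Delim; property names and scalar
		// values arrive as ordinary Go values (string, float64, bool).
		out = append(out, fmt.Sprintf("%T:%v", t, t))
	}
}

func main() {
	toks, err := tokens(`{"name":"x","items":[1,true]}`)
	if err != nil {
		log.Fatal(err)
	}
	for _, t := range toks {
		fmt.Println(t)
	}
}
```

Note that property names and string values both come back as plain strings, so it is up to the caller to remember which of the two is expected next.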

Of course we have to "process" (interpret and act upon) the tokens and build a "state machine" that keeps track of where we are in the JSON structure we're processing.
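The simplest possible piece of such state is the current nesting depth. As a rough sketch (the deepestLevel helper and its input are hypothetical, chosen only to show the idea), we can update a counter on every opening and closing delimiter:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// deepestLevel reports the maximum nesting depth seen in the input.
func deepestLevel(input string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	depth, deepest := 0, 0
	for {
		t, err := dec.Token()
		if err == io.EOF {
			return deepest, nil
		}
		if err != nil {
			return deepest, err
		}
		// Only delimiters change our position in the structure.
		if d, ok := t.(json.Delim); ok {
			switch d {
			case '{', '[':
				depth++
				if depth > deepest {
					deepest = depth
				}
			case '}', ']':
				depth--
			}
		}
	}
}

func main() {
	n, err := deepestLevel(`{"a":[{"b":1}]}`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("deepest level:", n)
}
```

The implementation below keeps its state implicitly in the control flow instead of in a counter, which is easier to read when the expected structure is known in advance.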

Here's an implementation that solves your problem.

We will use the following JSON input:

{
    "somefield": "value",
    "otherfield": "othervalue",
    "items": [
        { "id": "1", "data": "data1" },
        { "id": "2", "data": "data2" },
        { "id": "3", "data": "data3" },
        { "id": "4", "data": "data4" }
    ]
}

And read the items, the "large objects" modeled by this type:

type LargeObject struct {
    Id   string `json:"id"`
    Data string `json:"data"`
}

We will also parse and interpret other fields in the JSON object, but we will only log / print them.

For brevity and easy error handling, we'll use this helper error handler function:

he := func(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

And now let's see some action. In the example below for brevity and to have a working demonstration on the Go Playground, we'll read from a string value. To read from an actual HTTP response body, we only have to change a single line, which is how we create the json.Decoder:

dec := json.NewDecoder(res.Body)

So the demonstration:

dec := json.NewDecoder(strings.NewReader(jsonStream))
// We expect an object
t, err := dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '{' {
    log.Fatal("Expected object")
}

// Read props
for dec.More() {
    t, err = dec.Token()
    he(err)
    prop := t.(string)
    if prop != "items" {
        var v interface{}
        he(dec.Decode(&v))
        log.Printf("Property '%s' = %v", prop, v)
        continue
    }

    // It's the "items". We expect it to be an array
    t, err = dec.Token()
    he(err)
    if delim, ok := t.(json.Delim); !ok || delim != '[' {
        log.Fatal("Expected array")
    }
    // Read items (large objects)
    for dec.More() {
        // Read next item (large object)
        lo := LargeObject{}
        he(dec.Decode(&lo))
        fmt.Printf("Item: %+v\n", lo)
    }
    // Array closing delim
    t, err = dec.Token()
    he(err)
    if delim, ok := t.(json.Delim); !ok || delim != ']' {
        log.Fatal("Expected array closing")
    }
}

// Object closing delim
t, err = dec.Token()
he(err)
if delim, ok := t.(json.Delim); !ok || delim != '}' {
    log.Fatal("Expected object closing")
}

This will produce the following output:

2009/11/10 23:00:00 Property 'somefield' = value
2009/11/10 23:00:00 Property 'otherfield' = othervalue
Item: {Id:1 Data:data1}
Item: {Id:2 Data:data2}
Item: {Id:3 Data:data3}
Item: {Id:4 Data:data4}

Try the full, working example on the Go Playground.
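Since the question states that the items do not need to be decoded at all, one possible variation is to decode each array element into a json.RawMessage, which keeps the element's bytes untouched so they can be handed to the queue as-is. The following is only a sketch under assumptions: forwardItems, expectDelim, collectItems, and the emit callback (standing in for the message queue) are hypothetical names, and errors are returned rather than logged. The reader passed to forwardItems could be res.Body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"strings"
)

// expectDelim reads one token and checks that it is the given delimiter.
func expectDelim(dec *json.Decoder, want json.Delim) error {
	t, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := t.(json.Delim); !ok || d != want {
		return fmt.Errorf("expected %v, got %v", want, t)
	}
	return nil
}

// forwardItems streams the top-level object from r and calls emit once
// per element of "items", passing the element's raw, undecoded bytes.
func forwardItems(r io.Reader, emit func(json.RawMessage) error) error {
	dec := json.NewDecoder(r)
	if err := expectDelim(dec, '{'); err != nil {
		return err
	}
	for dec.More() {
		t, err := dec.Token()
		if err != nil {
			return err
		}
		key, ok := t.(string)
		if !ok {
			return fmt.Errorf("expected property name, got %v", t)
		}
		if key != "items" {
			// Consume and discard the value of any other property.
			var skip json.RawMessage
			if err := dec.Decode(&skip); err != nil {
				return err
			}
			continue
		}
		if err := expectDelim(dec, '['); err != nil {
			return err
		}
		for dec.More() {
			// A fresh RawMessage per element, so emit may keep it.
			var raw json.RawMessage
			if err := dec.Decode(&raw); err != nil {
				return err
			}
			if err := emit(raw); err != nil {
				return err
			}
		}
		if err := expectDelim(dec, ']'); err != nil {
			return err
		}
	}
	return expectDelim(dec, '}')
}

// collectItems is a stand-in for a queue: it gathers the raw items.
func collectItems(input string) ([]string, error) {
	var out []string
	err := forwardItems(strings.NewReader(input), func(raw json.RawMessage) error {
		out = append(out, string(raw))
		return nil
	})
	return out, err
}

func main() {
	items, err := collectItems(`{"somefield":"value","items":[{"id":"1"},{"id":"2"}]}`)
	if err != nil {
		log.Fatal(err)
	}
	for _, it := range items {
		fmt.Println(it)
	}
}
```

Only one item is buffered at a time, so peak memory is bounded by the largest single object rather than by the whole response.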