I am very new to JSON files. If I have a JSON file with multiple JSON objects such as the following:
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
"Code":[{"event1":"A","result":"1"},…]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
"Code":[{"event1":"B","result":"1"},…]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
"Code":[{"event1":"B","result":"0"},…]}
…
I want to extract all "Timestamp" and "Usefulness" values into a data frame:
Timestamp Usefulness
0 20140101 Yes
1 20140102 No
2 20140103 No
…
Does anyone know a general way to deal with such problems? Thanks!
As you parse through the objects, you are dealing with dictionaries. You can extract the values you need by looking them up by key, e.g.
value = jsonDictionary['Usefulness']
You can loop through the JSON objects with a for loop.
So, as was mentioned in a couple of comments, containing the data in an array is simpler, but the solution does not scale well in terms of efficiency as the data set size increases. You really should only use a list when you want to access a random object in the array; otherwise, generators are the way to go. Below I have prototyped a reader function which reads each JSON object individually and returns a generator.
The basic idea is to signal the reader to split on the newline character "\n" (or "\r\n" for Windows). Python can do this with the file.readline() function.
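A minimal sketch of such a reader, assuming one complete JSON object per line. The name read_json_lines and the sample file are made up for illustration, and the loop iterates over the file object rather than calling readline() explicitly, which reads the file one line at a time in the same way:

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Yield one parsed JSON object per non-empty line of the file at `path`."""
    with open(path, mode="r", encoding="utf-8") as handle:
        for line in handle:  # reads one line at a time, not the whole file
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical sample data, shaped like the records in the question.
sample_lines = [
    '{"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"}',
    '{"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"}',
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.jsonl")
    with open(sample_path, mode="w", encoding="utf-8") as handle:
        handle.write("\n".join(sample_lines) + "\n")

    # Keep only the two fields the question asks for.
    rows = [
        {"Timestamp": obj["Timestamp"], "Usefulness": obj["Usefulness"]}
        for obj in read_json_lines(sample_path)
    ]

print(rows)
```

If pandas is installed, a list of dictionaries like rows can be passed straight to pandas.DataFrame(rows).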
However, this method only really works when the file is written as you have it, with each object separated by a newline character. Below I wrote an example of a writer that separates an array of JSON objects and saves each one on a new line.
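A writer along those lines might look like the sketch below. The function name write_json_lines and the sample records are assumptions for illustration, not the original code:

```python
import json
import os
import tempfile


def write_json_lines(path, objects, mode="w"):
    """Write each object in `objects` as one JSON document per line."""
    with open(path, mode=mode, encoding="utf-8") as handle:
        for obj in objects:
            # json.dumps without `indent` produces a single-line string.
            handle.write(json.dumps(obj) + "\n")


# Hypothetical records, shaped like the ones in the question.
records = [
    {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},
    {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "records.jsonl")
    write_json_lines(out_path, records)
    with open(out_path, encoding="utf-8") as handle:
        written = handle.read().splitlines()
```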
You could also do the same operation with file.writelines() and a list comprehension.
And if you wanted to append the data instead of writing a new file, just change mode="w" to mode="a".
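Both variations can be sketched together. The batches and the file name here are invented for the example:

```python
import json
import os
import tempfile

# Hypothetical batches of records.
first_batch = [{"ID": "12345", "Usefulness": "Yes"}]
second_batch = [{"ID": "1A35B", "Usefulness": "No"}]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "records.jsonl")

    # mode="w" creates the file, or truncates it if it already exists.
    with open(out_path, mode="w", encoding="utf-8") as handle:
        handle.writelines([json.dumps(obj) + "\n" for obj in first_batch])

    # mode="a" keeps the existing content and appends after it.
    with open(out_path, mode="a", encoding="utf-8") as handle:
        handle.writelines([json.dumps(obj) + "\n" for obj in second_batch])

    with open(out_path, encoding="utf-8") as handle:
        stored = [json.loads(line) for line in handle]
```

Note that writelines() does not add line endings itself, which is why each string gets its own "\n".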
In the end, I find this helps a great deal, not only with readability when I open JSON files in a text editor but also with using memory more efficiently.
On that note, if you change your mind at some point and want a list out of the reader, Python lets you pass a generator to list() and populate the list automatically. In other words, just wrap the reader call in list().
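For instance, with a small stand-in generator (parse_lines is a hypothetical helper that takes strings instead of a file, to keep the example short):

```python
import json


def parse_lines(lines):
    """Yield one parsed object per non-empty line (stand-in for a file reader)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


lines = ['{"Timestamp": "20140101"}', '{"Timestamp": "20140102"}']

records = list(parse_lines(lines))  # consumes the generator into a list
second = records[1]  # indexing works now that everything is in memory
```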
Hope this helps. Sorry if it was a bit verbose.
You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they can fit in memory). raw_decode stops once it has a valid object and returns the position where the parsed object ended. It's not documented, but you can pass this position back to raw_decode and it will start parsing again from that position. Unfortunately, raw_decode doesn't accept strings that have leading whitespace. So we need to search to find the first non-whitespace part of your document.
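A sketch of that approach, assuming the whole document is already in a string. The name decode_stacked and the sample text are illustrative:

```python
import json
import re

NOT_WHITESPACE = re.compile(r"\S")


def decode_stacked(document, pos=0, decoder=None):
    """Yield each JSON value found back to back in `document`."""
    decoder = decoder or json.JSONDecoder()
    while True:
        # Skip whitespace, because raw_decode will not do it for us.
        match = NOT_WHITESPACE.search(document, pos)
        if match is None:
            return  # nothing but whitespace left
        pos = match.start()
        # raw_decode returns the value and the index where it ended.
        obj, pos = decoder.raw_decode(document, pos)
        yield obj


# Hypothetical input; the second object spans two lines on purpose.
stacked = """
{"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"}
{"ID": "1A35B", "Timestamp": "20140102",
 "Usefulness": "No"}
"""

pairs = [(obj["Timestamp"], obj["Usefulness"]) for obj in decode_stacked(stacked)]
```

Unlike a line-by-line reader, this does not care whether an object is split across several lines.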
Use a JSON array, that is, wrap the objects in [ ] and separate them with commas.
Then load it in your Python code, for example with json.load.
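Something like the following would do it, assuming the file holds a single JSON array. The file name data.json and its content are placeholders:

```python
import json
import os
import tempfile

# Hypothetical file content: the objects wrapped in one JSON array.
array_text = """[
  {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},
  {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"}
]"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")
    with open(path, mode="w", encoding="utf-8") as handle:
        handle.write(array_text)

    # json.load parses the whole file in one call.
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
```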
Now the content of data is a list of dictionaries, one for each of the elements.
You can access it easily, e.g.:
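A short sketch, with sample data standing in for the loaded file:

```python
import json

# Hypothetical data, parsed from a string instead of a file.
data = json.loads(
    '[{"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},'
    ' {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"}]'
)

first_usefulness = data[0]["Usefulness"]  # index into the list, then look up the key

# Keep only the two columns the question asks for.
rows = [
    {"Timestamp": item["Timestamp"], "Usefulness": item["Usefulness"]}
    for item in data
]
```

If pandas is available, pandas.DataFrame(rows) turns this list into the table from the question.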