I'm trying to build a Python script that imports JSON files into MongoDB. This part of my script keeps jumping to the except ValueError
for larger JSON files. I think it has something to do with parsing the JSON file line by line, because very small JSON files seem to work.
def read(jsonFiles):
    from pymongo import MongoClient

    client = MongoClient('mongodb://localhost:27017/')
    db = client[args.db]

    counter = 0
    for jsonFile in jsonFiles:
        with open(jsonFile, 'r') as f:
            for line in f:
                # load valid lines (should probably use rstrip)
                if len(line) < 10: continue
                try:
                    db[args.collection].insert(json.loads(line))
                    counter += 1
                except pymongo.errors.DuplicateKeyError as dke:
                    if args.verbose:
                        print "Duplicate Key Error: ", dke
                except ValueError as e:
                    if args.verbose:
                        print "Value Error: ", e

                # friendly log message
                if 0 == counter % 100 and 0 != counter and args.verbose: print "loaded line:", counter
                if counter >= args.max:
                    break
I'm getting the following error message:
Value Error: Extra data: line 1 column 10 - line 2 column 1 (char 9 - 20)
Value Error: Extra data: line 1 column 8 - line 2 column 1 (char 7 - 18)
Figured it out. Looks like breaking it up into lines was the mistake. Here's what the final code looks like.
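A minimal sketch of that approach, assuming each file holds a single JSON document (one object, or an array of objects). The helper name `load_documents` is hypothetical, and the MongoDB call is left as a comment so the snippet runs without a server:

```python
import json


def load_documents(path):
    """Parse the whole file as one JSON document and return a list of documents."""
    with open(path, 'r') as f:
        data = json.load(f)  # parse the entire file, not one line at a time
    # A top-level array yields many documents; any other top-level value yields one.
    return data if isinstance(data, list) else [data]


# Hypothetical usage with the names from the question:
# for doc in load_documents(jsonFile):
#     db[args.collection].insert(doc)
```

Reading the whole file at once means a pretty-printed document that spans many lines is parsed as one value, so the line-by-line "Extra data" errors no longer occur.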
Look at this example:
It will produce the "Extra data" error like in your json file:
This is because it is not a valid JSON document. It contains two independent "dict"s, one after the other, and the parser stops after the first one and reports the rest as extra data. Perhaps this helps you find the error in your JSON file.
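To make this concrete, here is a small sketch with a made-up input string (not the data from the question). `json.loads` rejects two top-level objects in one string, while `json.JSONDecoder.raw_decode` can read them one at a time. The helper `iter_json_values` and the choice to skip commas and whitespace between values are assumptions made for illustration:

```python
import json

# Hypothetical input: two top-level objects in one string.
text = '{"a": 1}, {"b": 2}'

try:
    json.loads(text)
except ValueError as e:  # in Python 3, json.JSONDecodeError subclasses ValueError
    print("Value Error: %s" % e)


def iter_json_values(s):
    """Yield each top-level JSON value in s, skipping whitespace and commas between them."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(s):
        # Skip separators between values.
        if s[pos] in ' \t\r\n,':
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(s, pos)
        yield value
```

Calling `list(iter_json_values(text))` returns both dicts. If the file is under your control, wrapping the objects in a JSON array is usually the simpler fix.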
You can find more information in this post.