I'm using the CDH5 QuickStart VM and I have a file like this (only an excerpt is shown here):
{"user_id": "kim95",
"type": "Book",
"title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.",
"year": "1995",
"publisher": "ACM Press and Addison-Wesley",
"authors": {},
"source": "DBLP"
}
{"user_id": "marshallo79",
"type": "Book",
"title": "Inequalities: Theory of Majorization and Its Application.",
"year": "1979",
"publisher": "Academic Press",
"authors": {("Albert W. Marshall"), ("Ingram Olkin")},
"source": "DBLP"
}
I used this script:
books = load 'data/book-seded.json'
using JsonLoader('t1:tuple(user_id:chararray,type:chararray,title:chararray,year:chararray,publisher:chararray,source:chararray,authors:bag{T:tuple(author:chararray)})');
STORE books INTO 'book-no-seded.tsv';
The script runs, but the generated file is empty. Do you have any idea why?
In the end, only one schema worked. If I add or remove a single space relative to that configuration, I get an error. (I also added "name" for the tuples, wrote "null" where a value was empty, and swapped the order of authors and source, but even without those changes it would still have been wrong.)
The working script is this one:
You need to be sure that the LOAD schema is correct. You can run
DUMP books
as a quick check. We had to be careful with the input data and the schema when we used the Pig JsonLoader for this tutorial: http://gethue.com/hadoop-tutorials-ii-1-prepare-the-data-for-analysis/
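To make that quick check concrete, here is a minimal sketch. It is not the working script mentioned in the question. It rests on assumptions that should be checked against your Pig version: that each JSON record sits on a single line of the input file, that the schema lists the fields in the same order as they appear in the file, and that authors is stored as an array of objects with an "author" key. The alias first_books is just a name picked for illustration.

```pig
-- Sketch only: assumes one JSON record per input line and a schema
-- whose field order matches the order of the keys in the file.
books = LOAD 'data/book-seded.json'
    USING JsonLoader('user_id:chararray, type:chararray, title:chararray, year:chararray, publisher:chararray, authors:{t:(author:chararray)}, source:chararray');

-- Print the schema Pig has attached to the relation.
DESCRIBE books;

-- Look at a handful of records before running any STORE.
first_books = LIMIT books 5;
DUMP first_books;
```

If DUMP already prints nothing, or only empty tuples, the problem is most likely in the LOAD step (input layout or schema) rather than in STORE.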
Try: STORE books INTO 'book-no-seded.tsv' USING org.apache.pig.piggybank.storage.JsonStorage();