-->

JSON path parent object, or equivalent MongoDB que

Posted 2020-03-26 05:12

Question:

I am selecting nodes in a JSON input but can't find a way to include parent object details for each array entry that I am querying. I am using Pentaho Data Integration to query the data, with a JSON Input step fed from a MongoDB Input step.

I have also tried to create a mongodb query to achieve the same but cannot seem to do this either.

Here are the two fields/paths that display the data:

    $.size_break_costs[*].size
    $.size_break_costs[*].quantity

Here is the json source format:

{
"_id" : ObjectId("4f1f74ecde074f383a00000f"),
"colour" : "RAVEN-SMOKE",
"name" : "Authority",
"size_break_costs" : [
    {
        "quantity" : NumberLong("80"),
        "_id" : ObjectId("518ffc0697eee36ff3000002"),
        "size" : "S"
    },
    {
        "quantity" : NumberLong("14"),
        "_id" : ObjectId("518ffc0697eee36ff3000003"),
        "size" : "M"
    },
    {
        "quantity" : NumberLong("55"),
        "_id" : ObjectId("518ffc0697eee36ff3000004"),
        "size" : "L"
    }
],
"sku" : "SK3579"
}

I currently get the following results:

    S,80 
    M,14 
    L,55

I would also like to get the SKU and Name, since my source will have multiple products (SKU/Description):

    SK3579,Authority,S,80
    SK3579,Authority,M,14
    SK3579,Authority,L,55

When I try to include the SKU using $.sku, the process errors.

The end result I'm after is a report of all products and the available quantities of their various sizes. Possibly there's an alternative MongoDB query that provides this.

EDIT:

It seems the issue may be that not all documents have the same structure. For example, the one above contains 3 sizes: S, M, L. Some products come in one size: PACK. Others come in multiple sizes: 28, 30, 32, 33, 34, 36, 38, etc.

The error produced is:

The data structure is not the same inside the resource! We found 1 values for json path [$.sku], which is different that the number retourned for path [$.size_break_costs[*].quantity] (7 values). We MUST have the same number of values for all paths.
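As the message says, the step needs every path to return the same number of values so that they can be paired into rows. The following Python sketch only illustrates that rule and is not Pentaho's actual code: `values_for` and `check_same_counts` are hypothetical helpers, the tiny path evaluator handles just the two path shapes used here, and `product` is a trimmed copy of the sample document (so the mismatch is 1 against 3 rather than 1 against 7).

```python
def values_for(doc, path):
    """Return the values for '$.field' or '$.array[*].field' (nothing else is supported)."""
    parts = path[2:].split("[*].")  # drop the leading "$."
    if len(parts) == 1:
        return [doc[parts[0]]]
    array_key, child_key = parts
    return [item[child_key] for item in doc[array_key]]


def check_same_counts(doc, paths):
    """Raise if the paths do not all return the same number of values."""
    counts = {p: len(values_for(doc, p)) for p in paths}
    if len(set(counts.values())) > 1:
        raise ValueError(f"paths return different numbers of values: {counts}")
    return counts


product = {
    "sku": "SK3579",
    "name": "Authority",
    "size_break_costs": [
        {"size": "S", "quantity": 80},
        {"size": "M", "quantity": 14},
        {"size": "L", "quantity": 55},
    ],
}

# The two child paths agree (3 values each), so rows can be built.
ok = check_same_counts(
    product, ["$.size_break_costs[*].size", "$.size_break_costs[*].quantity"]
)

# Adding the parent path gives 1 value against 3, which is the reported mismatch.
try:
    check_same_counts(product, ["$.sku", "$.size_break_costs[*].quantity"])
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
```

Seen this way, the varying number of sizes per product is not the root cause: any parent-level path returns one value while the `[*]` paths return one per array entry.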

I have also tried the following MongoDB query on its own. It returns the correct results, but the corresponding export does not work: no values are returned for Size and Quantity.

Query:

db.product_details.find( {}, {sku: true, "size_break_costs.size": true, "size_break_costs.quantity": true}).pretty();

Export:

mongoexport --db brandscope_production --collection product_details --csv --out Test01.csv --fields sku,"size_break_costs.size","size_break_costs.quantity" --query '{}';

Answer 1:

Shortly after I added my own bounty, I figured out the solution. My problem has the same basic structure, which is a parent identifier, and some number N child key/value pairs for ratings (quality, value, etc...).

First, you'll need a JSON Input step that gets the SKU, Name, and size_break_costs array, all as Strings. The important part is that size_break_costs is read as a String, so it is passed along as a stringified JSON array. On the Content tab of the JSON Input step, make sure "Ignore missing path" is checked, in case a document has an empty array or the field is missing for some reason.

For your fields, use:

Name           | Path               | Type
ProductSKU     | $.sku              | String
ProductName    | $.name             | String
SizeBreakCosts | $.size_break_costs | String

I added a "Filter rows" step after this one, with the condition "SizeBreakCosts IS NOT NULL", and passed its output to a second JSON Input step. In this second JSON Input step, check "Source is defined in a field?" and set "Get source from field" to "SizeBreakCosts", or whatever you named it in the first JSON Input step.

Again, make sure "Ignore missing path" is checked, as well as "Ignore empty file". From this block, we'll want to get two fields. We'll already have ProductSKU and ProductName with each row that's passed in, and this second JSON Input step will further split it into however many rows are in the SizeBreakCosts input JSON. For fields, use:

Name     | Path           | Type
Quantity | $.[*].quantity | Integer
Size     | $.[*].size     | String

As you can see, these paths use "$.[*].FieldName", because the JSON string we passed in has an array as the root item, so we're getting every item in that array, and parsing out its quantity and size.
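To make the data flow concrete, here is a rough Python sketch of the same two-pass idea outside Pentaho. It is an illustration under assumptions, not how the JSON Input step is implemented: `first_step`, `second_step`, and `flatten` are hypothetical names, the sample uses plain integers instead of `NumberLong`, and the `SK0000` record is made up to show a document without the array.

```python
import json


def first_step(raw_document):
    """Like the first JSON Input: read sku and name, keep the array as a JSON string."""
    doc = json.loads(raw_document)
    costs = doc.get("size_break_costs")
    return {
        "ProductSKU": doc.get("sku"),
        "ProductName": doc.get("name"),
        "SizeBreakCosts": json.dumps(costs) if costs is not None else None,
    }


def second_step(row):
    """Like the second JSON Input: parse the string field, emit one row per array item."""
    for item in json.loads(row["SizeBreakCosts"]):
        yield {
            "ProductSKU": row["ProductSKU"],
            "ProductName": row["ProductName"],
            "Size": item.get("size"),
            "Quantity": item.get("quantity"),
        }


def flatten(raw_documents):
    rows = []
    for raw in raw_documents:
        parent = first_step(raw)
        if parent["SizeBreakCosts"] is None:  # the "Filter rows" condition
            continue
        rows.extend(second_step(parent))
    return rows


sample = json.dumps({
    "sku": "SK3579",
    "name": "Authority",
    "size_break_costs": [
        {"quantity": 80, "size": "S"},
        {"quantity": 14, "size": "M"},
        {"quantity": 55, "size": "L"},
    ],
})
no_sizes = json.dumps({"sku": "SK0000", "name": "Hypothetical"})  # made-up record

for r in flatten([sample, no_sizes]):
    print(";".join(str(r[k]) for k in ("ProductSKU", "ProductName", "Size", "Quantity")))
```

Because the parent fields are already attached to the row before the array is expanded, each child row inherits them, and the count mismatch between parent and child paths never arises.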

Now every row should have the SKU and name from the parent object, and the quantity and size from each child object. Dumping this example to a text file, I got:

ProductSKU;ProductName;Size;Quantity
SK3579;Authority;S; 80
SK3579;Authority;M; 14
SK3579;Authority;L; 55
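
On the MongoDB side, which the question also asks about, the aggregation framework's `$unwind` stage performs the same kind of expansion, emitting one document per array element. The sketch below is an assumption rather than something I ran against this collection: `PIPELINE` holds the stages as plain data, and `unwind` and `project` are hypothetical pure-Python imitations that only show the resulting row shape (unlike a real `$project`, the imitation returns `None` for missing fields).

```python
# Aggregation stages as plain data. Field names come from the question;
# the pipeline is untested against the real collection.
PIPELINE = [
    {"$unwind": "$size_break_costs"},
    {"$project": {
        "_id": 0,
        "sku": 1,
        "name": 1,
        "size": "$size_break_costs.size",
        "quantity": "$size_break_costs.quantity",
    }},
]


def unwind(doc, array_field):
    """Imitate $unwind: one copy of the document per array element.

    Documents where the field is missing, null, or empty produce no output,
    which matches the stage's default behaviour.
    """
    for element in doc.get(array_field) or []:
        copy = dict(doc)
        copy[array_field] = element
        yield copy


def project(doc, array_field="size_break_costs"):
    """Imitate the $project stage above in simplified form."""
    child = doc[array_field]
    return {
        "sku": doc.get("sku"),
        "name": doc.get("name"),
        "size": child.get("size"),
        "quantity": child.get("quantity"),
    }


product = {
    "sku": "SK3579",
    "name": "Authority",
    "size_break_costs": [
        {"quantity": 80, "size": "S"},
        {"quantity": 14, "size": "M"},
        {"quantity": 55, "size": "L"},
    ],
}

rows = [project(d) for d in unwind(product, "size_break_costs")]
for row in rows:
    print(row)
```

In the mongo shell the same stage list would be passed to `db.product_details.aggregate(...)`. Whether the flattened result suits a CSV export from there would still need checking.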