We would like to use an AWS Glue job to filter JSON messages stored in an S3 bucket.
Here is some example JSON:
{ "property": {"subproperty1": "A", "subproperty2": "B" }}
{ "property": {"subproperty1": "C", "subproperty2": "D" }}
We want to keep only the records whose subproperty1 is in ["A", "B"]. This is what we tried:
applyFilter1 = Filter.apply(
    frame = datasource0,
    f = lambda x: x["property.subproperty1"] in ["A", "B"]
)
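A likely cause is the dotted key: with dict-style access, x["property.subproperty1"] looks for a top-level field literally named "property.subproperty1", which these records do not have. Nested fields are normally indexed one level at a time. The sketch below checks that predicate locally in plain Python against the sample records, assuming the record Glue passes to f behaves like a nested Python dict. The use of .get() on the record is also an assumption; if Glue's record does not support it, plain indexing is the direct equivalent. ALLOWED and keep_record are hypothetical names used only for illustration.

```python
import json

# Sample records from the question, one JSON object per line.
SAMPLE = """\
{ "property": {"subproperty1": "A", "subproperty2": "B" }}
{ "property": {"subproperty1": "C", "subproperty2": "D" }}
"""

# Hypothetical set of accepted values for property.subproperty1.
ALLOWED = {"A", "B"}


def keep_record(rec):
    """Return True if rec["property"]["subproperty1"] is in ALLOWED.

    The nested field is looked up one level at a time instead of via the
    dotted key "property.subproperty1". .get() (assumed to be available on
    the record) avoids a KeyError for records that lack the field.
    """
    prop = rec.get("property") or {}
    return prop.get("subproperty1") in ALLOWED


records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]
kept = [r for r in records if keep_record(r)]
print(kept)  # only the record with subproperty1 == "A" remains
```

In the Glue job itself, the corresponding predicate would presumably be written as f = lambda x: x["property"]["subproperty1"] in ["A", "B"], or f = keep_record if the helper is defined in the script.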
The output is then written to a new S3 bucket as follows:
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame = applyFilter1,
    connection_type = "s3",
    connection_options = {"path": "s3://<my-s3-location>"},
    format = "json",
    transformation_ctx = "datasink2"
)
Unfortunately, the resulting file is empty. Any idea why? Is filtering on nested fields like this supported in AWS Glue?