Best way to validate ingested data

Posted 2019-08-18 03:43

I ingest data daily from various external sources such as GA, scrapers, Google BQ, etc. I store the resulting CSV file in HDFS, create a staging table from it, and then append it to a historical table in Hadoop. Can you share some best practices for validating new data against the historical data? For example, comparing the row count of the current load with the average of the last 10 days, or something like that. Is there a ready-made solution for this in Spark or elsewhere?
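To illustrate the kind of row-count check described above, here is a minimal sketch in plain Python. The function name `row_count_ok`, the 30% tolerance, and the sample counts are assumptions made up for illustration; in practice the daily counts would presumably come from a per-day `COUNT(*)` on the historical table or from `DataFrame.count()` in Spark.

```python
from statistics import mean


def row_count_ok(history, today, window=10, tolerance=0.3):
    """Return True if today's row count is within `tolerance`
    (as a fraction) of the average of the last `window` days."""
    recent = history[-window:]  # most recent daily counts, oldest first
    if not recent:
        raise ValueError("no historical counts to compare against")
    baseline = mean(recent)
    if baseline == 0:
        # Avoid dividing by zero: only an empty load matches an empty history.
        return today == 0
    return abs(today - baseline) / baseline <= tolerance


# Hypothetical daily row counts of the historical table, oldest first.
history = [1000, 1020, 980, 1010, 990, 1005, 995, 1015, 985, 1000]

print(row_count_ok(history, 1030))  # True: 3% above the 10-day average
print(row_count_ok(history, 400))   # False: 60% below the 10-day average
```

A fixed percentage is the simplest possible threshold; sources with strong weekly seasonality would likely need a comparison against the same weekday instead of a flat 10-day average.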

Thanks for any advice.
