Question:
Both Dataprep and Dataflow can be used for ETL tasks, and Dataprep appears to run Dataflow jobs under the hood. Is the only difference that Dataprep provides a user interface for building Dataflow jobs?
Answer 1:
Both Dataflow and Dataprep can certainly transform data. The main difference is who uses the technology. If your project needs self-service data transformation by data users such as data engineers, or by business users such as analysts and data scientists, then Dataprep is the choice: it requires no coding, and ultimately it generates Dataflow jobs. Cloud Dataprep offers advanced transformations such as pivoting, unpivoting, aggregations, time series, joins, unions, standardization, and hundreds of other data functions, all exposed through an intuitive visual interface. The data does need to be in GCS or BigQuery, though.
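As a rough illustration of "ultimately it generates Dataflow jobs": below is a minimal sketch, using the Apache Beam Python SDK, of what a simple Dataprep aggregation step roughly corresponds to once it runs as a Dataflow job. The categories and amounts are made-up example data, and Dataprep does not expose its generated code in this form, so treat this only as an approximation of the concept.

```python
# Minimal sketch of the kind of aggregation a Dataprep recipe step
# corresponds to when executed as a Dataflow (Apache Beam) job.
# Example data below is hypothetical.
import apache_beam as beam

with beam.Pipeline() as p:  # DirectRunner by default; Dataflow uses DataflowRunner
    (
        p
        | "CreateRows" >> beam.Create([
            ("electronics", 120.0),
            ("electronics", 80.0),
            ("books", 15.0),
        ])
        | "SumPerCategory" >> beam.CombinePerKey(sum)  # roughly a Dataprep "aggregate" step
        | "Print" >> beam.Map(print)
    )
```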
Answer 2:
Dataprep is a tool for performing ETL on file sources through a UI; convenient, but relatively limited. Dataflow is a managed service for running ETL pipelines written with the Apache Beam programming model. It handles both batch and streaming data and can work with essentially any data source (e.g. Kafka, Pub/Sub, Datastore, JDBC, ...), whereas Dataprep is limited to GCS and BigQuery.
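To make the contrast concrete, here is a minimal sketch, assuming the Apache Beam Python SDK, of a streaming pipeline that Dataflow can run but a Dataprep flow cannot express: read from Pub/Sub, apply fixed windows, and write to BigQuery. The project, topic, bucket, and table names are hypothetical placeholders.

```python
# Minimal sketch of a streaming Beam pipeline submitted to Dataflow.
# All resource names (project, bucket, topic, table) are hypothetical.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(
    runner="DataflowRunner",             # or "DirectRunner" for local testing
    project="my-gcp-project",            # hypothetical project ID
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # hypothetical bucket
    streaming=True,
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/my-gcp-project/topics/events")
        | "Decode" >> beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
        | "Window" >> beam.WindowInto(FixedWindows(60))  # 1-minute fixed windows
        | "ToBigQuery" >> beam.io.WriteToBigQuery(
            "my-gcp-project:analytics.events",           # hypothetical table
            schema="payload:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

Swapping ReadFromPubSub for a bounded source such as ReadFromText (and dropping streaming=True) turns the same pipeline into a batch job, which is the kind of source and mode flexibility the Dataprep UI does not offer.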