I am building a Python cloud video pipeline that will read video from a bucket, perform some computer vision analysis, and return frames back to a bucket. As far as I can tell, there is no Beam read method for passing GCS paths to OpenCV, similar to TextIO.read(). My options seem to be: download the files locally (they are large), use gcsfuse to mount the bucket on a worker (is that possible?), or write a custom source. Does anyone have experience with which makes the most sense?
My main source of confusion was this question:
Can google cloud dataflow (apache beam) use ffmpeg to process video or image data
How would ffmpeg get access to the path? It's not just a question of uploading the binary, is it? There needs to be a Beam method to pass the item to it, correct?
I think that you will need to download the files first and then pass them through.
However, instead of saving the files locally, is it possible to pass bytes straight through to OpenCV? Does it accept any sort of byte stream or input stream?
You could have one ParDo that downloads the files using the GCS API and then passes them to OpenCV through a stream, byte buffer, stdin pipe, etc., as sketched below.
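For individual frames or still images, OpenCV can decode an in-memory byte buffer with cv2.imdecode, so one option is a DoFn that reads the object from GCS with Beam's FileSystems API and never touches local disk. A minimal sketch, assuming image inputs and a placeholder bucket path (for full video containers, cv2.VideoCapture generally needs a filename, which is the case covered next):

```python
import apache_beam as beam
import cv2
import numpy as np
from apache_beam.io.filesystems import FileSystems


class DecodeImageFromGcs(beam.DoFn):
    """Reads an encoded image from GCS and decodes it in memory with OpenCV."""

    def process(self, gcs_path):
        # FileSystems understands gs:// paths, so no local download is needed.
        with FileSystems.open(gcs_path) as f:
            data = f.read()
        # cv2.imdecode accepts an encoded byte buffer wrapped in a numpy array.
        frame = cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
        if frame is not None:
            yield gcs_path, frame


# Hypothetical usage: elements are gs:// paths produced upstream.
with beam.Pipeline() as p:
    (p
     | beam.Create(['gs://my-bucket/images/frame_0001.jpg'])  # placeholder path
     | beam.ParDo(DecodeImageFromGcs())
     | beam.Map(lambda kv: print(kv[0], kv[1].shape)))
```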
If that is not possible, you will need to save the files to local disk and pass OpenCV the filename. This can be tricky because you may run out of disk space, so make sure to delete each file from local disk as soon as OpenCV has finished processing it; see the sketch below.
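Since cv2.VideoCapture only takes a filename, a DoFn can copy each object to a temporary file on the worker, iterate over the frames, and delete the file in a finally block so disk usage stays bounded. A rough sketch, again assuming gs:// input paths; the per-frame analysis is a placeholder:

```python
import os
import tempfile

import apache_beam as beam
import cv2
from apache_beam.io.filesystems import FileSystems


class ProcessVideoFromGcs(beam.DoFn):
    """Downloads a video to a worker-local temp file, reads frames, then cleans up."""

    def process(self, gcs_path):
        # Copy the object to local disk because cv2.VideoCapture needs a filename.
        suffix = os.path.splitext(gcs_path)[1] or '.mp4'
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            local_path = tmp.name
            with FileSystems.open(gcs_path) as src:
                # Stream in chunks so large files are not held fully in memory.
                for chunk in iter(lambda: src.read(64 * 1024 * 1024), b''):
                    tmp.write(chunk)
        try:
            cap = cv2.VideoCapture(local_path)
            frame_index = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                # Placeholder for the actual computer vision analysis.
                yield gcs_path, frame_index, frame.shape
                frame_index += 1
            cap.release()
        finally:
            # Delete the temp file so the worker's disk does not fill up.
            os.remove(local_path)
```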
I'm not sure, but you may also need to select a particular worker machine type or disk size to ensure you have enough disk space, depending on the size of your files; an example of the relevant pipeline options is below.
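For reference, worker machine type and per-worker disk size can be set through Dataflow pipeline options. A hedged sketch; the project, region, bucket, and sizes are all placeholders to adjust:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# All values here are placeholders; size them for your project and files.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp',
    machine_type='n1-standard-4',   # worker VM type
    disk_size_gb=200,               # per-worker persistent disk
)
```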