reading a csv file from azure blob storage with Py

Posted 2019-07-20 23:50

I'm trying to do a machine learning project using a PySpark HDInsight cluster on Microsoft Azure. To work on the cluster I use a Jupyter notebook. My data (a CSV file) is stored in Azure Blob Storage.

According to the documentation, the syntax of the path to my file is:

path = 'wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'

However, when I try to read the CSV file with the following command:

csvFile = spark.read.csv(path, header=True, inferSchema=True)

I get the following error:

'java.net.URISyntaxException: Illegal character in scheme name at index 4: wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv'
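The exception is thrown by Java's URI parser. Under the URI specification a scheme must start with an ASCII letter and may then contain only letters, digits, "+", "-" and ".". As a rough illustration, here is a Python sketch of that rule. It is not the actual java.net.URI code, and the function name first_illegal_scheme_index is hypothetical. Checking the scheme character by character flags the "[" at index 4, which matches the message:

```python
import string

# URI scheme rule (RFC 3986, section 3.1):
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
FIRST_CHARS = set(string.ascii_letters)
OTHER_CHARS = set(string.ascii_letters + string.digits + "+-.")


def first_illegal_scheme_index(uri):
    """Return the index of the first character of the scheme (the text before
    the first ':') that the URI syntax does not allow, or None if the scheme
    is valid. Raises ValueError if the URI has no scheme at all."""
    scheme, sep, _ = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("URI has no scheme: %r" % uri)
    for i, ch in enumerate(scheme):
        allowed = FIRST_CHARS if i == 0 else OTHER_CHARS
        if ch not in allowed:
            return i
    return None


path = "wasb[s]://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv"
print(first_illegal_scheme_index(path))  # 4, the position of '['
```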

Here is a screenshot of what the error looks like in the notebook: error screenshot

Any ideas on how to fix this?

Tags: azure apache-spark pyspark azure-storage hdinsight

1 answer

Fickle 薄情

#2 · 2019-07-21 00:35

It is either (unencrypted):

wasb://...

or (encrypted):

wasbs://...

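not

wasb[s]://...

The brackets in the documentation's wasb[s] only indicate that the "s" is optional; they are not part of the scheme itself.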
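To avoid typing the scheme by hand, here is a small helper sketch that assembles the path from the container, storage account and blob name used in the question, choosing wasbs or wasb explicitly. The name wasb_path and its parameters are hypothetical and are not part of PySpark or any Azure SDK:

```python
def wasb_path(container, account, blob_path, encrypted=True):
    """Build a WASB URI for a blob in Azure Blob Storage.

    encrypted=True gives the 'wasbs' scheme (TLS); False gives plain 'wasb'.
    """
    scheme = "wasbs" if encrypted else "wasb"
    # Form: <scheme>://<container>@<account>.blob.core.windows.net/<blob path>
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{blob_path.lstrip('/')}"


print(wasb_path("springboard", "6zpbt6muaorgs", "movies_plus_genre_info_2.csv"))
# wasbs://springboard@6zpbt6muaorgs.blob.core.windows.net/movies_plus_genre_info_2.csv
```

In the notebook the result would go into the same call as before, e.g. csvFile = spark.read.csv(wasb_path("springboard", "6zpbt6muaorgs", "movies_plus_genre_info_2.csv"), header=True, inferSchema=True). This assumes the cluster already has access to that storage account (for example, it is the cluster's default or an attached storage account). Otherwise Spark still needs the account credentials configured.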
