Strip or Regex function in Spark 1.3 Dataframe

Posted 2019-03-02 21:53

Question:

I have some code from PySpark 1.5 that I unfortunately have to port backwards to Spark 1.3. I have a column with elements that are alphanumeric, but I only want the digits. An example of an element in 'old_col' of 'df' is:

 '125 Bytes'

In Spark 1.5 I was able to use

df.withColumn('new_col', F.regexp_replace('old_col', r'(\D+)', '').cast('long'))

However, I cannot seem to come up with a solution using old 1.3 methods like SUBSTR or RLIKE. The reason is that the number of digits in front of "Bytes" will vary in length, so what I really need is the 'replace' or 'strip' functionality that I can't find in Spark 1.3. Any suggestions?

Answer 1:

As long as you use HiveContext you can execute corresponding Hive UDFs either with selectExpr:

df.selectExpr("regexp_extract(old_col, '([0-9]+)', 1)")

or with plain SQL:

df.registerTempTable("df")
sqlContext.sql("SELECT regexp_extract(old_col, '([0-9]+)', 1) FROM df")
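Note that regexp_extract returns a string, so to end up with the long column from the original 1.5 snippet you would still add a cast inside the expression, e.g. "cast(regexp_extract(old_col, '([0-9]+)', 1) as bigint) as new_col". Running Spark 1.3 isn't practical here, but as a quick sanity check of the pattern itself, Python's re module mirrors what Hive's regexp_extract returns for capture group 1 (the function name extract_digits below is just an illustrative helper, not a Spark API):

```python
import re

# regexp_extract(old_col, '([0-9]+)', 1) returns the text matched by
# capture group 1, or '' when the pattern does not match at all.
# This helper mimics that behavior on a plain Python string.
def extract_digits(value):
    match = re.search(r'([0-9]+)', value)
    return match.group(1) if match else ''

print(extract_digits('125 Bytes'))       # '125' -- the digits as a string
print(int(extract_digits('125 Bytes')))  # 125 -- after the numeric cast
print(repr(extract_digits('Bytes')))     # '' -- no digits, empty string
```

The empty-string case is worth keeping in mind: casting '' to bigint in SQL yields NULL rather than an error, so rows with no digits become NULL in the new column.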