How to access an individual element of a tuple in an RDD

Posted 2019-04-12 02:26

Let's say I have an RDD like this:

[(u'Some1', (u'ABC', 9989)), (u'Some2', (u'XYZ', 235)), (u'Some3', (u'BBB', 5379)), (u'Some4', (u'ABC', 5379))]

I am using map to get one tuple at a time, but how can I access an individual element of a tuple, for example to check whether the tuple contains a certain string? What I actually want is to filter out the tuples that contain a certain string, here the tuples that contain ABC.

I was trying to do something like this, but it's not helping:

def foo(line):
     if(line[1]=="ABC"):
          return (line)


new_data = data.map(foo)

I am new to both Spark and Python, so please help!

Tags: python apache-spark pyspark rdd

1 Answer

等我变得足够好

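Reply #2 · 2019-04-12 03:13

RDDs can be filtered directly with filter, so map is not needed here. Each record is a pair whose second element (index 1) is itself a tuple, so x[1][0] is the string to compare. The line below gives you all records that have "ABC" in position 0 of the second element of the tuple.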

new_data = data.filter(lambda x: x[1][0] == "ABC")
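
To see why the map attempt in the question does not work, here is a rough sketch in plain Python, with no Spark session needed. The records list copies the sample data from the question; has_code, kept, dropped and mapped are names made up for this illustration, and the list comprehensions are only stand-ins for what filter and map would do on the RDD.

```python
# Sample records copied from the question: (key, (code, number)).
records = [
    ("Some1", ("ABC", 9989)),
    ("Some2", ("XYZ", 235)),
    ("Some3", ("BBB", 5379)),
    ("Some4", ("ABC", 5379)),
]


def has_code(record, code="ABC"):
    """Return True when the first field of the inner tuple equals `code`."""
    key, (record_code, number) = record  # unpack the nested tuple
    return record_code == code


def foo(line):
    # The function from the question: line[1] is the whole inner tuple,
    # e.g. ("ABC", 9989), so it never equals the string "ABC".
    if line[1] == "ABC":
        return line
    # Falling off the end returns None, and map keeps that None.


# Plain-Python stand-ins for filter and map on the RDD.
kept = [r for r in records if has_code(r)]         # records with "ABC"
dropped = [r for r in records if not has_code(r)]  # records without "ABC"
mapped = [foo(r) for r in records]                 # what the map attempt yields

print(kept)
print(dropped)
print(mapped)

# With the RDD named `data` from the question, the equivalent calls would be:
#   new_data = data.filter(has_code)
#   without_abc = data.filter(lambda r: not has_code(r))
```

Two things go wrong with the original foo: it compares the whole inner tuple to a string, and map always emits one output per input, so records that do not match come back as None instead of being removed. filter only keeps the records for which the function returns a true value. If "filter out" in the question means removing the ABC records rather than keeping them, negate the condition as in the last comment.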
