Apache Spark mapPartitionsWithIndex

Posted 2019-05-07 00:09

Can someone give an example of the correct usage of mapPartitionsWithIndex in Java? I've found a lot of Scala examples, but very few Java ones. Is my understanding correct that separate partitions will be handled by separate nodes when using this function?

I am getting the following error:

method mapPartitionsWithIndex in class JavaRDD<T> cannot be applied to given types;
    JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex
    required: Function2<Integer,Iterator<String>,Iterator<R>>,boolean
    found: <anonymous Function2<Integer,Iterator<String>,Iterator<JavaRDD<String>>>>

when doing:

JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex(
    new Function2<Integer, Iterator<String>, Iterator<JavaRDD<String>> >() {

    @Override
    public Iterator<JavaRDD<String>> call(Integer ind, String s) {

Tags: java mapreduce apache-spark

1 Answer

做自己的国王

#2 · 2019-05-07 00:31

Here is the code I use to remove the first line of a CSV file:

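import java.util.Iterator;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;

JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);

// Declared with its type arguments (not as a raw Function2) so the compiler can check the call below.
Function2<Integer, Iterator<String>, Iterator<String>> removeHeader =
        new Function2<Integer, Iterator<String>, Iterator<String>>() {
    @Override
    public Iterator<String> call(Integer ind, Iterator<String> iterator) throws Exception {
        // Assumes the header is the first line of partition 0; skip it there only.
        if (ind == 0 && iterator.hasNext()) {
            iterator.next();
        }
        return iterator;
    }
};
// The second argument is preservesPartitioning.
JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);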
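
To see what the function passed to mapPartitionsWithIndex receives, here is a minimal plain-Java sketch that runs without Spark. It only imitates the call pattern: IndexedPartitionFn, REMOVE_HEADER and applyWithIndex are hypothetical names made up for this illustration, not Spark APIs, and the stand-in interface omits the `throws Exception` that Spark's Function2.call declares. The partitions are simulated as a list of lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java sketch, no Spark dependency. All names here are hypothetical.
class HeaderDropSketch {

    // Stand-in with the same shape as Function2<Integer, Iterator<T>, Iterator<R>>:
    // it takes the partition index and an iterator over that partition's elements.
    interface IndexedPartitionFn<T, R> {
        Iterator<R> call(Integer partitionIndex, Iterator<T> rows);
    }

    // Same logic as the answer above: skip the first row of partition 0 only.
    static final IndexedPartitionFn<String, String> REMOVE_HEADER = (index, rows) -> {
        if (index == 0 && rows.hasNext()) {
            rows.next();
        }
        return rows;
    };

    // Calls fn once per simulated partition, passing that partition's index,
    // and concatenates the results in partition order.
    static <T, R> List<R> applyWithIndex(List<List<T>> partitions, IndexedPartitionFn<T, R> fn) {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            Iterator<R> result = fn.call(i, partitions.get(i).iterator());
            while (result.hasNext()) {
                out.add(result.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two simulated partitions; the header line sits in partition 0.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("id,name", "1,a"),
                Arrays.asList("2,b", "3,c"));
        System.out.println(applyWithIndex(partitions, REMOVE_HEADER));
    }
}
```

This also points at the cause of the compile error in the question: the function's second parameter has to be an Iterator<String> (the whole partition, not a single String), its return type has to be an Iterator of the element type of the resulting RDD (Iterator<String> for a JavaRDD<String>, not Iterator<JavaRDD<String>>), and mapPartitionsWithIndex takes a boolean as its second argument. As for where the work runs: Spark processes each partition in its own task, and tasks are scheduled across the available executors, so partitions can be handled on different nodes but are not guaranteed to be.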
