How to implement a Spark SQL pagination query

Published 2019-02-21 13:13

Question:

Does anyone know how to do pagination in a Spark SQL query?

I need to use Spark SQL but don't know how to do pagination.

Tried:

select * from person limit 10, 10

Answer 1:

karthik's answer will fail if there are duplicate rows in the DataFrame: `except` removes every row in df1 that also appears in df2, so all copies of a duplicated row are dropped at once. Index-based slicing avoids this:

val start = 10
val end = 20
// Pair each row with its index, then keep only the rows whose index falls in [start, end]
val filteredRdd = df.rdd.zipWithIndex().collect { case (r, i) if i >= start && i <= end => r }
val newDf = sqlContext.createDataFrame(filteredRdd, df.schema)
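To see the duplicate-row pitfall concretely, here is a plain-Java sketch (no Spark; the list stands in for a DataFrame, and the names are hypothetical) contrasting set-difference paging, which loses duplicates, with index-based paging, which keeps them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExceptPitfall {
    public static void main(String[] args) {
        // A "DataFrame" with a duplicated row ("b"), simulated as a list.
        List<String> df = new ArrayList<>(Arrays.asList("a", "b", "b", "c", "d"));

        // First "page" of size 2.
        List<String> page1 = new ArrayList<>(df.subList(0, 2));   // [a, b]

        // except-style paging: remove every row that appears in page1.
        List<String> rest = new ArrayList<>(df);
        rest.removeAll(page1);
        System.out.println(rest);   // [c, d] -- the second "b" was dropped too

        // Index-based paging (like zipWithIndex) keeps the duplicate.
        List<String> page2 = new ArrayList<>(df.subList(2, 4));
        System.out.println(page2);  // [b, c]
    }
}
```

Because `removeAll` (like `except`) works on values rather than positions, the duplicated "b" never reaches a later page.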


Answer 2:

There is no support for OFFSET in Spark SQL as of now. One alternative for paging over DataFrames is the except method.

Example: if you want to iterate with a page size of 10, you can do the following:

    DataFrame df1;
    long count = df.count();
    int limit = 10;
    while (count > 0) {
        df1 = df.limit(limit);     // take the next page
        df1.show();                // prints the first 10 rows, then the next 10, etc.
        df = df.except(df1);       // drop the rows already shown
        count = count - limit;
    }

If you want to do, say, LIMIT 50, 100 in one go, you can do the following:

        DataFrame df1 = df.limit(50);           // the first 50 rows to skip
        DataFrame df2 = df.except(df1);         // everything after them
        DataFrame result = df2.limit(100);      // required result
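Since Spark SQL lacks OFFSET, the two-step limit/except trick above emulates `LIMIT 50, 100`. As a plain-Java sketch of the intended slice semantics (no Spark; the `page` helper is hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OffsetLimit {
    // Emulates SQL's LIMIT offset, count on an ordered list (hypothetical helper).
    static List<Integer> page(List<Integer> rows, int offset, int count) {
        int from = Math.min(offset, rows.size());
        int to = Math.min(offset + count, rows.size());
        return rows.subList(from, to);
    }

    public static void main(String[] args) {
        // Rows 0..199, standing in for an ordered query result.
        List<Integer> rows = IntStream.range(0, 200).boxed().collect(Collectors.toList());

        // LIMIT 50, 100 -> skip the first 50 rows, return the next 100.
        List<Integer> result = page(rows, 50, 100);
        System.out.println(result.size());      // 100
        System.out.println(result.get(0));      // 50
    }
}
```

Note that this only matches the Spark trick when the underlying rows are distinct and ordered; as answer 1 points out, `except` does not preserve duplicates.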

Hope this helps!