Spark Sampling - How much faster is it than using the full RDD/DF?

Published 2019-09-07 03:37

Question:

I'm wondering how Spark's runtime when sampling an RDD/DF compares with the runtime over the full RDD/DF. I don't know if it makes a difference, but I'm currently using Java + Spark 1.5.1 + Hadoop 2.6.

JavaRDD<Row> rdd = sc.textFile(HdfsDirectoryPath()).map(new Function<String, Row>() {
    @Override
    public Row call(String line) throws Exception {
        String[] fields = line.split(usedSeparator);
        // Assume the schema has 4 integer columns (note: the fields are still Strings here)
        GenericRowWithSchema row = new GenericRowWithSchema(fields, schema);
        return row;
    }
});

DataFrame df   = sqlContext.createDataFrame(rdd, schema);
df.registerTempTable("df");
DataFrame selectdf   =  sqlContext.sql("Select * from df");
Row[] res = selectdf.collect();

DataFrame sampleddf  = sqlContext.createDataFrame(rdd, schema).sample(false, 0.1);// expected ~10% of the original dataset
sampleddf.registerTempTable("sampledf");
DataFrame selecteSampledf = sqlContext.sql("Select * from sampledf");
res = selecteSampledf.collect();

I would expect the sampled run to be, in the best case, close to ~90% faster. But to me it looks like Spark goes through the whole DF, or does a count, which takes nearly the same time as the select over the full DF; only after the sample is generated does it execute the select.

Is this assumption correct, or am I using sampling in a wrong way that causes both selects to end up with the same runtime?

Answer 1:

I would expect that the sampling is optimally close to ~90% faster.

Well, there are a few reasons why these expectations are unrealistic:

  • without any prior assumptions about the data distribution, obtaining a uniform sample requires a full dataset scan. This is more or less what happens when you use the sample or takeSample methods in Spark
  • SELECT * is a relatively lightweight operation. Depending on the amount of resources you have, the time to process a single partition can be negligible
  • sampling doesn't reduce the number of partitions. If you don't coalesce or repartition, you can end up with a large number of almost-empty partitions, which means suboptimal resource usage
  • while RNGs are usually quite efficient, generating random numbers is not free
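The first and last points can be illustrated with a plain-Java sketch of Bernoulli sampling, which is the strategy behind sample(false, fraction). The class and method here are illustrative, not Spark's internals: the point is that every element is still visited and an RNG is invoked per element, so the scan cost stays linear in the full dataset size, while only the output shrinks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BernoulliSampleSketch {

    // Keep each element independently with probability `fraction`.
    // The loop touches EVERY element: sampling does not skip work,
    // it only reduces the size of the output.
    static List<Integer> sample(List<Integer> data, double fraction, long seed) {
        Random rng = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (Integer element : data) {          // full scan, as over a whole partition
            if (rng.nextDouble() < fraction) {  // one RNG call per element - not free
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            data.add(i);
        }
        List<Integer> sampled = sample(data, 0.1, 42L);
        // The expected size is ~10% of the input, but the scan itself was still 100%.
        System.out.println(sampled.size());
    }
}
```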

There are at least two important benefits of sampling:

  • lower memory usage including less work for the garbage collector
  • less data to serialize / deserialize and transfer in case of shuffling or collecting

If you want to get the most out of sampling, it makes sense to sample, coalesce, and cache.
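A sketch of that pattern in the Spark 1.x Java API used in the question (this assumes a live sqlContext, rdd, and schema as above; the fraction and the partition count of 8 are illustrative, so pick a count that matches the sampled data volume):

```java
// Sample once, shrink the partition count to match the smaller data volume,
// and cache the result so repeated queries don't re-scan and re-sample the source.
DataFrame sampled = sqlContext.createDataFrame(rdd, schema)
        .sample(false, 0.1)   // expected ~10% of the rows, but still a full scan
        .coalesce(8)          // avoid many almost-empty partitions (8 is illustrative)
        .cache();             // pay the scan cost once

sampled.registerTempTable("sampledf");
DataFrame selectSampled = sqlContext.sql("Select * from sampledf");
Row[] res = selectSampled.collect(); // subsequent actions reuse the cached sample
```

The first action still scans the full dataset (that is unavoidable, per the points above), but every query after it runs only against the small cached sample.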