I'm trying to get the min, max, and mean of some Cassandra/Spark data, but I need to do it in Java.
import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

DataFrame df = sqlContext.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "someTable")
    .option("keyspace", "someKeyspace")
    .load();

df.groupBy(col("keyColumn"))
    .agg(min("valueColumn"), max("valueColumn"), avg("valueColumn"))
    .show();
EDITED to show the working version above. Make sure to put quotes (") around someTable and someKeyspace, i.e. pass them as string literals.
Just import your data as a DataFrame and apply the required aggregations:
import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;

// Load the Cassandra table into a DataFrame
DataFrame df = sqlContext.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", someTable)
    .option("keyspace", someKeyspace)
    .load();

// Compute min, max, and mean of valueColumn for each key
df.groupBy(col("keyColumn"))
    .agg(min("valueColumn"), max("valueColumn"), avg("valueColumn"))
    .show();
where someTable and someKeyspace are String variables holding the table name and keyspace, respectively.
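For example, here is a minimal sketch of the same query that reads the aggregates back into your program instead of printing them. It assumes the same sqlContext as above; the table, keyspace, and column names are just placeholders:

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.*;

// Placeholder values; substitute your real table and keyspace names
String someTable = "someTable";
String someKeyspace = "someKeyspace";

DataFrame df = sqlContext.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", someTable)
    .option("keyspace", someKeyspace)
    .load();

// One Row per distinct keyColumn value: column 0 is the key,
// columns 1-3 are min, max, and avg of valueColumn
Row[] stats = df.groupBy(col("keyColumn"))
    .agg(min("valueColumn"), max("valueColumn"), avg("valueColumn"))
    .collect();

for (Row r : stats) {
    System.out.println(r.get(0) + ": min=" + r.get(1)
        + ", max=" + r.get(2) + ", avg=" + r.get(3));
}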
I suggest checking out https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector-demos, which contains demos in both Scala and the equivalent Java.
You can also check out http://spark.apache.org/documentation.html, which has tons of examples that you can flip between Scala, Java, and Python versions of.
I'm almost 100% certain that between those two links, you'll find exactly what you're looking for.
If there's anything you're having trouble with after that, feel free to update your question with a more specific error/problem.
In general,

compile the Scala file:

$ scalac Main.scala

then inspect the resulting Main.class to see its Java-level signatures (note that javap disassembles the class rather than producing an actual Java source file):

$ javap Main
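For example, a minimal sketch assuming a trivial Main.scala like the one below; the exact javap output varies by Scala version, but on Scala 2 it looks roughly like this:

$ cat Main.scala
object Main { def main(args: Array[String]): Unit = println("hello") }
$ scalac Main.scala
$ javap Main
Compiled from "Main.scala"
public final class Main {
  public static void main(java.lang.String[]);
}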
More info is available at the following URL:
http://alvinalexander.com/scala/scala-class-to-decompiled-java-source-code-classes