In the Row Java API there is a row.schema() method, but there is no row.set(StructType schema).
I also tried RowFactory.create(objects), but I don't know how to proceed from there.
UPDATE:
The problem is how to generate a new DataFrame when I modify the row structure in the workers. Here is an example:
DataFrame sentenceData = jsql.createDataFrame(jrdd, schema);
List<Row> resultRows = sentenceData.toJavaRDD()
    .map(new MyFunction<Row, Row>(parameters) {
        /** my map function */
        public Row call(Row row) {
            // I want to change the Row definition by adding new columns
            Row newRow = functionAddnewNewColumns(row);
            StructType newSchema = functionGetNewSchema(row.schema());
            // Here I want to attach the new schema to the row
            return newRow;
        }
    }).collect();
JavaRDD<Row> newJrdd = jsc.parallelize(resultRows);
// Here is the problem: I don't know how to get the new schema
// to create the new, modified DataFrame
DataFrame newDataframe = jsql.createDataFrame(newJrdd, newSchema);
You can create a row with Schema by using:
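The snippet that followed this sentence is missing here. As a rough sketch of one way to do it, assuming the class meant is GenericRowWithSchema from Spark's catalyst package (an internal API that may change between versions), a row that carries its own schema could be built like this. The column names id and text and the class name RowWithSchemaSketch are made up for illustration.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

final class RowWithSchemaSketch {

    // Builds a two-column row whose schema() returns the schema given here.
    // The columns "id" and "text" are hypothetical.
    static Row buildRow() {
        StructType schema = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.IntegerType, false),
            DataTypes.createStructField("text", DataTypes.StringType, true)
        });
        Object[] values = new Object[] { 1, "hello" };
        // Assumption: GenericRowWithSchema(Object[], StructType) is an internal
        // catalyst class, not part of the stable public API.
        return new GenericRowWithSchema(values, schema);
    }
}
```

Calling buildRow().schema() should then return the StructType defined above, whereas a row made with RowFactory.create(...) carries no schema of its own.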
This is a pretty old thread, but I just had a use case where I needed to generate data with Spark and quickly work with data on the row level and then build a new dataframe from the rows. Took me a bit to put it together so maybe it will help someone.
Here we're taking a "template" row, modifying some data, adding a new column with the appropriate "row-level" schema, and then using that new row and schema to create a new DF with the appropriate "new schema", so going "bottom up" :) This builds on @Christian's original answer, so I'm contributing a simplified snippet back.
You do not set a schema on a row - that makes no sense. You can, however, create a DataFrame (or pre-Spark 1.3 a JavaSchemaRDD) with a given schema using the sqlContext. The DataFrame will have the schema you have provided. For further information, please consult the documentation at http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
EDIT: According to updated question
You can generate new rows in your map function, which will get you a new RDD of type JavaRDD<Row>.
You then define the new schema.
Finally, create a new DataFrame from your rowRDD with newSchema as the schema.
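The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it is written against Spark 2.x or later, where the Java type is Dataset<Row> (on Spark 1.3 to 1.6, as in the question, it would be DataFrame), it needs Java 8 for the lambda, and the helpers appendValue and appendIntField, the class AddColumnSketch, and the column name row_width are hypothetical names, not part of Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

final class AddColumnSketch {

    // Hypothetical helper: copies the old values and appends one more.
    static Row appendValue(Row row, Object extra) {
        Object[] values = new Object[row.length() + 1];
        for (int i = 0; i < row.length(); i++) {
            values[i] = row.get(i);
        }
        values[row.length()] = extra;
        return RowFactory.create(values);
    }

    // Hypothetical helper: old fields plus one nullable integer field.
    static StructType appendIntField(StructType schema, String name) {
        List<StructField> fields = new ArrayList<>(Arrays.asList(schema.fields()));
        fields.add(DataTypes.createStructField(name, DataTypes.IntegerType, true));
        return DataTypes.createStructType(fields);
    }

    // Adds an illustrative "row_width" column holding the old column count.
    static Dataset<Row> withRowWidth(SQLContext sqlContext, Dataset<Row> df) {
        // 1. Build the new rows on the workers.
        JavaRDD<Row> rowRDD = df.toJavaRDD().map(row -> appendValue(row, row.length()));
        // 2. Define the new schema once, on the driver, from the old schema.
        StructType newSchema = appendIntField(df.schema(), "row_width");
        // 3. Create the new DataFrame from the rows and the new schema.
        return sqlContext.createDataFrame(rowRDD, newSchema);
    }
}
```

The schema is derived on the driver from df.schema() rather than inside the map function, so nothing has to be passed back from the workers and no collect() followed by parallelize() is needed.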