I have a dict, like:
cMap = {"k1" : "v1", "k2" : "v1", "k3" : "v2", "k4" : "v2"}
and a DataFrame A, like:
+---+
|key|
+---+
| k1|
| k2|
| k3|
| k4|
+---+
I created the DataFrame above with this code:
# each row must be a tuple; ('k1') is just a string, ('k1',) is a one-element tuple
data = [('k1',),
('k2',),
('k3',),
('k4',)]
A = spark.createDataFrame(data, ['key'])
I want to get the new DataFrame, like:
+---+----------+----------+
|key| v1 | v2 |
+---+----------+----------+
| k1|true |false |
| k2|true |false |
| k3|false |true |
| k4|false |true |
+---+----------+----------+
I wish to get some suggestions, thanks!
I just wanted to add an easy way to create the DataFrame using PySpark.
I parallelize cMap.items() and check whether each value equals v1 or v2. Then I join the result back to DataFrame A on the column key.
Dataframe
I just wanted to contribute a different and possibly easier way to solve this.
In my code I convert the dict to a pandas DataFrame, which I find much easier. Then I convert the pandas DataFrame directly to a Spark DataFrame.
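The code for this answer is missing as well, so the following is only a sketch of how the pandas step might look. The column names key and value and the variable names pdf and wide are assumptions, not the answerer's.

```python
import pandas as pd

cMap = {"k1": "v1", "k2": "v1", "k3": "v2", "k4": "v2"}

# One row per dict entry: the key in one column, its mapped value in the other.
pdf = pd.DataFrame(list(cMap.items()), columns=["key", "value"])

# Add one boolean column per distinct value in the dict.
for value in sorted(set(cMap.values())):
    pdf[value] = pdf["value"] == value

wide = pdf.drop(columns="value")
print(wide)

# Assumption: with an active SparkSession named `spark`, the result could
# then be converted with spark.createDataFrame(wide).
```

This keeps all the reshaping in pandas, which is fine for a small dict but means the data passes through the driver.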
Thanks, everyone, for the suggestions. I figured out another way to solve my problem, using pivot. The code is:
However, I can't convert 1 to true and 0 to false.
The dictionary can be converted to a DataFrame and joined with the other one. My code:
If there are more values, you can wrap that when clause in a UDF and use it.