city temperature windspeed event
day
2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
The above shows the original data.
Below shows the result using np.array_split(data, 4).
day city temperature windspeed event
2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
day city temperature windspeed event
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
day city temperature windspeed event
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
day city temperature windspeed event
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
As you can see here, I'm trying to create 4 groups from the original data, making sure that each group has all the cities. however, by using array.split(), it split the data into 4 groups but it does not contain all the cities. I want each group to have Mumbai, Paris and New York. How can I do that?
Meaning to say, what I'm trying to achieve is something like below:
Group 1:
day city temperature windspeed event
2017-01-01 new york 32 6 Rain
2017-01-02 paris 50 13 Cloudy
2017-01-02 mumbai 85 12 Fog,
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-05 mumbai 89 7 Sunny
2017-01-05 paris 43 20 Sunny
Group 2:
day city temperature windspeed event
2017-01-04 new york 33 7 Sunny
2017-01-01 mumbai 90 5 Sunny
2017-01-03 paris 54 8 Cloudy
2017-01-07 new york 27 12 Rain
2017-01-06 mumbai 80 10 Fog
2017-01-09 paris 53 8 Sunny
Group 3:
day city temperature windspeed event
2017-01-02 new york 36 7 Sunny
2017-01-03 mumbai 87 15 Fog
2017-01-01 paris 45 20 Sunny,
2017-01-08 mumbai 89 8 Rain
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
Group 4:
day city temperature windspeed event
2017-01-03 new york 28 12 Snow,
2017-01-04 mumbai 92 5 Rain
2017-01-07 mumbai 85 9 Sunny
2017-01-04 paris 42 10 Cloudy
2017-01-08 paris 42 15 Cloudy
2017-01-08 new york 23 7 Rain
As you can see from the expected result, the main thing is that all the groups contain each topic.
What I have in mind is to group the data by city, then from each city's dataframe, divide the data into 4 groups, then for each group in the city, combine the data to get 4 final group.