How can I create a new dataframe comparing values

2019-08-02 15:25发布

问题:

I have a data frame that has the data from the Gini Index of countries. Plenty of the values are NA, so i want to create a new data frame that has, for each country, the most recent Gini Index measured for it. For example, if Brazil has a value for 2012, 2013 and 2015, the new data frame will have only the value of 2015. This is how the data looks like:

              Country.Name Country.Code X2014 X2015 X2016 X2017
8                Argentina          ARG  41.4    NA  42.4    NA
9                  Armenia          ARM  31.5  32.4  32.5    NA
13                 Austria          AUT  30.5  30.5    NA    NA
16                 Belgium          BEL  28.1  27.7    NA    NA
17                   Benin          BEN    NA  47.8    NA    NA
18            Burkina Faso          BFA  35.3    NA    NA    NA
19              Bangladesh          BGD    NA    NA  32.4    NA
20                Bulgaria          BGR  37.4    NA    NA    NA
23  Bosnia and Herzegovina          BIH    NA  32.7    NA    NA
24                 Belarus          BLR  27.2  26.7  27.0    NA
27                 Bolivia          BOL  47.8  46.7  44.6    NA
28                  Brazil          BRA  51.5  51.3    NA    NA
31                  Bhutan          BTN    NA    NA    NA  37.4
36             Switzerland          CHE  32.5  32.3    NA    NA
38                   Chile          CHL    NA  47.7    NA    NA
40           Cote d'Ivoire          CIV    NA  41.5    NA    NA
41                Cameroon          CMR  46.6    NA    NA    NA
44                Colombia          COL  52.8  51.1  50.8    NA
47              Costa Rica          CRI  48.6  48.4  48.7    NA
52                  Cyprus          CYP  35.6  34.0    NA    NA
53          Czech Republic          CZE  25.9  25.9    NA    NA
54                 Germany          DEU    NA  31.7    NA    NA
57                 Denmark          DNK  28.4  28.2    NA    NA
58      Dominican Republic          DOM  44.1  44.7  45.3    NA
65                 Ecuador          ECU  45.0  46.0  45.0    NA
66        Egypt, Arab Rep.          EGY    NA  31.8    NA    NA
69                   Spain          ESP  36.1  36.2    NA    NA
70                 Estonia          EST  34.6  32.7    NA    NA
71                Ethiopia          ETH    NA  39.1    NA    NA
74                 Finland          FIN  26.8  27.1    NA    NA
76                  France          FRA  32.3  32.7    NA    NA
79                   Gabon          GAB    NA    NA    NA  38.0
80          United Kingdom          GBR  34.0  33.2    NA    NA
81                 Georgia          GEO  37.3  36.4  36.5    NA
85             Gambia, The          GMB    NA  35.9    NA    NA
88                  Greece          GRC  35.8  36.0    NA    NA
91               Guatemala          GTM  48.3    NA    NA    NA
96                Honduras          HND  50.4  49.6  50.0    NA
98                 Croatia          HRV  32.1  31.1    NA    NA
100                Hungary          HUN  30.9  30.4    NA    NA
110                Ireland          IRL  31.9  31.8    NA    NA
111     Iran, Islamic Rep.          IRN  38.8    NA    NA    NA
113                Iceland          ISL  27.8    NA    NA    NA
115                  Italy          ITA  34.7  35.4    NA    NA
119             Kazakhstan          KAZ  27.0  26.9    NA    NA
120                  Kenya          KEN    NA  40.8    NA    NA
121        Kyrgyz Republic          KGZ  26.8  29.0  26.8    NA
130                Liberia          LBR  33.2    NA    NA    NA
137              Sri Lanka          LKA    NA    NA  39.8    NA
142              Lithuania          LTU  37.7  37.4    NA    NA
143             Luxembourg          LUX  31.2  33.8    NA    NA
144                 Latvia          LVA  35.1  34.2    NA    NA
149                Moldova          MDA  26.8  27.0  26.3    NA
153                 Mexico          MEX  45.8    NA  43.4    NA
156         Macedonia, FYR          MKD  35.6    NA    NA    NA
158                  Malta          MLT  29.0  29.4    NA    NA
159                Myanmar          MMR    NA  38.1    NA    NA
161             Montenegro          MNE  31.9    NA    NA    NA
162               Mongolia          MNG  32.0    NA  32.3    NA
164             Mozambique          MOZ  54.0    NA    NA    NA
165             Mauritania          MRT  32.6    NA    NA    NA
168               Malaysia          MYS    NA  41.0    NA    NA
170                Namibia          NAM    NA  59.1    NA    NA
172                  Niger          NER  34.3    NA    NA    NA
174              Nicaragua          NIC  46.2    NA    NA    NA
175            Netherlands          NLD  28.6  28.2    NA    NA
176                 Norway          NOR  26.8  27.5    NA    NA
183               Pakistan          PAK    NA  33.5    NA    NA
184                 Panama          PAN  50.6  50.8  50.4    NA
185                   Peru          PER  43.4  43.5  43.8    NA
193               Portugal          PRT  35.6  35.5    NA    NA
194               Paraguay          PRY  50.7  47.6  47.9    NA
195     West Bank and Gaza          PSE    NA    NA  33.7    NA
200                Romania          ROU  36.0  35.9    NA    NA
201     Russian Federation          RUS  39.9  37.7    NA    NA
210            El Salvador          SLV  41.6  40.6  40.0    NA
220        Slovak Republic          SVK  26.1  26.5    NA    NA
221               Slovenia          SVN  25.7  25.4    NA    NA
222                 Sweden          SWE  28.4  29.2    NA    NA
231                   Togo          TGO    NA  43.1    NA    NA
232               Thailand          THA  37.0  36.0    NA    NA
233             Tajikistan          TJK    NA  34.0    NA    NA
236            Timor-Leste          TLS  28.7    NA    NA    NA
243                 Turkey          TUR  41.2  42.9  41.9    NA
246                 Uganda          UGA    NA    NA  42.8    NA
247                Ukraine          UKR  24.0  25.5  25.0    NA
249                Uruguay          URY  40.1  40.2  39.7    NA
250          United States          USA    NA    NA  41.5    NA
256                Vietnam          VNM  34.8    NA  35.3    NA
260                 Kosovo          XKX  27.3  26.4  26.5    NA
261            Yemen, Rep.          YEM  36.7    NA    NA    NA
262           South Africa          ZAF  63.0    NA    NA    NA
263                 Zambia          ZMB    NA  57.1    NA    NA

This is already a subset I made since I considered values older than 2014 not useful. I want to get the most recent value for each country in order to make an inequality rank. Any thoughts?

回答1:

You could use coalesce :

library(tidyverse)
df %>% mutate(last = invoke(coalesce,df[6:3])) %>% head
# edit, more simply : 
# df %>% mutate(last = coalesce(!!!df[6:3])) %>% head
#             Country.Name Country.Code X2014 X2015 X2016 X2017 last
# 1              Argentina          ARG  41.4    NA  42.4    NA 42.4
# 2                Armenia          ARM  31.5  32.4  32.5    NA 32.5
# 3                Austria          AUT  30.5  30.5    NA    NA 30.5
# 4                Belgium          BEL  28.1  27.7    NA    NA 27.7
# 5                  Benin          BEN    NA  47.8    NA    NA 47.8
# 6           Burkina Faso          BFA  35.3    NA    NA    NA 35.3

In base R (less efficient, same output) :

df$last <- apply(df[6:3],1,function(x) na.omit(x)[1])


回答2:

A tidyverse option

library(tidyverse)
df %>%
    gather(Year, Index, starts_with("X")) %>%
    mutate(Year = as.numeric(str_replace(Year, "X", ""))) %>%
    group_by(Country.Code) %>%
    arrange(Country.Code, desc(Year)) %>%
    filter(!is.na(Index)) %>%
    slice(1)
    ungroup()
## A tibble: 93 x 4
#   Country.Name           Country.Code  Year Index
#   <fct>                  <fct>        <dbl> <dbl>
# 1 Argentina              ARG           2016  42.4
# 2 Armenia                ARM           2016  32.5
# 3 Austria                AUT           2015  30.5
# 4 Belgium                BEL           2015  27.7
# 5 Benin                  BEN           2015  47.8
# 6 Burkina Faso           BFA           2014  35.3
# 7 Bangladesh             BGD           2016  32.4
# 8 Bulgaria               BGR           2014  37.4
# 9 Bosnia and Herzegovina BIH           2015  32.7
#10 Belarus                BLR           2016  27

Explanation: Reshape yearly Gini data from wide to long, group by Country.Code, sort entries by descending Year, remove NA rows and keep only the latest entry per Country.Code.


Sample data

df <- read.table(text =
    "Country.Name Country.Code X2014 X2015 X2016 X2017
8                Argentina          ARG  41.4    NA  42.4    NA
9                  Armenia          ARM  31.5  32.4  32.5    NA
13                 Austria          AUT  30.5  30.5    NA    NA
16                 Belgium          BEL  28.1  27.7    NA    NA
17                   Benin          BEN    NA  47.8    NA    NA
18            'Burkina Faso'          BFA  35.3    NA    NA    NA
19              Bangladesh          BGD    NA    NA  32.4    NA
20                Bulgaria          BGR  37.4    NA    NA    NA
23  'Bosnia and Herzegovina'          BIH    NA  32.7    NA    NA
24                 Belarus          BLR  27.2  26.7  27.0    NA
27                 Bolivia          BOL  47.8  46.7  44.6    NA
28                  Brazil          BRA  51.5  51.3    NA    NA
31                  Bhutan          BTN    NA    NA    NA  37.4
36             Switzerland          CHE  32.5  32.3    NA    NA
38                   Chile          CHL    NA  47.7    NA    NA
40           'Cote d Ivoire'          CIV    NA  41.5    NA    NA
41                Cameroon          CMR  46.6    NA    NA    NA
44                Colombia          COL  52.8  51.1  50.8    NA
47              'Costa Rica'          CRI  48.6  48.4  48.7    NA
52                  Cyprus          CYP  35.6  34.0    NA    NA
53          'Czech Republic'          CZE  25.9  25.9    NA    NA
54                 Germany          DEU    NA  31.7    NA    NA
57                 Denmark          DNK  28.4  28.2    NA    NA
58      'Dominican Republic'          DOM  44.1  44.7  45.3    NA
65                 Ecuador          ECU  45.0  46.0  45.0    NA
66        'Egypt, Arab Rep.'          EGY    NA  31.8    NA    NA
69                   Spain          ESP  36.1  36.2    NA    NA
70                 Estonia          EST  34.6  32.7    NA    NA
71                Ethiopia          ETH    NA  39.1    NA    NA
74                 Finland          FIN  26.8  27.1    NA    NA
76                  France          FRA  32.3  32.7    NA    NA
79                   Gabon          GAB    NA    NA    NA  38.0
80          'United Kingdom'          GBR  34.0  33.2    NA    NA
81                 Georgia          GEO  37.3  36.4  36.5    NA
85             'Gambia, The'          GMB    NA  35.9    NA    NA
88                  Greece          GRC  35.8  36.0    NA    NA
91               Guatemala          GTM  48.3    NA    NA    NA
96                Honduras          HND  50.4  49.6  50.0    NA
98                 Croatia          HRV  32.1  31.1    NA    NA
100                Hungary          HUN  30.9  30.4    NA    NA
110                Ireland          IRL  31.9  31.8    NA    NA
111     'Iran, Islamic Rep.'          IRN  38.8    NA    NA    NA
113                Iceland          ISL  27.8    NA    NA    NA
115                  Italy          ITA  34.7  35.4    NA    NA
119             Kazakhstan          KAZ  27.0  26.9    NA    NA
120                  Kenya          KEN    NA  40.8    NA    NA
121        'Kyrgyz Republic'          KGZ  26.8  29.0  26.8    NA
130                Liberia          LBR  33.2    NA    NA    NA
137              'Sri Lanka'          LKA    NA    NA  39.8    NA
142              Lithuania          LTU  37.7  37.4    NA    NA
143             Luxembourg          LUX  31.2  33.8    NA    NA
144                 Latvia          LVA  35.1  34.2    NA    NA
149                Moldova          MDA  26.8  27.0  26.3    NA
153                 Mexico          MEX  45.8    NA  43.4    NA
156         'Macedonia, FYR'          MKD  35.6    NA    NA    NA
158                  Malta          MLT  29.0  29.4    NA    NA
159                Myanmar          MMR    NA  38.1    NA    NA
161             Montenegro          MNE  31.9    NA    NA    NA
162               Mongolia          MNG  32.0    NA  32.3    NA
164             Mozambique          MOZ  54.0    NA    NA    NA
165             Mauritania          MRT  32.6    NA    NA    NA
168               Malaysia          MYS    NA  41.0    NA    NA
170                Namibia          NAM    NA  59.1    NA    NA
172                  Niger          NER  34.3    NA    NA    NA
174              Nicaragua          NIC  46.2    NA    NA    NA
175            Netherlands          NLD  28.6  28.2    NA    NA
176                 Norway          NOR  26.8  27.5    NA    NA
183               Pakistan          PAK    NA  33.5    NA    NA
184                 Panama          PAN  50.6  50.8  50.4    NA
185                   Peru          PER  43.4  43.5  43.8    NA
193               Portugal          PRT  35.6  35.5    NA    NA
194               Paraguay          PRY  50.7  47.6  47.9    NA
195     'West Bank and Gaza'          PSE    NA    NA  33.7    NA
200                Romania          ROU  36.0  35.9    NA    NA
201     'Russian Federation'          RUS  39.9  37.7    NA    NA
210            'El Salvador'          SLV  41.6  40.6  40.0    NA
220        'Slovak Republic'          SVK  26.1  26.5    NA    NA
221               Slovenia          SVN  25.7  25.4    NA    NA
222                 Sweden          SWE  28.4  29.2    NA    NA
231                   Togo          TGO    NA  43.1    NA    NA
232               Thailand          THA  37.0  36.0    NA    NA
233             Tajikistan          TJK    NA  34.0    NA    NA
236            Timor-Leste          TLS  28.7    NA    NA    NA
243                 Turkey          TUR  41.2  42.9  41.9    NA
246                 Uganda          UGA    NA    NA  42.8    NA
247                Ukraine          UKR  24.0  25.5  25.0    NA
249                Uruguay          URY  40.1  40.2  39.7    NA
250          'United States'          USA    NA    NA  41.5    NA
256                Vietnam          VNM  34.8    NA  35.3    NA
260                 Kosovo          XKX  27.3  26.4  26.5    NA
261            'Yemen, Rep.'          YEM  36.7    NA    NA    NA
262           'South Africa'          ZAF  63.0    NA    NA    NA
263                 Zambia          ZMB    NA  57.1    NA    NA", header = T)


回答3:

Another option with dplyr and tidyr is the following. I used df by Maurits Evers. You reformat your data from a wide format to a long format with gather(). Then, you define a group variable with Country.Name. For each country, you get indices for non-NA values and choose the max index number. You use it to subset your data with slice().

gather(df, key = year, value = value, -Country.Name, -Country.Code) %>%
group_by(Country.Name) %>%
slice(max(which(!is.na(value))))

   Country.Name           Country.Code year  value
   <fct>                  <fct>        <chr> <dbl>
 1 Argentina              ARG          X2016  42.4
 2 Armenia                ARM          X2016  32.5
 3 Austria                AUT          X2015  30.5
 4 Bangladesh             BGD          X2016  32.4
 5 Belarus                BLR          X2016  27  
 6 Belgium                BEL          X2015  27.7
 7 Benin                  BEN          X2015  47.8
 8 Bhutan                 BTN          X2017  37.4
 9 Bolivia                BOL          X2016  44.6
10 Bosnia and Herzegovina BIH          X2015  32.7
 ... with 83 more rows


回答4:

Since you only care about those four years, a simple way to do it could be to check for an NA value for each year looking backwards and map to a separate column

df$mostRecent = NA

#Moving backwards, if most recent value is NA then check the previous year

df$mostRecent[is.na(df$mostRecent)] <- df$X2017[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2016[is.na(df$mostRecent)]
df$mostRecent[is.na(df$mostRecent)] <- df$X2015[is.na(df$mostRecent)] 
df$mostRecent[is.na(df$mostRecent)] <- df$X2014[is.na(df$mostRecent)]