R: If a substring of 8 characters in colA is equal

Posted 2019-08-20 13:57

In R, I need to compare the first 8 characters of one colA (Longitude.x) with the first 8 characters of a second colB (X.x). If the 8 characters are identical, then I want to write the value of colA (Longitude.x) to a new colC (XCoord). In other words, if colA contains a longitude value of -122.23538 and colB contains an X value of -122.235873, I want colC to take the value of colA -122.23538 because the first 8 characters (-122.235) match.

colA (Longitude.x) and colB (X.x) are both type double when first read into R, so I have converted them to characters with the following code:

schools_merge$Longitude.x[] <- lapply(schools_merge$Longitude.x[], as.character)
schools_merge$X.x[] <- lapply(schools_merge$X.x[], as.character)

The class and type of both colA and B become "list."
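The columns become lists because lapply() always returns a list, and assigning that list back with `[] <-` coerces the column to a list. As a minimal sketch (the small data frame below is a hypothetical stand-in built from values in this post), calling as.character() directly on the column keeps it an ordinary character vector:

```r
# Hypothetical stand-in for schools_merge, using two rows from the post
schools_merge <- data.frame(
  Longitude.x = c(-120.76288, -122.23538),
  X.x         = c(-120.763628, -122.235873)
)

# as.character() is already vectorised, so no lapply() is needed
schools_merge$Longitude.x <- as.character(schools_merge$Longitude.x)
schools_merge$X.x         <- as.character(schools_merge$X.x)

class(schools_merge$Longitude.x)  # "character"
```

With plain character vectors, substr() and == both work element by element across the whole column.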

I have tried the following code to write a new colC (XCoord):

schools_merge$XCoord <- if(substr(schools_merge$X.x,1,8) == substr(schools_merge$Longitude.x,1,8)) "yes" else "no"

While this code runs, it returns a warning--

Warning message:
In if (substr(schools_merge$X.x, 1, 8) == substr(schools_merge$Longitude.x,  
: the condition has length > 1 and only the first element will be used

--and not the desired outcome (for example, the second element in each list should result in a "yes" for colC (XCoord) because characters 1-8 of the number -122.23538 are equal to characters 1-8 of -122.235873).

head(schools_merge$XCoord)
head(schools_merge$Longitude.x)
head(schools_merge$X.x)

> head(schools_merge$XCoord)
[1] "no" "no" "no" "no" "no" "no"
> head(schools_merge$Longitude.x)
[[1]]
[1] "-120.76288"

[[2]]
[1] "-122.23538"

[[3]]
[1] "-122.19604"

[[4]]
[1] "-122.09222"

[[5]]
[1] "-121.77057"

[[6]]
[1] "-122.21629"

> head(schools_merge$X.x)
[[1]]
[1] "-120.763628"

[[2]]
[1] "-122.235873"

[[3]]
[1] "-122.197942"

[[4]]
[1] "-122.092998"

[[5]]
[1] "-121.770702"

[[6]]
[1] "-122.216899"
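The warning and the all-"no" result have the same cause: if() expects a single TRUE or FALSE, so it looks only at the first row (which does not match) and the single "no" is then recycled to every row. In R 4.2.0 and later the same call is an error rather than a warning. A small sketch with the first three values shown above (the names lon, x, and cond are illustrative only):

```r
# First three elements of each list column, copied from the output above
lon <- list("-120.76288", "-122.23538", "-122.19604")
x   <- list("-120.763628", "-122.235873", "-122.197942")

# The comparison yields one logical value per row, not a single value
cond <- substr(unlist(x), 1, 8) == substr(unlist(lon), 1, 8)
length(cond)  # 3
cond          # FALSE TRUE FALSE

# ifelse() evaluates the condition for every element
ifelse(cond, "yes", "no")  # "no" "yes" "no"
```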

The possibilities I can think of are: 1) What I am assuming counts as a character (i.e. '-', '.', and all digits) is incorrect, but I have tried several different numbers of characters to compare and still get the same result (head() shows either all "yes" or all "no"); or 2) I may need to convert the columns to vectors instead of characters. Any help is much appreciated!

Thank you, Anna

In response to the comments below, here is a link to a subset of the data and the script: https://sfsu.box.com/s/043n3mxrj4i4mwaefykugjc16yr8mchp
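On possibility 1 above, the character count can be checked directly: nchar() and substr() count the minus sign and the decimal point like any other character. The sketch below uses the example pair from the question. The last two lines illustrate a separate caveat that is an assumption about the wider data set, not something shown in the post: as.character() drops trailing zeros, so a fixed-width format such as sprintf("%.6f", ...) may be a safer basis for comparing prefixes.

```r
a <- "-122.23538"
b <- "-122.235873"

nchar(a)                            # 10: sign and decimal point are counted
substr(a, 1, 8)                     # "-122.235"
substr(a, 1, 8) == substr(b, 1, 8)  # TRUE

# Assumed edge case: trailing zeros vanish when a double becomes text
as.character(-122.1)                # "-122.1"
sprintf("%.6f", -122.1)             # "-122.100000"
```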

1 answer
唯我独甜
#2 · 2019-08-20 14:11

Maybe you could try this code below:

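# if() tests only the first element; ifelse() compares every row.
# unlist() flattens the list columns back to character vectors.
x8 <- substr(unlist(schools_merge$X.x), 1, 8)
lon8 <- substr(unlist(schools_merge$Longitude.x), 1, 8)
schools_merge$XCoord <- ifelse(x8 == lon8, "yes", "no")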
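The question originally asked for XCoord to hold the Longitude.x value itself when the first 8 characters match, rather than a "yes"/"no" flag. A possible sketch of that, assuming the columns are still numeric and that non-matching rows should become NA (the question does not say what they should hold); the data frame is a hypothetical stand-in built from the six rows shown in the question:

```r
# Hypothetical stand-in for schools_merge, using the six rows from the post
schools_merge <- data.frame(
  Longitude.x = c(-120.76288, -122.23538, -122.19604,
                  -122.09222, -121.77057, -122.21629),
  X.x         = c(-120.763628, -122.235873, -122.197942,
                  -122.092998, -121.770702, -122.216899)
)

# Compare text prefixes without overwriting the numeric columns
lon_chr <- as.character(schools_merge$Longitude.x)
x_chr   <- as.character(schools_merge$X.x)
match8  <- substr(lon_chr, 1, 8) == substr(x_chr, 1, 8)

# Keep the longitude where the prefixes agree, NA elsewhere (assumed choice)
schools_merge$XCoord <- ifelse(match8, schools_merge$Longitude.x, NA_real_)
schools_merge$XCoord
```

Keeping the original columns numeric means XCoord stays numeric too, which is usually more convenient for coordinates than a character column.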