How to subset a list based on the length of its elements

Posted 2019-04-14 22:16

Question:

In R I have a function (ip2coordinates from the package RDSTK) which looks up 11 fields of data for each IP address you supply.

I have a vector of IPs called ip.addresses:

> head(ip.addresses)
[1] "128.177.90.11"  "71.179.12.143"  "66.31.55.111"   "98.204.243.187" "67.231.207.9"   "67.61.248.12"  

Note: These or any other IPs can be used to reproduce this problem.

So I apply the function to that object with sapply:

ips.info     <- sapply(ip.addresses, ip2coordinates)

and get a list called ips.info as my result. This is all well and good, but I can't do much more with a list, so I need to convert it to a data frame. The problem is that not all IP addresses are in the database, so some list elements have only 1 field, and I get this error:

> ips.df       <- as.data.frame(ips.info)
Error in data.frame(`128.177.90.10` = list(ip.address = "128.177.90.10",  : 
  arguments imply differing number of rows: 1, 0

My question is -- "How do I remove the elements with missing/incomplete data or otherwise convert this list into a data frame with 11 columns and 1 row per IP address?"

I have tried several things.

  • First, I tried to write a loop that removes elements with a length of less than 11:

    for (i in 1:length(ips.info)) {
      if (length(ips.info[i]) < 11) {
        ips.info[i] <- NULL
      }
    }
    

This leaves some records with no data and makes others say "NULL", but even those with "NULL" are not detected by is.null.

  • Next, I tried the same thing with double square brackets and got:

    Error in ips.info[[i]] : subscript out of bounds
    
  • I also tried complete.cases() to see whether it could be useful, and got:

    Error in complete.cases(ips.info) : not all arguments have the same length
    
  • Finally, I tried a variation of my for loop that was conditioned on length(ips.info[[i]] == 11 and wrote complete records to another object, but somehow it results in an exact copy of ips.info.
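For reference, the symptoms in the attempts above can be reproduced on a small stand-in list without calling the lookup service. This is only a sketch: `demo`, the field names `field1` to `field11`, and the one-field shape of a failed lookup are assumptions made for illustration, not the actual output of ip2coordinates.

```r
# Hypothetical stand-in for ips.info (names and shapes are assumptions)
full_record <- function() setNames(as.list(seq_len(11)), paste0("field", 1:11))
demo <- list(
  "1.1.1.1" = full_record(),
  "2.2.2.2" = list(field1 = 1),   # assumed shape of a failed lookup
  "3.3.3.3" = full_record()
)

# Single brackets return a sub-list holding one element, so its length is 1
len_single <- length(demo[1])
# Double brackets return the element itself, so this is the field count
len_double <- length(demo[[1]])

# Misplaced parenthesis: this measures the comparison result, not the element,
# and any non-zero number is treated as TRUE by if()
len_of_comparison <- length(demo[[1]] == 11)
# Intended test: compare the element's length with 11
is_complete <- length(demo[[1]]) == 11

# Deciding what to keep before subsetting avoids changing the list
# while a loop is still indexing into it
keep <- sapply(demo, function(x) length(x) == 11)
complete_only <- demo[keep]
```

Under these assumptions, the first loop tests a length that is always 1, and removing elements inside the loop shortens the list while `i` keeps counting up to the original length, which would be consistent with the "subscript out of bounds" error from the double-bracket version.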

Answer 1:

Here's one way to accomplish this using the built-in Filter function:

# input data
library(RDSTK)
ip.addresses <- c("128.177.90.10", "71.179.13.143", "66.31.55.111", "98.204.243.188",
                  "67.231.207.8", "67.61.248.15")
ips.info <- sapply(ip.addresses, ip2coordinates)

# data.frame creation: keep only the elements with all 11 fields, then stack them
lengthIs <- function(n) function(x) length(x) == n
do.call(rbind, Filter(lengthIs(11), ips.info))

Or, if you prefer not to use a helper function:

do.call(rbind, Filter(function(x) length(x)==11, ips.info))
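Since ip2coordinates needs the lookup service to be reachable, the same Filter pattern can be tried offline on mock data. The sketch below is an illustration under stated assumptions: `make_record`, the column names `field2` to `field11`, and the one-field list used for a failed lookup are all invented here and do not reflect the real RDSTK output.

```r
# Hypothetical 1-row, 11-column record standing in for a successful lookup
make_record <- function(ip) {
  rec <- data.frame(ip.address = ip, stringsAsFactors = FALSE)
  for (k in 2:11) rec[[paste0("field", k)]] <- k
  rec
}

ips.info <- list(
  "1.1.1.1" = make_record("1.1.1.1"),
  "2.2.2.2" = list(ip.address = "2.2.2.2"),  # assumed shape of a failed lookup
  "3.3.3.3" = make_record("3.3.3.3")
)

# Keep only the elements with 11 fields, then stack them row by row
complete <- Filter(function(x) length(x) == 11, ips.info)
df <- do.call(rbind, complete)

# Reset the row names to a plain 1..n sequence
rownames(df) <- NULL
dim(df)
```

Filter keeps the names of the elements it retains, so the surviving IPs can still be identified before the records are bound together.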


Answer 2:

An alternative solution using only base R:

  # find non-complete elements
  ids.to.remove <- sapply(ips.info, function(i) length(i) < 11)
  # remove found elements
  ips.info <- ips.info[!ids.to.remove]
  # create data.frame
  df <- do.call(rbind, ips.info)
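The question also asks about keeping one row per IP address rather than dropping the incomplete ones. One possible approach, sketched here on mock data, is to pad each incomplete element with NA before binding. `make_record`, `pad_record`, the column names, and the shape of a failed lookup are hypothetical; the sketch also assumes at least one complete record exists to serve as a template.

```r
# Hypothetical 1-row, 11-column record standing in for a successful lookup
make_record <- function(ip) {
  rec <- data.frame(ip.address = ip, stringsAsFactors = FALSE)
  for (k in 2:11) rec[[paste0("field", k)]] <- k
  rec
}

ips.info <- list(
  "1.1.1.1" = make_record("1.1.1.1"),
  "2.2.2.2" = list(ip.address = "2.2.2.2"),  # assumed shape of a failed lookup
  "3.3.3.3" = make_record("3.3.3.3")
)

# lengths() gives the length of every element in one vectorised call
is_complete <- lengths(ips.info) == 11

# Use the first complete record as a template for column names and types
template <- ips.info[[which(is_complete)[1]]]

# Replace an incomplete record with an all-NA row that still carries its IP
pad_record <- function(rec, ip) {
  if (length(rec) == 11) return(rec)
  blank <- template
  blank[1, ] <- NA
  blank$ip.address <- ip
  blank
}

padded <- Map(pad_record, ips.info, names(ips.info))
df <- do.call(rbind, padded)
rownames(df) <- NULL
```

This relies on the list being named by IP address, which is what sapply produces when it is given a character vector.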


Tags: r list subset