Subsetting lists via logical index vectors

Posted 2020-04-07 19:14

Question:

I have a complex list and need to select a subset from it, based on the value of a boolean element (I need records with hidden value equal to FALSE). I've tried the following code, based on index vectors, but it fails (as shown at the end of this output):

startups <- data$startups[data$startups$hidden == FALSE]

Or, alternatively:

startups <- data$startups[!as.logical(data$startups$hidden)]

An interactive R session shows that the data is there:

Browse[1]> str(data$startups, list.len=3)
List of 50
 $ :List of 23
  ..$ id               : num 357496
  ..$ hidden           : logi FALSE
  ..$ community_profile: logi FALSE
  .. [list output truncated]
 $ :List of 2
  ..$ id    : num 352159
  ..$ hidden: logi TRUE
 $ :List of 2
  ..$ id    : num 352157
  ..$ hidden: logi TRUE
  [list output truncated]

Browse[1]> data$startups[data$startups$hidden == FALSE]
list()

Browse[1]> data$startups[!as.logical(data$startups$hidden)]
list()

What is the problem with my code?

Update (hopefully includes reproducible example, sorry about the complex structure)

aa <- dput(head(data$startups, n=3))

produces the following output:

list(structure(list(id = 386938, hidden = FALSE, community_profile = FALSE, 
    name = "Pritunl", angellist_url = "https://angel.co/pritunl", 
    logo_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-medium_jpg.jpg?buster=1398401450", 
    thumb_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-thumb_jpg.jpg?buster=1398401450", 
    quality = 0, product_desc = "Enterprise VPN/cloud networking server", 
    high_concept = "Enterprise cloud networking", follower_count = 1, 
    company_url = "http://pritunl.com", created_at = "2014-04-25T04:50:57Z", 
    updated_at = "2014-04-25T06:02:05Z", crunchbase_url = NULL, 
    twitter_url = "http://twitter.com/pritunl", blog_url = "", 
    video_url = "", markets = list(structure(list(id = 12, tag_type = "MarketTag", 
        name = "enterprise software", display_name = "Enterprise Software", 
        angellist_url = "https://angel.co/enterprise-software"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 59, tag_type = "MarketTag", name = "open source", 
        display_name = "Open Source", angellist_url = "https://angel.co/open-source"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 123, tag_type = "MarketTag", name = "internet infrastructure", 
        display_name = "Internet Infrastructure", angellist_url = "https://angel.co/internet-infrastructure"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 306, tag_type = "MarketTag", name = "cloud management", 
        display_name = "Cloud Management", angellist_url = "https://angel.co/cloud-management"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), locations = list(
        structure(list(id = 2071, tag_type = "LocationTag", name = "new york", 
            display_name = "New York", angellist_url = "https://angel.co/new-york"), .Names = c("id", 
        "tag_type", "name", "display_name", "angellist_url"))), 
    company_size = "1-10", company_type = list(structure(list(
        id = 94212, tag_type = "CompanyTypeTag", name = "startup", 
        display_name = "Startup", angellist_url = "https://angel.co/startup"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), status = NULL, 
    screenshots = list(structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-original.png"), .Names = c("thumb", 
    "original")))), .Names = c("id", "hidden", "community_profile", 
"name", "angellist_url", "logo_url", "thumb_url", "quality", 
"product_desc", "high_concept", "follower_count", "company_url", 
"created_at", "updated_at", "crunchbase_url", "twitter_url", 
"blog_url", "video_url", "markets", "locations", "company_size", 
"company_type", "status", "screenshots")), structure(list(id = 385596, 
    hidden = FALSE, community_profile = TRUE, name = "Lariat ", 
    angellist_url = "https://angel.co/lariat-1", logo_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-medium_jpg.jpg?buster=1398260121", 
    thumb_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-thumb_jpg.jpg?buster=1398260121", 
    quality = 0, product_desc = "Thus far, the internet has gone from discovery to search discovery, and then social discovery, but with little focus on recall. Remembering your digital footprint is difficult. We aim to solve that problem. Lariat is a cloud-based recall engine to securely recall information from any page in your search history instantly through intuitive keyword search, not just from page titles, but from the contents and context of the underlying pages.\r\n\r\nWrangle in the information you want, easier and faster.", 
    high_concept = "Recall your digital footprint on the web instantly", 
    follower_count = 1, company_url = "http://www.lariattech.com", 
    created_at = "2014-04-23T13:17:47Z", updated_at = "2014-04-23T13:48:38Z", 
    crunchbase_url = NULL, twitter_url = "", blog_url = "", video_url = NULL, 
    markets = list(structure(list(id = 4, tag_type = "MarketTag", 
        name = "digital media", display_name = "Digital Media", 
        angellist_url = "https://angel.co/digital-media"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 12, tag_type = "MarketTag", name = "enterprise software", 
        display_name = "Enterprise Software", angellist_url = "https://angel.co/enterprise-software"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 59, tag_type = "MarketTag", name = "open source", 
        display_name = "Open Source", angellist_url = "https://angel.co/open-source"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 282, tag_type = "MarketTag", name = "semantic search", 
        display_name = "Semantic Search", angellist_url = "https://angel.co/semantic-search"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), locations = list(
        structure(list(id = 1620, tag_type = "LocationTag", name = "boston", 
            display_name = "Boston", angellist_url = "https://angel.co/boston"), .Names = c("id", 
        "tag_type", "name", "display_name", "angellist_url"))), 
    company_size = "1-10", company_type = structure(list(), class = "AsIs"), 
    status = NULL, screenshots = structure(list(), class = "AsIs")), .Names = c("id", 
"hidden", "community_profile", "name", "angellist_url", "logo_url", 
"thumb_url", "quality", "product_desc", "high_concept", "follower_count", 
"company_url", "created_at", "updated_at", "crunchbase_url", 
"twitter_url", "blog_url", "video_url", "markets", "locations", 
"company_size", "company_type", "status", "screenshots")), structure(list(
    id = 385595, hidden = TRUE), .Names = c("id", "hidden")))

The same in a more readable format (aa):

[[1]]
[[1]]$id
[1] 386938

[[1]]$hidden
[1] FALSE

[[1]]$community_profile
[1] FALSE

[[1]]$name
[1] "Pritunl"

[[1]]$angellist_url
[1] "https://angel.co/pritunl"

[[1]]$logo_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-medium_jpg.jpg?buster=1398401450"

[[1]]$thumb_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-thumb_jpg.jpg?buster=1398401450"

[[1]]$quality
[1] 0

[[1]]$product_desc
[1] "Enterprise VPN/cloud networking server"

[[1]]$high_concept
[1] "Enterprise cloud networking"

[[1]]$follower_count
[1] 1

[[1]]$company_url
[1] "http://pritunl.com"

[[1]]$created_at
[1] "2014-04-25T04:50:57Z"

[[1]]$updated_at
[1] "2014-04-25T06:02:05Z"

[[1]]$crunchbase_url
NULL

[[1]]$twitter_url
[1] "http://twitter.com/pritunl"

[[1]]$blog_url
[1] ""

[[1]]$video_url
[1] ""

[[1]]$markets
[[1]]$markets[[1]]
[[1]]$markets[[1]]$id
[1] 12

[[1]]$markets[[1]]$tag_type
[1] "MarketTag"

[[1]]$markets[[1]]$name
[1] "enterprise software"

[[1]]$markets[[1]]$display_name
[1] "Enterprise Software"

[[1]]$markets[[1]]$angellist_url
[1] "https://angel.co/enterprise-software"


[[1]]$markets[[2]]
[[1]]$markets[[2]]$id
[1] 59

[[1]]$markets[[2]]$tag_type
[1] "MarketTag"

[[1]]$markets[[2]]$name
[1] "open source"

[[1]]$markets[[2]]$display_name
[1] "Open Source"

[[1]]$markets[[2]]$angellist_url
[1] "https://angel.co/open-source"


[[1]]$markets[[3]]
[[1]]$markets[[3]]$id
[1] 123

[[1]]$markets[[3]]$tag_type
[1] "MarketTag"

[[1]]$markets[[3]]$name
[1] "internet infrastructure"

[[1]]$markets[[3]]$display_name
[1] "Internet Infrastructure"

[[1]]$markets[[3]]$angellist_url
[1] "https://angel.co/internet-infrastructure"


[[1]]$markets[[4]]
[[1]]$markets[[4]]$id
[1] 306

[[1]]$markets[[4]]$tag_type
[1] "MarketTag"

[[1]]$markets[[4]]$name
[1] "cloud management"

[[1]]$markets[[4]]$display_name
[1] "Cloud Management"

[[1]]$markets[[4]]$angellist_url
[1] "https://angel.co/cloud-management"



[[1]]$locations
[[1]]$locations[[1]]
[[1]]$locations[[1]]$id
[1] 2071

[[1]]$locations[[1]]$tag_type
[1] "LocationTag"

[[1]]$locations[[1]]$name
[1] "new york"

[[1]]$locations[[1]]$display_name
[1] "New York"

[[1]]$locations[[1]]$angellist_url
[1] "https://angel.co/new-york"



[[1]]$company_size
[1] "1-10"

[[1]]$company_type
[[1]]$company_type[[1]]
[[1]]$company_type[[1]]$id
[1] 94212

[[1]]$company_type[[1]]$tag_type
[1] "CompanyTypeTag"

[[1]]$company_type[[1]]$name
[1] "startup"

[[1]]$company_type[[1]]$display_name
[1] "Startup"

[[1]]$company_type[[1]]$angellist_url
[1] "https://angel.co/startup"



[[1]]$status
NULL

[[1]]$screenshots
[[1]]$screenshots[[1]]
[[1]]$screenshots[[1]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-thumb_jpg.jpg"

[[1]]$screenshots[[1]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-original.png"


[[1]]$screenshots[[2]]
[[1]]$screenshots[[2]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-thumb_jpg.jpg"

[[1]]$screenshots[[2]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-original.png"


[[1]]$screenshots[[3]]
[[1]]$screenshots[[3]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-thumb_jpg.jpg"

[[1]]$screenshots[[3]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-original.png"


[[1]]$screenshots[[4]]
[[1]]$screenshots[[4]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-thumb_jpg.jpg"

[[1]]$screenshots[[4]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-original.png"


[[1]]$screenshots[[5]]
[[1]]$screenshots[[5]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-thumb_jpg.jpg"

[[1]]$screenshots[[5]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-original.png"


[[1]]$screenshots[[6]]
[[1]]$screenshots[[6]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-thumb_jpg.jpg"

[[1]]$screenshots[[6]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-original.png"


[[1]]$screenshots[[7]]
[[1]]$screenshots[[7]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-thumb_jpg.jpg"

[[1]]$screenshots[[7]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-original.png"




[[2]]
[[2]]$id
[1] 385596

[[2]]$hidden
[1] FALSE

[[2]]$community_profile
[1] TRUE

[[2]]$name
[1] "Lariat "

[[2]]$angellist_url
[1] "https://angel.co/lariat-1"

[[2]]$logo_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-medium_jpg.jpg?buster=1398260121"

[[2]]$thumb_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-thumb_jpg.jpg?buster=1398260121"

[[2]]$quality
[1] 0

[[2]]$product_desc
[1] "Thus far, the internet has gone from discovery to search discovery, and then social discovery, but with little focus on recall. Remembering your digital footprint is difficult. We aim to solve that problem. Lariat is a cloud-based recall engine to securely recall information from any page in your search history instantly through intuitive keyword search, not just from page titles, but from the contents and context of the underlying pages.\r\n\r\nWrangle in the information you want, easier and faster."

[[2]]$high_concept
[1] "Recall your digital footprint on the web instantly"

[[2]]$follower_count
[1] 1

[[2]]$company_url
[1] "http://www.lariattech.com"

[[2]]$created_at
[1] "2014-04-23T13:17:47Z"

[[2]]$updated_at
[1] "2014-04-23T13:48:38Z"

[[2]]$crunchbase_url
NULL

[[2]]$twitter_url
[1] ""

[[2]]$blog_url
[1] ""

[[2]]$video_url
NULL

[[2]]$markets
[[2]]$markets[[1]]
[[2]]$markets[[1]]$id
[1] 4

[[2]]$markets[[1]]$tag_type
[1] "MarketTag"

[[2]]$markets[[1]]$name
[1] "digital media"

[[2]]$markets[[1]]$display_name
[1] "Digital Media"

[[2]]$markets[[1]]$angellist_url
[1] "https://angel.co/digital-media"


[[2]]$markets[[2]]
[[2]]$markets[[2]]$id
[1] 12

[[2]]$markets[[2]]$tag_type
[1] "MarketTag"

[[2]]$markets[[2]]$name
[1] "enterprise software"

[[2]]$markets[[2]]$display_name
[1] "Enterprise Software"

[[2]]$markets[[2]]$angellist_url
[1] "https://angel.co/enterprise-software"


[[2]]$markets[[3]]
[[2]]$markets[[3]]$id
[1] 59

[[2]]$markets[[3]]$tag_type
[1] "MarketTag"

[[2]]$markets[[3]]$name
[1] "open source"

[[2]]$markets[[3]]$display_name
[1] "Open Source"

[[2]]$markets[[3]]$angellist_url
[1] "https://angel.co/open-source"


[[2]]$markets[[4]]
[[2]]$markets[[4]]$id
[1] 282

[[2]]$markets[[4]]$tag_type
[1] "MarketTag"

[[2]]$markets[[4]]$name
[1] "semantic search"

[[2]]$markets[[4]]$display_name
[1] "Semantic Search"

[[2]]$markets[[4]]$angellist_url
[1] "https://angel.co/semantic-search"



[[2]]$locations
[[2]]$locations[[1]]
[[2]]$locations[[1]]$id
[1] 1620

[[2]]$locations[[1]]$tag_type
[1] "LocationTag"

[[2]]$locations[[1]]$name
[1] "boston"

[[2]]$locations[[1]]$display_name
[1] "Boston"

[[2]]$locations[[1]]$angellist_url
[1] "https://angel.co/boston"



[[2]]$company_size
[1] "1-10"

[[2]]$company_type
list()

[[2]]$status
NULL

[[2]]$screenshots
list()


[[3]]
[[3]]$id
[1] 385595

[[3]]$hidden
[1] TRUE

Finally, applying subsetting operation via logical index vector:

aa[data$startups$hidden == FALSE]

the result is an empty list (despite hidden = FALSE for the 1st and 2nd elements):

list()

Again, sorry about the output's size, but I had to retain the structure of the list.

Considerations:

According to R Project's "Introduction to R" (http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors),

"Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression".

At the same time, according to Hadley Wickham's "Advanced R" (http://adv-r.had.co.nz/Subsetting.html),

"subsetting a list works in exactly the same way as subsetting an atomic vector".

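As a small illustration of the two quoted statements, the sketch below applies one logical index vector to an atomic vector and to a list. The objects v, l and keep are hypothetical examples, not part of the data above.

```r
# An atomic vector and a list of the same length (both hypothetical examples).
v <- c(10, 20, 30)
l <- list("a", 2, TRUE)

# One logical index vector, with one entry per element.
keep <- c(TRUE, FALSE, TRUE)

v[keep]   # elements 1 and 3 of the atomic vector
l[keep]   # elements 1 and 3 of the list, returned as a list
```

In both cases the logical vector must describe the elements of the object being subset, one entry per element.
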
Answer 1:

The example data in the question is a list of length 3, which we shall call L. Each of its components is itself a list, and each of these sublists has a component named hidden. We can extract the hidden components of the sublists into a logical vector called hidden. Using that logical vector we can subset the original list L, giving a new list containing only those sublists with a hidden component of TRUE. (To keep the sublists where hidden is FALSE, as the question asks, index with !hidden instead.)

hidden <- sapply(L, "[[", "hidden") # create logical vector hidden
L[hidden]

For the data provided we get a list with one component:

> length(L[hidden])
[1] 1

If we knew that there was only one such component, then L[hidden][[1]] or L[[which(hidden)]] would give that single component.
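
A minimal sketch of why the attempt in the question returns list(), and of how to keep the non-hidden records instead. The list L below is a hypothetical three-element stand-in for the question's data (only id and hidden are kept), and vapply is swapped in for sapply on the assumption that every record holds exactly one logical hidden value.

```r
# Hypothetical stand-in for the question's data: a list of records,
# each a list with a logical `hidden` element.
L <- list(
  list(id = 386938, hidden = FALSE),
  list(id = 385596, hidden = FALSE),
  list(id = 385595, hidden = TRUE)
)

# `$` looks for a top-level element named "hidden"; L has none, so this is NULL.
is.null(L$hidden)

# NULL == FALSE is a zero-length logical vector, and indexing a list with a
# zero-length index selects nothing, hence the empty list in the question.
length(L$hidden == FALSE)
length(L[L$hidden == FALSE])

# Pull `hidden` out of every record into one logical vector.
# vapply() is like sapply() but checks that each result is a single logical.
hidden <- vapply(L, function(rec) rec[["hidden"]], logical(1))

# Negate the index to keep the records that are NOT hidden.
visible <- L[!hidden]
length(visible)
```

The index vector has to be built from the records themselves; the outer list has no hidden element of its own to compare against.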



Answer 2:

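Data frames are indexed with two indices: rows, then columns. This applies only if data$startups is a data frame; the str() output in the question shows a plain list, for which the two-index form fails with an "incorrect number of dimensions" error. For a data frame, select rows only by leaving the column index empty: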

data$startups[data$startups$hidden == FALSE, ]
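
A short sketch of the two-index form, assuming a hypothetical data frame df with the same two fields as the question's records, followed by the same form applied to a plain list for contrast.

```r
# Hypothetical data frame with the same two fields as the question's records.
df <- data.frame(
  id     = c(386938, 385596, 385595),
  hidden = c(FALSE, FALSE, TRUE)
)

# Row index before the comma; the empty column index after it keeps all columns.
visible_rows <- df[df$hidden == FALSE, ]
nrow(visible_rows)

# The same two-index form on a plain list fails, because a list has no dimensions.
L <- list(list(id = 1, hidden = FALSE), list(id = 2, hidden = TRUE))
result <- tryCatch(L[c(TRUE, FALSE), ], error = function(e) "error")
result
```

Checking is.data.frame(data$startups) first would tell which of the two forms applies.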