split each character in R

Posted 2019-09-25 03:26

Question:

I have a song.txt file:

*****
[1]"The snow glows white on the mountain tonight
Not a footprint to be seen."
[2]"A kingdom of isolation,
and it looks like I'm the Queen"
[3]"The wind is howling like this swirling storm inside
Couldn't keep it in;
Heaven knows I've tried"
*****
[4]"Don't let them in,
don't let them see"
[5]"Be the good girl you always have to be
Conceal, don't feel,
don't let them know"
[6]"Well now they know"
*****

I would like to loop over the lyrics and fill in a list so that each element of the list is a character vector, where each element of that vector is a word in the song.

Like this:

[1] "The" "snow" "glows" "white" "on" "the" "mountain" "tonight" "Not" "a" "footprint"
    "to" "be" "seen." "A" "kingdom" "of" "isolation," "and" "it" "looks" "like" "I'm" "the"     
    "Queen" "The" "wind" "is" "howling" "like" "this" "swirling" "storm" "inside"
    "Couldn't" "keep" "it" "in" "Heaven" "knows" "I've" "tried"
[2] "Don't" "let" "them" "in," "don't" "let" "them" "see" "Be" "the" "good" "girl" "you"
   "always" "have" "to" "be" "Conceal," "don't" "feel," "don't" "let" "them" "know"
   "Well" "now" "they" "know"

First I made an empty list with words <- vector("list", 2).

I think that I should first put the text into one long character vector and then find where the ***** delimiters start and stop, with:

star="\\*{5}"
pindex = grep(star, page)

After this what should I do?

Answer 1:

It sounds like what you want is strsplit, run (effectively) twice. So, starting from the point of "a single long character string separated by ***** and spaces" (which I assume is what you have?):

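#strsplit returns a list with one element per input string, so take [[1]]
#to get the vector of verses before looping over it
list_of_vectors <- lapply(strsplit(song, split = "\\*{5}")[[1]], function(x) {

  #Split each verse by spaces
  split_verse <- strsplit(x, split = " ")

  #Then return it as a vector
  return(unlist(split_verse))

})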

The result should be a list with one element per verse, each element being a vector of the words in that verse. If you're not dealing with a single character string in the read-in object, show us the file and how you're reading it in ;).
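If the file is read with readLines, the result is one string per line rather than a single long string, so the lines need to be pasted together first. The following is only a minimal sketch under that assumption: the temporary file stands in for song.txt, and the names `song_file` and `verses` are made up for illustration. It also trims whitespace and drops the empty piece that strsplit produces before the first `*****`.

```r
# Hypothetical stand-in for song.txt, written to a temporary file
song_file <- tempfile(fileext = ".txt")
writeLines(c("*****",
             "The snow glows white on the mountain tonight",
             "Not a footprint to be seen.",
             "*****",
             "Don't let them in,",
             "don't let them see",
             "*****"), song_file)

# Collapse the lines into one long string, separated by spaces
song <- paste(readLines(song_file), collapse = " ")

# Split on the delimiter; [[1]] pulls the verses out of the one-element list
verses <- strsplit(song, split = "\\*{5}")[[1]]

# Trim surrounding whitespace and drop empty pieces
# (for example, the empty string before the first *****)
verses <- trimws(verses)
verses <- verses[nzchar(verses)]

# Split each verse on runs of whitespace to get one word vector per verse
list_of_vectors <- lapply(verses, function(x) strsplit(x, split = "\\s+")[[1]])
```

Splitting on `"\\s+"` instead of a single space is a design choice here: it avoids empty strings in the result when words are separated by more than one whitespace character.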



Answer 2:

To get it into the format you want, maybe give this a shot. Also, please update your post with more information so we can definitively solve your problem. There are a few areas of your posted question that need some clarification. Hope this helps.

## writeLines(text <- "*****
## The snow glows white on the mountain tonight
## Not a footprint to be seen.
## A kingdom of isolation,
## and it looks like I'm the Queen
## The wind is howling like this swirling storm inside
## Couldn't keep it in;
## Heaven knows I've tried
## *****
## Don't let them in,
## don't let them see
## Be the good girl you always have to be Conceal,
## don't feel,
## don't let them know
## Well now they know
## *****", "song.txt")

> read.song <- readLines("song.txt")
> split.song <- unlist(strsplit(read.song, "\\s"))
> star.index <- grep("\\*{5}", split.song)
> word.index <- lapply(2:length(star.index), function(i){
    (star.index[i-1]+1):(star.index[i]-1)
    })
> lapply(seq_along(word.index), function(i) split.song[ word.index[[i]] ])
## [[1]]
##  [1] "The"        "snow"       "glows"      "white"      "on"         "the"        "mountain"  
##  [8] "tonight"    "Not"        "a"          "footprint"  "to"         "be"         "seen."     
## [15] "A"          "kingdom"    "of"         "isolation," "and"        "it"         "looks"     
## [22] "like"       "I'm"        "the"        "Queen"      "The"        "wind"       "is"        
## [29] "howling"    "like"       "this"       "swirling"   "storm"      "inside"     "Couldn't"  
## [36] "keep"       "it"         "in;"        "Heaven"     "knows"      "I've"       "tried"     

## [[2]]
##  [1] "Don't"    "let"      "them"     "in,"      "don't"    "let"      "them"     "see"      "Be"      
## [10] "the"      "good"     "girl"     "you"      "always"   "have"     "to"       "be"       "Conceal,"
## [19] "don't"    "feel,"    "don't"    "let"      "them"     "know"     "Well"     "now"      "they"    
## [28] "know"  
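
The same grouping can also be done without computing index ranges by hand. This is a sketch of an alternative, not the approach used above: it numbers each line by how many `*****` delimiters have been seen so far and lets split() do the grouping. The short `read.song` vector stands in for the result of readLines("song.txt"), and `is.star`, `verse.id`, `verse.lines` and `words` are illustrative names.

```r
# A few lines standing in for readLines("song.txt")
read.song <- c("*****", "Well now they know", "*****",
               "Conceal, don't feel,", "don't let them know", "*****")

# TRUE for the delimiter lines
is.star <- grepl("^\\*{5}$", read.song)

# Verse number for each line: count of delimiters seen so far
verse.id <- cumsum(is.star)

# Drop the delimiter lines, then group the remaining lines by verse
verse.lines <- split(read.song[!is.star], verse.id[!is.star])

# Split each verse's lines on whitespace and flatten to one word vector
words <- lapply(verse.lines, function(v) unlist(strsplit(v, "\\s+")))
words <- unname(words)
```

Because lapply always returns a list, this version behaves the same whether or not the verses happen to contain the same number of words.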


Tags: r substring