How to compare each word of a line in a file with

Posted 2019-08-07 05:18

Question:

I have a file containing plain text like this:

Umbrella!!
This is a very red umbrella.
The umbrella should not be this red.

I am extracting all the keywords from it (after converting all words to lower case) and sorting them alphabetically, which gives me:

keywords = List(red, umbrella)

Now I want to read the file line by line to find the numbers of the lines that contain the words "red" and "umbrella", i.e., the words in the keywords list.

I know how to read the file line by line:

import scala.io.Source
for (line <- Source.fromFile("file.txt").getLines()) { /* process line */ }

But how do I go through each word in the line and compare it with the list elements? Please help!!

EDIT:

I want output like:

red 2 3
umbrella 1 2 3

1 2 3 are the line numbers.
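A minimal sketch that prints the requested format. It assumes the sample text is split across three lines ("Umbrella!!", "This is a very red umbrella.", "The umbrella should not be this red."), which is what the expected line numbers imply, and that a "word" is a run of letters, digits or underscores. It takes the lines as a Seq[String] so it can be tried without a file; with a real file you could pass Source.fromFile("file.txt").getLines().toList. KeywordLines and report are hypothetical names.

```scala
object KeywordLines {
  // For each keyword, list the 1-based numbers of the lines that
  // contain it as a whole word (case-insensitive), e.g. "red 2 3"
  def report(lines: Seq[String], keywords: List[String]): List[String] = {
    // Lowercase each line and split it on runs of non-word characters
    val wordsPerLine: Seq[Set[String]] =
      lines.map(_.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet)
    keywords.map { k =>
      val lineNumbers = wordsPerLine.zipWithIndex.collect {
        case (ws, i) if ws.contains(k) => i + 1
      }
      (k +: lineNumbers.map(_.toString)).mkString(" ")
    }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "Umbrella!!",
      "This is a very red umbrella.",
      "The umbrella should not be this red."
    )
    // Prints "red 2 3" and then "umbrella 1 2 3"
    report(lines, List("red", "umbrella")).foreach(println)
  }
}
```

Working on whole words rather than substrings means "red" is not reported for a line that only contains "bored".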

Answer 1:

Just use keywords.exists(line.contains) on each line and print the index of each matching line:

import scala.io.Source
// zipWithIndex counts from 0, so add 1 to print line numbers
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
  case (line, index) =>
    if (keywords.exists(line.contains)) println(index + 1)
}
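keywords.exists(line.contains) passes line.contains as a String => Boolean predicate, so a line matches if any keyword occurs in it as a substring, and because zipWithIndex counts from 0 the index has to be shifted by one to match the line numbers in the question. A sketch that returns the 1-based numbers instead of printing them, taking the lines as a Seq[String] (AnyKeyword and matchingLines are hypothetical names):

```scala
object AnyKeyword {
  // 1-based numbers of the lines in which at least one keyword
  // appears as a substring (case-sensitive, like the snippet above)
  def matchingLines(lines: Seq[String], keywords: List[String]): Seq[Int] =
    lines.zipWithIndex.collect {
      case (line, index) if keywords.exists(k => line.contains(k)) => index + 1
    }
}
```

With the three-line sample this returns 2 and 3 but not 1, because "Umbrella!!" starts with a capital letter and the check is case-sensitive.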

If you want the match to be case-insensitive, use line.toLowerCase.contains instead:

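import scala.io.Source
// Lowercase each line before the substring test; print 1-based line numbers
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
  case (line, index) =>
    if (keywords.exists(line.toLowerCase.contains)) println(index + 1)
}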
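Lowercasing takes care of the capitalized "Umbrella!!", but contains still matches substrings, so "red" also matches a line containing "bored" or "reduce". A sketch contrasting the substring test with a whole-word test, assuming whole-word matching is what is wanted (MatchModes, substringMatch and wordMatch are hypothetical names):

```scala
object MatchModes {
  // Substring test, as in the case-insensitive snippet above
  def substringMatch(line: String, keyword: String): Boolean =
    line.toLowerCase.contains(keyword)

  // Whole-word test: split on runs of non-word characters first,
  // then compare each resulting word with the keyword
  def wordMatch(line: String, keyword: String): Boolean =
    line.toLowerCase.split("\\W+").contains(keyword)
}
```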

Update (to reflect the changes in the question)

To make output similar to

red 2 3
umbrella 1 2 3

Let's create a map that stores line numbers for each word.

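import scala.io.Source
import scala.collection.mutable

// Map each keyword to the 1-based numbers of the lines that contain it
val count = mutable.Map[String, List[Int]]()
keywords.foreach { k => count += k -> List[Int]() }
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
  case (line, index) =>
    keywords.foreach { w =>
      // Substring test: "red" would also match "bored"
      if (line.toLowerCase.contains(w))
        count(w) = count(w) :+ (index + 1)
    }
}
// Iterate over the sorted keywords, since a mutable.Map has no defined key order
keywords.foreach { i => println(i + " " + count(i)) }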
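The same map can be built without mutable state by folding over the numbered lines. This is a swapped-in technique rather than the answerer's code: a sketch that takes the lines as a Seq[String] and keeps the substring test used above (ImmutableCount and lineMap are hypothetical names).

```scala
object ImmutableCount {
  // Start with an empty list for every keyword, then append each
  // 1-based line number to the keywords that occur in that line
  def lineMap(lines: Seq[String], keywords: List[String]): Map[String, List[Int]] = {
    val empty: Map[String, List[Int]] = keywords.map(k => k -> List.empty[Int]).toMap
    lines.zipWithIndex.foldLeft(empty) { case (acc, (line, index)) =>
      val lower = line.toLowerCase
      keywords.filter(k => lower.contains(k)).foldLeft(acc) { (m, k) =>
        m.updated(k, m(k) :+ (index + 1))
      }
    }
  }
}
```

Appending with :+ to a List is linear in its length; that is fine for a few line numbers per keyword, and a Vector would be the usual choice for large files.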

To get the output exactly as you specified, replace the last line with

// Iterate over the sorted keywords rather than count.keys, whose order is unspecified
keywords.foreach { i =>
  print(i + " ")
  count(i).foreach { j => print(j + " ") }
  println()
}
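The nested print calls can also be replaced by mkString, which joins the numbers with single spaces and leaves no trailing space after the last one. A sketch, assuming a map with the same Map[String, List[Int]] shape as count (Format and formatCounts are hypothetical names):

```scala
object Format {
  // One output line per keyword: the word followed by its line numbers
  def formatCounts(keywords: List[String], count: collection.Map[String, List[Int]]): List[String] =
    keywords.map(k => (k :: count(k).map(_.toString)).mkString(" "))
}
```

Usage: Format.formatCounts(keywords, count).foreach(println). Both mutable and immutable maps are subtypes of collection.Map, so either can be passed.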


Answer 2:

You can split each line into words and then check whether the resulting array contains all of the keywords. Use zipWithIndex to get the line numbers:

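import scala.io.Source

// "\\W+" splits on runs of non-word characters (a bare "\W" is an invalid escape)
val linesWithAllKeywords = Source.fromFile("file.txt").getLines().zipWithIndex.filter { case (line, _) =>
  val words = line.toLowerCase.split("\\W+")
  keywords.forall(k => words.contains(k))
}
.map(_._2 + 1) // 1-based line numbers
.toList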
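Note that forall keeps only the lines containing every keyword; assuming the sample is split into the three lines "Umbrella!!", "This is a very red umbrella." and "The umbrella should not be this red.", line 1 is skipped because it has "umbrella" but not "red". Using exists instead keeps lines with at least one keyword. A sketch contrasting the two predicates on in-memory lines (SplitMatch, linesWithAll and linesWithAny are hypothetical names); turning the words into a Set lets the set itself serve as the String => Boolean predicate:

```scala
object SplitMatch {
  // Lowercased words of a line; a leading empty string (line starting
  // with punctuation) is harmless because no keyword is empty
  private def words(line: String): Set[String] =
    line.toLowerCase.split("\\W+").toSet

  // 1-based numbers of lines containing every keyword as a word
  def linesWithAll(lines: Seq[String], keywords: List[String]): Seq[Int] =
    lines.zipWithIndex.collect { case (l, i) if keywords.forall(words(l)) => i + 1 }

  // 1-based numbers of lines containing at least one keyword as a word
  def linesWithAny(lines: Seq[String], keywords: List[String]): Seq[Int] =
    lines.zipWithIndex.collect { case (l, i) if keywords.exists(words(l)) => i + 1 }
}
```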

Edit: if you want separate line numbers for each keyword, flatMap each line into (word, lineNumber) tuples first, and then group them by word:

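import scala.io.Source

val lineNumbers: Map[String, List[Int]] =
  Source.fromFile("file.txt").getLines().zipWithIndex
    .flatMap { case (line, index) =>
      line.toLowerCase.split("\\W+").toList.map(word => (word, index + 1)) // +1 because indexes are 0-based
    }
    .filter { case (word, _) => keywords.contains(word) }
    .toList // Iterator has no groupBy
    .groupBy(_._1)
    .map { case (word, pairs) => word -> pairs.map(_._2).distinct }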

This gives you a Map[String, List[Int]] whose keys are the keywords and whose values are the numbers of the lines in which each keyword appears.
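On Scala 2.13 or later, groupMap does the grouping and the projection to line numbers in a single call. A sketch under that version assumption, taking the lines as a Seq[String] and the keywords as a Set (pass keywords.toSet); GroupMapLines and lineNumbers are hypothetical names. distinct drops repeats when a keyword occurs twice on the same line:

```scala
object GroupMapLines {
  // Keyword -> ascending, de-duplicated 1-based line numbers (Scala 2.13+)
  def lineNumbers(lines: Seq[String], keywords: Set[String]): Map[String, List[Int]] =
    lines.zipWithIndex.toList
      .flatMap { case (line, index) =>
        line.toLowerCase.split("\\W+").toList.map(word => (word, index + 1))
      }
      .filter { case (word, _) => keywords.contains(word) }
      .groupMap(_._1)(_._2)
      .map { case (word, nums) => word -> nums.distinct }
}
```

As with groupBy, keywords that never appear are simply absent from the resulting map.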



Tags: scala