I have a file containing plain text like this:
"Umbrella!!
This is a very red umbrella.
The umbrella should not be this red.
"
I am extracting all the keywords from it (after converting all words to lower case) and sorting them alphabetically, which gives me:
keywords = List(red, umbrella)
Now, I want to read the file line by line to find the line numbers which contain the words "red" and "umbrella" i.e., the words in the keywords list.
I know how to read the file line by line:
for(line <- Source.fromFile("file.txt").getLines())
But, how do I parse through each word in the line and compare it with the list element? Please help!!
EDIT:
I want output like:
red 2 3
umbrella 1 2 3
1 2 3 are the line numbers.
Just use keywords.exists(line.contains) on each line and print the index when needed:
import scala.io.Source
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
case(line, index) =>
if (keywords.exists(line.contains)) println(index)
}
If you want the match to be case-insensitive, use line.toLowerCase.contains instead:
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
case(line, index) =>
if (keywords.exists(line.toLowerCase.contains)) println(index)
}
Update (to reflect changes in the question)
To make output similar to
red 2 3
umbrella 1 2 3
Let's create a map that stores line numbers for each word.
val count = scala.collection.mutable.Map[String, List[Int]]() // the map itself is mutable, so a val is enough
keywords.foreach { k => count += k -> List[Int]()}
Source.fromFile("file.txt").getLines().zipWithIndex.foreach {
case (line, index) =>
keywords.foreach { w =>
if (line.toLowerCase.contains(w))
count(w) = count(w) :+ (index + 1)
}
}
count.keys.foreach{ i => println(i + " " + count(i) )}
To have the output exactly as you specified, replace the last line with:
count.keys.foreach{ i =>
print(i + " ")
count(i).foreach{ j => print(j + " ") }
println()
}
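For comparison, here is a minimal sketch of the same idea without mutable state. It is only an illustration: the object name KeywordLines and the helpers lineNumbers and render are made up for this example, and the lines are passed in as a Seq (the sample text from the question) rather than read from file.txt. Like the code above, it uses substring matching via contains.

```scala
object KeywordLines {
  // Hypothetical helper: for each keyword, collect the 1-based numbers
  // of the lines that contain it (case-insensitive substring match).
  def lineNumbers(lines: Seq[String], keywords: Seq[String]): Seq[(String, Seq[Int])] =
    keywords.map { k =>
      val hits = lines.zipWithIndex.collect {
        case (line, index) if line.toLowerCase.contains(k) => index + 1
      }
      k -> hits
    }

  // Render one "keyword n1 n2 ..." string per keyword.
  def render(result: Seq[(String, Seq[Int])]): Seq[String] =
    result.map { case (k, nums) => (k +: nums.map(_.toString)).mkString(" ") }

  def main(args: Array[String]): Unit = {
    // Sample text from the question, used here instead of a file.
    val lines = Seq(
      "Umbrella!!",
      "This is a very red umbrella.",
      "The umbrella should not be this red."
    )
    render(lineNumbers(lines, List("red", "umbrella"))).foreach(println)
  }
}
```

Because the result is built from the keywords list, the output keeps the order of that list, whereas the iteration order of a mutable Map's keys is not guaranteed to be alphabetical.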
You can split each line into words and then check whether they contain all of the keywords. Use zipWithIndex to get the line numbers:
Source.fromFile("file.txt").getLines().zipWithIndex.filter { case(line, index) =>
val words = line.toLowerCase.split("\\W+") // the backslash must be escaped in a Scala string literal
keywords.forall(words.contains)
}
.map(_._2)
Edit: if you want separate indexes for each keyword, flatMap into a list of (word, index) tuples first, and then group:
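Source.fromFile("file.txt").getLines().zipWithIndex
  .flatMap { case (line, index) =>
    line.toLowerCase.split("\\W+").distinct.map { (_, index + 1) } // +1 because indexes are 0-based
  }
  .filter { case (word, _) => keywords.contains(word) }
  .toList // an Iterator has no groupBy, so materialize the pairs first
  .groupBy { _._1 }
  .map { case (word, pairs) => word -> pairs.map(_._2) }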
This gives you a Map[String,List[Int]], where keys are keywords, and values are lists of indexes of lines in which the given keyword appears.
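As a rough, self-contained sketch of this split-and-group approach: the object name WordIndex and the method index are hypothetical, and the lines are supplied as a Seq instead of being read from file.txt. Note that splitting into words matches whole words only, so a keyword such as "red" would not match inside "hundred", unlike a plain contains check.

```scala
object WordIndex {
  // Hypothetical helper: map each keyword to the 1-based numbers of the
  // lines in which it occurs as a whole word (case-insensitive).
  def index(lines: Seq[String], keywords: Set[String]): Map[String, List[Int]] =
    lines.zipWithIndex
      .flatMap { case (line, i) =>
        // Split on runs of non-word characters; distinct avoids
        // recording the same line twice for a repeated word.
        line.toLowerCase.split("\\W+").toList.distinct.map(word => (word, i + 1))
      }
      .filter { case (word, _) => keywords.contains(word) }
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).toList }
}
```

Calling WordIndex.index with the three sample lines from the question and Set("red", "umbrella") should yield red -> List(2, 3) and umbrella -> List(1, 2, 3). Keywords that never occur are simply absent from the resulting map.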