Find string in Substring Regex

Posted 2019-09-21 20:46

Question:

I have a column in a Scala DataFrame that contains many strings like these:

[Z12M1E][AGORA][Essai][CS_RES]ECO-56417-Escalade-ECO-56344-#incidentMajProduit#  Y2103      Y2103B0    S82NE      INCIDENTE             20180305   030403 20180305   030512

[Z12M1E][AGORA][Essai]ECO-56417-Escalade-ECO-56344-#incidentMajProduit#  Y2103      Y2103B0    S82NE      INCIDENTE   [CS_RES]       20180305   030403 20180305   030512


[Z12M1E][AGORA][20180305]ECO-56417--ECO-56344-#incidentMajProduit#  Y2103      Y2103B0    S82NE      INCIDENTE       Escalade 20180305   030512

[Z12M1E][AGORA][20180305]ECO-56417--ECO-56344-#incidentMajProduit#  Y2103      Y2103B0    S82NE      INCIDENTE   [CS_RES]          Escalade 20180305   030512

I would like to compute the number of lines that contain the string [CS_RES]. For example, in my DataFrame the number of lines containing [CS_RES] is 3.

How can I do this using a regex?

Answer 1:

Try this:

val str = "your input string"

val reg = ".*\\[CS_RES\\].*".r
reg.findAllIn(str).length 

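Note that in an ordinary (non-triple-quoted) string literal the escape character \ must itself be escaped, which is why the pattern is written with \\[ and \\].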
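To make the counting behaviour explicit, here is a minimal sketch built on the same pattern. The object and method names (CsResCount, countInText, countInRows) are made up for illustration and are not part of the answer above. Because `.` does not match line terminators by default, each match of the pattern stays within one line, so findAllIn on a multi-line string counts matching lines rather than individual occurrences.

```scala
object CsResCount {
  // Same pattern as in the answer: the brackets are escaped so they match literally.
  private val wholeLine = ".*\\[CS_RES\\].*".r

  // Counts lines of a multi-line string that contain [CS_RES].
  // A line with several occurrences is still counted once, because the
  // greedy .* on both sides consumes the whole line in a single match.
  def countInText(text: String): Int = wholeLine.findAllIn(text).length

  // Hypothetical alternative when the rows are already separate strings:
  // a plain substring test needs no regex and no escaping.
  def countInRows(rows: Seq[String]): Int = rows.count(_.contains("[CS_RES]"))
}
```

If the data really lives in a Spark DataFrame (an assumption, since the question does not show how it was built), the same escaped pattern \\[CS_RES\\] could presumably be passed to Column.rlike inside a filter, followed by count().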



Answer 2:

Maybe what you are looking for is the line numbers of the lines containing the string [CS_RES]. Put your DataFrame data in a text file, datafile.txt, in the directory from which Scala is started. Then,

val lines = scala.io.Source.fromFile("datafile.txt").getLines().toArray

will read all lines into lines, an Array[String]. The following command then picks out the lines containing the desired string and returns a list of their line numbers. I checked this command against the sample data you provided in the question, and it gives me a List[Int] with line numbers 1, 3 and 8.

scala> lines.map(x=>if(x.matches(""".*\[CS_RES\].*"""))
             (lines.indexOf(x)+1)else 0).toList.filter(_!=0)
res50: List[Int] = List(1, 3, 8)
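One caveat with the command above: lines.indexOf(x) returns the index of the first line equal to x, so if the same line occurs more than once, the first position is reported each time. A possible variant, sketched here with zipWithIndex so that every line keeps its own index, could look like this. The names CsResLines and matchingLineNumbers are made up for illustration.

```scala
object CsResLines {
  // Returns the 1-based numbers of the lines containing the literal text [CS_RES].
  // zipWithIndex pairs each line with its own position, so duplicate lines
  // are reported at their actual positions.
  def matchingLineNumbers(lines: Seq[String]): List[Int] =
    lines.zipWithIndex.collect {
      case (line, idx) if line.contains("[CS_RES]") => idx + 1
    }.toList
}
```

With the lines array read above, this would be called as CsResLines.matchingLineNumbers(lines.toSeq), and taking .length of the result gives the count asked for in the question.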


Tags: regex scala