Apply several string transformations in Scala

Published 2019-08-06 19:16

Question:

I want to perform several ordered, successive replaceAll(..., ...) calls on a string in a functional way in Scala.

What's the most elegant solution? Scalaz welcome! ;)

Answer 1:

First, let's get a function out of the replaceAll method:

scala> val replace = (from: String, to: String) => (_:String).replaceAll(from, to)
replace: (String, String) => String => java.lang.String = <function2>

Now you can use the Functor instance for functions defined in Scalaz. That way you can compose functions using map (or, to make it look nicer, its Unicode alias ∘).

It will look like this:

scala> replace("from", "to") ∘ replace("to", "from") ∘ replace("some", "none")
res0: String => java.lang.String = <function1>

If you prefer Haskell-style composition (right to left), use contramap:

scala> replace("some", "none") ∙ replace("to", "from") ∙ replace ("from", "to")
res2: String => java.lang.String = <function1>
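For comparison, a minimal stdlib-only sketch (no Scalaz): mapping over a function amounts to `andThen` (left to right), and contramapping amounts to `compose` (right to left). The `replace` value mirrors the REPL definition above; the object name `ComposeDemo` is just for illustration.

```scala
object ComposeDemo {
  // Same shape as the answer's helper: two Strings in, a String => String out
  val replace: (String, String) => String => String =
    (from, to) => (s: String) => s.replaceAll(from, to)

  // Left to right, like the Functor chain: the first replacement listed runs first
  val leftToRight: String => String =
    replace("from", "to") andThen replace("to", "from") andThen replace("some", "none")

  // Right to left, like the contramap chain: the last replacement listed runs first
  val rightToLeft: String => String =
    replace("some", "none") compose replace("to", "from") compose replace("from", "to")

  def main(args: Array[String]): Unit = {
    println(leftToRight("somestringfromto")) // nonestringfromfrom
    println(rightToLeft("somestringfromto")) // nonestringfromfrom
  }
}
```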

You can also have some fun with the Category instance:

scala> replace("from", "to") ⋙ replace("to", "from") ⋙ replace("some", "none")
res5: String => java.lang.String = <function1>

scala> replace("some", "none") ⋘ replace("to", "from") ⋘ replace ("from", "to")
res7: String => java.lang.String = <function1>
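When the replacements arrive as data rather than being spelled out, the standard library's `scala.Function.chain` composes a sequence of `T => T` functions left to right. A minimal sketch; the `pairs` list and the `ChainDemo` name are illustrative.

```scala
object ChainDemo {
  // Illustrative (regex, replacement) pairs, applied in list order
  val pairs: List[(String, String)] =
    List("from" -> "to", "to" -> "from", "some" -> "none")

  // Turn each pair into a String => String, then chain them left to right
  val pipeline: String => String =
    Function.chain(pairs.map { case (from, to) => (s: String) => s.replaceAll(from, to) })

  def main(args: Array[String]): Unit =
    println(pipeline("somestringfromto")) // nonestringfromfrom
}
```

This is the same left-to-right order as the `∘` and `⋙` chains, without any Scalaz dependency.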

And applying them:

scala> "somestringfromto" |> res0
res3: java.lang.String = nonestringfromfrom

scala> res2("somestringfromto")
res4: java.lang.String = nonestringfromfrom

scala> "somestringfromto" |> res5
res6: java.lang.String = nonestringfromfrom

scala> res7("somestringfromto")
res8: java.lang.String = nonestringfromfrom
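Scalaz's `|>` is reverse application: `x |> f` is `f(x)`. Since Scala 2.13 the standard library offers the same idea as `pipe` in `scala.util.chaining`; a minimal sketch assuming Scala 2.13 or later (`PipeDemo` is an illustrative name):

```scala
import scala.util.chaining._

object PipeDemo {
  // value.pipe(f) is f(value), so the steps read top to bottom like `|>`
  val result: String = "somestringfromto"
    .pipe(_.replaceAll("from", "to"))
    .pipe(_.replaceAll("to", "from"))
    .pipe(_.replaceAll("some", "none"))

  def main(args: Array[String]): Unit =
    println(result) // nonestringfromfrom
}
```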


Answer 2:

If it's just a few invocations, just chain them. Otherwise, I'd try this:

Seq("a" -> "b", "b" -> "a").foldLeft("abab"){case (z, (s,r)) => z.replaceAll(s, r)}

Or, if you like shorter code with confusing wildcards and extra closures:

Seq("a" -> "b", "b" -> "a").foldLeft("abab"){_.replaceAll _ tupled(_)}
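The fold above can be wrapped in a small reusable helper. A minimal sketch; `replaceAllPairs` and `replaceAllLiteral` are hypothetical names. Because `replaceAll` treats its first argument as a regex and its second as a replacement pattern (where `$` is special), the literal variant escapes both with `java.util.regex.Pattern.quote` and `Matcher.quoteReplacement`.

```scala
import java.util.regex.{Matcher, Pattern}

object FoldDemo {
  // Apply each (regex, replacement) pair in order; later pairs see earlier results
  def replaceAllPairs(s: String, pairs: Seq[(String, String)]): String =
    pairs.foldLeft(s) { case (acc, (from, to)) => acc.replaceAll(from, to) }

  // Same fold, but `from` and `to` are treated as plain text, not regex syntax
  def replaceAllLiteral(s: String, pairs: Seq[(String, String)]): String =
    pairs.foldLeft(s) { case (acc, (from, to)) =>
      acc.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to))
    }

  def main(args: Array[String]): Unit = {
    println(replaceAllPairs("abab", Seq("a" -> "b", "b" -> "a")))       // aaaa
    println(replaceAllLiteral("1.5$", Seq("." -> ",", "$" -> " USD"))) // 1,5 USD
  }
}
```

The `abab` case also shows why order matters: after `a -> b` every character is `b`, so `b -> a` turns the whole string into `aaaa`.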


Answer 3:

Another Scalaz-based solution to this problem would be to use the Endo monoid. This monoid captures the identity function (as the monoid's identity element) and function composition (as the monoid's append operation). This solution would be particularly useful if you have an arbitrarily-sized (even possibly empty) list of functions to apply.

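import scalaz._, Scalaz._

val replace = (from: String, to: String) => (_:String).replaceAll(from, to)

// Endo's append is function composition, so the last function in the list runs first
val f: Endo[String] = List(
  replace("some", "none"),
  replace("to", "from"),
  replace("from", "to")
).foldMap(_.endo)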

For example (using one of folone's examples):

scala> f.run("somestringfromto")
res0: String = nonestringfromfrom
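To see what the `Endo` monoid does, here is a minimal stdlib-only stand-in; `MyEndo` and `combineAll` are hypothetical names, not Scalaz's API. The identity function is the empty element and composition is the append, so folding an empty list still yields a usable function. Assuming Scalaz's convention that appending `f` and `g` gives `f compose g`, the last function in the list runs first, which is why the list above is written in reverse order.

```scala
object EndoDemo {
  // Hypothetical stand-in for scalaz.Endo: a wrapped A => A
  final case class MyEndo[A](run: A => A) {
    // Monoid append: run `that` first, then `this`
    def |+|(that: MyEndo[A]): MyEndo[A] = MyEndo(run compose that.run)
  }

  object MyEndo {
    // Monoid identity: the identity function
    def empty[A]: MyEndo[A] = MyEndo((a: A) => a)
  }

  // foldMap-style: wrap each function, then combine starting from the identity
  def combineAll[A](fs: List[A => A]): MyEndo[A] =
    fs.map(MyEndo[A](_)).foldLeft(MyEndo.empty[A])(_ |+| _)

  val replace: (String, String) => String => String =
    (from, to) => (s: String) => s.replaceAll(from, to)

  val f: MyEndo[String] = combineAll(List(
    replace("some", "none"),
    replace("to", "from"),
    replace("from", "to")
  ))

  def main(args: Array[String]): Unit = {
    println(f.run("somestringfromto"))                                  // nonestringfromfrom
    println(combineAll(List.empty[String => String]).run("unchanged")) // unchanged
  }
}
```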


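Answer 4:

Define a replace function on s with placeholder parameters, then chain further replacements onto its result. Note that only the first call below uses that function: the second `replace` is java.lang.String's own replace method, called infix on the returned String, and it substitutes literally rather than by regex.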

scala> val s = "hello world"
s: java.lang.String = hello world

scala> def replace = s.replaceAll(_, _)
replace: (java.lang.String, java.lang.String) => java.lang.String

scala> replace("h", "H")  replace("w", "W")
res1: java.lang.String = Hello World
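A small sketch contrasting the two String methods involved, plus ordinary method chaining of `replaceAll` (`ChainingDemo` is an illustrative name):

```scala
object ChainingDemo {
  // Plain method chaining: each replaceAll returns a new String
  val chained: String = "hello world".replaceAll("h", "H").replaceAll("w", "W")

  // replace matches its argument literally; replaceAll treats "." as "any character"
  val literal: String = "a.b".replace(".", "-")
  val regex: String = "a.b".replaceAll(".", "-")

  def main(args: Array[String]): Unit = {
    println(chained) // Hello World
    println(literal) // a-b
    println(regex)   // ---
  }
}
```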


Answer 5:

To replace or remove multiple substrings in a string column of a Spark DataFrame, match them all with a single alternation regex:

import org.apache.spark.sql.functions.{lit, udf}
import scala.util.matching.Regex
import spark.implicits._ // for the $"col" syntax; assumes a SparkSession named spark

// to find: true if the regex matches anywhere in str
def isContainingContent(str: String, regexStr: String): Boolean =
  new Regex(regexStr).findFirstIn(str).isDefined

val colContentPresent = udf((str: String, regex: String) => isContainingContent(str, regex))

// to remove: delete every match of the regex
val cleanPayloadOfRemovableContent = udf((str: String, regexStr: String) =>
  new Regex(regexStr).replaceAllIn(str, ""))

// to define: the patterns to remove, joined with | so they are all handled in one pass
val removableContentRegex =
  "<log:Logs>[\\s\\S]*?</log:Logs>|\\\\n<![\\s\\S]*?-->|<\\?xml[\\s\\S]*?\\?>"

// to call (dfXMLCheck is an existing DataFrame with a "payload" column)
val dfPayloadLogPresent = dfXMLCheck.withColumn("logsPresentInit", colContentPresent($"payload", lit(removableContentRegex)))
val dfCleanedXML = dfPayloadLogPresent.withColumn("payload", cleanPayloadOfRemovableContent($"payload", lit(removableContentRegex)))
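The Spark UDFs above only wrap two `scala.util.matching.Regex` calls, so the regex can be checked on a plain String before involving a DataFrame. A minimal sketch; the sample payload and the `RemovableContentDemo` name are made up for illustration.

```scala
import scala.util.matching.Regex

object RemovableContentDemo {
  // Same alternation as removableContentRegex above
  val removable: Regex =
    "<log:Logs>[\\s\\S]*?</log:Logs>|\\\\n<![\\s\\S]*?-->|<\\?xml[\\s\\S]*?\\?>".r

  // Made-up sample payload containing an XML declaration and a log block
  val payload: String = """<?xml version="1.0"?><a><log:Logs>debug</log:Logs>b</a>"""

  val present: Boolean = removable.findFirstIn(payload).isDefined
  val cleaned: String = removable.replaceAllIn(payload, "")

  def main(args: Array[String]): Unit = {
    println(present) // true
    println(cleaned) // <a>b</a>
  }
}
```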