I want to perform several ordered, successive replaceAll(..., ...) calls on a string in a functional way in Scala.
What's the most elegant solution? Scalaz welcome! ;)
First, let's get a function out of the replaceAll method:
scala> val replace = (from: String, to: String) => (_:String).replaceAll(from, to)
replace: (String, String) => String => java.lang.String = <function2>
Now you can use the Functor instance for functions, defined in Scalaz. That way you can compose functions using map (or, to make it look better, its Unicode aliases).
It will look like this:
scala> replace("from", "to") ∘ replace("to", "from") ∘ replace("some", "none")
res0: String => java.lang.String = <function1>
If you prefer Haskell-style composition (right to left), use contramap:
scala> replace("some", "none") ∙ replace("to", "from") ∙ replace ("from", "to")
res2: String => java.lang.String = <function1>
You can also have some fun with the Category instance:
scala> replace("from", "to") ⋙ replace("to", "from") ⋙ replace("some", "none")
res5: String => java.lang.String = <function1>
scala> replace("some", "none") ⋘ replace("to", "from") ⋘ replace ("from", "to")
res7: String => java.lang.String = <function1>
And applying it:
scala> "somestringfromto" |> res0
res3: java.lang.String = nonestringfromfrom
scala> res2("somestringfromto")
res4: java.lang.String = nonestringfromfrom
scala> "somestringfromto" |> res5
res6: java.lang.String = nonestringfromfrom
scala> res7("somestringfromto")
res8: java.lang.String = nonestringfromfrom
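For comparison, the same left-to-right pipeline can be sketched with only the standard library, in case Scalaz is not on the classpath. The object and value names here are illustrative and not part of the answer above; andThen and Function.chain are standard Scala.

```scala
object ComposeSketch {
  // Same helper as above: turns replaceAll into a String => String function.
  val replace = (from: String, to: String) => (_: String).replaceAll(from, to)

  // andThen applies the left-hand function first, like ⋙ above.
  val viaAndThen: String => String =
    replace("from", "to") andThen replace("to", "from") andThen replace("some", "none")

  // Function.chain folds a sequence of A => A functions from left to right.
  val viaChain: String => String =
    Function.chain(Seq(replace("from", "to"), replace("to", "from"), replace("some", "none")))
}
```

Both values should map "somestringfromto" to "nonestringfromfrom", matching the Scalaz results above.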
If it's just a few invocations, then just chain them. Otherwise I guess I'd try this:
Seq("a" -> "b", "b" -> "a").foldLeft("abab"){case (z, (s,r)) => z.replaceAll(s, r)}
Or if you like shorter code with confusing wildcards and extra closures:
Seq("a" -> "b", "b" -> "a").foldLeft("abab"){_.replaceAll _ tupled(_)}
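Because each replacement sees the output of the previous one, the order of the pairs matters. A small sketch that swaps foldLeft for scanLeft makes the intermediate strings visible (the object and value names are made up for illustration):

```scala
object FoldTrace {
  val rules = Seq("a" -> "b", "b" -> "a")

  // scanLeft keeps every intermediate accumulator that foldLeft would discard.
  val steps: Seq[String] =
    rules.scanLeft("abab") { case (z, (s, r)) => z.replaceAll(s, r) }
  // steps == Seq("abab", "bbbb", "aaaa")
}
```

So this is a sequence of replacements, not a simultaneous swap: the final result is "aaaa", not "baba".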
Another Scalaz-based solution to this problem would be to use the Endo monoid. This monoid captures the identity function (as the monoid's identity element) and function composition (as the monoid's append operation). This solution would be particularly useful if you have an arbitrarily-sized (even possibly empty) list of functions to apply.
import scalaz._, Scalaz._  // brings foldMap and the .endo syntax into scope
val replace = (from: String, to: String) => (_:String).replaceAll(from, to)
val f: Endo[String] = List(
replace("some", "none"),
replace("to", "from"),
replace("from", "to")
).foldMap(_.endo)
e.g. (using one of folone's examples)
scala> f.run("somestringfromto")
res0: String = nonestringfromfrom
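To see what the monoid is doing, here is a rough plain-Scala sketch of the same idea without Scalaz. The Endo class, zero, and combineAll below are hypothetical stand-ins written for illustration, not the Scalaz definitions.

```scala
object EndoSketch {
  // Hypothetical stand-in for scalaz.Endo: a wrapper around A => A.
  final case class Endo[A](run: A => A) {
    // Append is function composition: the right-hand function runs first.
    def |+|(other: Endo[A]): Endo[A] = Endo(run compose other.run)
  }

  // The monoid identity wraps the identity function.
  def zero[A]: Endo[A] = Endo((a: A) => a)

  // Rough equivalent of foldMap(_.endo) for a List of functions.
  def combineAll[A](fs: List[A => A]): Endo[A] =
    fs.foldRight(zero[A])((f, acc) => Endo(f) |+| acc)

  val replace = (from: String, to: String) => (_: String).replaceAll(from, to)

  val f: Endo[String] = combineAll[String](List(
    replace("some", "none"),
    replace("to", "from"),
    replace("from", "to")
  ))

  // An empty list collapses to the identity function.
  val noOp: Endo[String] = combineAll[String](Nil)
}
```

Note that, as with compose, the last function in the list is applied first, and an empty list simply leaves the input unchanged.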
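Define a replace function with placeholder parameters, and then chain further replacements onto its result. Note that only the first call below uses this function; the chained second call resolves to String's built-in replace method.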
scala> val s = "hello world"
s: java.lang.String = hello world
scala> def replace = s.replaceAll(_, _)
replace: (java.lang.String, java.lang.String) => java.lang.String
scala> replace("h", "H") replace("w", "W")
res1: java.lang.String = Hello World
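String's replace takes its first argument literally, while replaceAll treats it as a regex, so the two can differ once the pattern contains metacharacters. A small sketch with a made-up input string:

```scala
import java.util.regex.Pattern

object ReplaceVsReplaceAll {
  val s = "a.b.c"

  // replace treats "." as literal text.
  val literal: String = s.replace(".", "-")                      // "a-b-c"

  // replaceAll treats "." as a regex that matches any character.
  val regex: String = s.replaceAll(".", "-")                     // "-----"

  // Pattern.quote makes replaceAll match the text literally.
  val quoted: String = s.replaceAll(Pattern.quote("."), "-")     // "a-b-c"
}
```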
To replace or remove multiple substrings in a DataFrame string column in Scala:
import org.apache.spark.sql.functions.{lit, udf}
// The $"..." column syntax below also needs `import spark.implicits._`,
// assuming a SparkSession value named `spark` is in scope.
// to find
def isContainingContent(str: String, regexStr: String): Boolean = {
  val regex = new scala.util.matching.Regex(regexStr)
  // true when the pattern matches anywhere in str
  regex.findFirstIn(str).isDefined
}
val colContentPresent = udf((str: String, regex: String) => {
  isContainingContent(str, regex)
})
// to remove
val cleanPayloadOfRemovableContent = udf((str: String, regexStr: String) => {
  val regex = new scala.util.matching.Regex(regexStr)
  // delete every match of the pattern
  regex.replaceAllIn(str, "")
})
// to define
val removableContentRegex =
  "<log:Logs>[\\s\\S]*?</log:Logs>|\\\\n<![\\s\\S]*?-->|<\\?xml[\\s\\S]*?\\?>"
// to call
val dfPayloadLogPresent = dfXMLCheck.withColumn("logsPresentInit", colContentPresent($"payload", lit(removableContentRegex)))
val dfCleanedXML = dfPayloadLogPresent.withColumn("payload", cleanPayloadOfRemovableContent($"payload", lit(removableContentRegex)))
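Before wiring the pattern into a UDF, it can help to try it on a plain string outside Spark. The sketch below does that with scala.util.matching.Regex; the sample payload is invented for illustration and is not taken from any real data.

```scala
object RegexCheck {
  // Same pattern as above.
  val removableContentRegex =
    "<log:Logs>[\\s\\S]*?</log:Logs>|\\\\n<![\\s\\S]*?-->|<\\?xml[\\s\\S]*?\\?>"
  val regex = new scala.util.matching.Regex(removableContentRegex)

  // Hypothetical sample payload, made up for this check.
  val sample = "<?xml version=\"1.0\"?><root><log:Logs>debug</log:Logs><a>1</a></root>"

  // Mirrors the "to find" step: does anything removable occur?
  val present: Boolean = regex.findFirstIn(sample).isDefined

  // Mirrors the "to remove" step: strip every match.
  val cleaned: String = regex.replaceAllIn(sample, "")
}
```

For this sample, the XML declaration and the log:Logs block are removed, leaving `<root><a>1</a></root>`.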