I'm trying to filter a text that contains line breaks in OpenRefine.
The input is:
Them Spanish girls love me like I'm Aventura
I'm the man, y'all don't get it, do ya?
Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura
Tell Uncle Luke I'm out in Miami, too
Them Spanish girls love me like I'm Aventura
The expected result would be:
Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura
I'm trying to get the line with the keyword and the lines before and after.
With a standard regex, my pattern looks like this:
/((.*\n){2})^.*\b(New York)\b.*((.*\n){3})/m
But that doesn't work in OpenRefine.
I tried the following, but it only returns 'null':
value.match(/.*(\New York)/.*)
Does anyone have an idea how I could do it?
I really need to keep the line breaks, so I can't do a replace(/\n/,'') before the match.
The brand-new OpenRefine 3 has a find() function that is much more user-friendly than match().
I think this regex should do the trick:
value.find(/(.*\n){1}.+New York.+(\n.*){1}/).join('\n')
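As a rough analogy for why the match() attempt in the question returned null: as far as I can tell from the OpenRefine documentation, match() has to match the whole cell value and returns its capture groups, while find() returns every matching substring. Since "." does not cross line breaks by default, a whole-value pattern fails on multi-line text. The sketch below mimics that difference in Python, with re.fullmatch and re.findall as stand-ins (they are not OpenRefine functions) and a made-up sample text.

```python
import re

# Hypothetical sample text, five lines, keyword on the third line
text = "\n".join([
    "alpha",
    "before the keyword",
    "Go Uptown, New York City, again",
    "after the keyword",
    "omega",
])

# Whole-string match (stand-in for match()): fails, because "." stops at
# every line break and the pattern therefore cannot cover the full text
whole = re.fullmatch(r".*(New York).*", text)

# Substring search (stand-in for find()): previous line, keyword line,
# next line; no capture groups, so findall returns the full matches
found = re.findall(r".*\n.+New York.+\n.*", text)

print(whole)
print("\n".join(found))
```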
If for some reason you prefer to stay in OpenRefine 2.8, Python/Jython offers an alternative:
import re
matches = re.findall(r".+?\n.+New York.+\n.+", value)
return "\n".join(matches)
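If the number of context lines may change, the same kind of pattern can be built from parameters. This is only a sketch: the helper name find_with_context and its before/after parameters are my own invention, and the sample text is made up. It avoids Python 3-only syntax so that it should also run under Jython.

```python
import re

def find_with_context(text, keyword, before=1, after=1):
    """Return each keyword line together with `before` lines above it
    and `after` lines below it, as one string per match."""
    pattern = r"(?:.*\n){%d}.*%s.*(?:\n.*){%d}" % (
        before, re.escape(keyword), after)
    return [m.group(0) for m in re.finditer(pattern, text)]

# Hypothetical sample text
sample = "\n".join([
    "alpha",
    "before the keyword",
    "Go Uptown, New York City, again",
    "after the keyword",
    "omega",
])

print("\n".join(find_with_context(sample, "New York")))
```

In an OpenRefine Jython expression, the last line would presumably become return "\n".join(find_with_context(value, "New York")). Note that a regex like this finds nothing when fewer than `before` lines precede the keyword, and it skips overlapping matches.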
If you would rather avoid regex entirely and simply read the text line by line, printing the keyword line together with the line before and the line after it, here is a VBA approach. It assumes the text is in cell A1 in Excel:
Public Sub TestMe()
    Dim inputString As String
    inputString = Range("A1")
    Dim lookForWord As String
    lookForWord = "New York"
    ' One array element per line of the cell
    Dim inputArr As Variant
    inputArr = Split(inputString, vbLf)
    Dim line As Variant
    Dim previousLine As String
    Dim foundWord As Boolean
    Dim linesAfter As Long: linesAfter = 1
    For Each line In inputArr
        If InStr(1, line, lookForWord) > 0 Then
            ' Keyword line: append it to the remembered previous line
            previousLine = previousLine & vbCrLf & line
            foundWord = True
        ElseIf foundWord And linesAfter > 0 Then
            ' Trailing context line after the keyword
            previousLine = previousLine & vbCrLf & line
            linesAfter = linesAfter - 1
        ElseIf Not foundWord Then
            ' Keyword not seen yet: remember only the latest line
            previousLine = line
        End If
    Next line
    ' Print only if the keyword was actually found
    If foundWord Then Debug.Print previousLine
End Sub
Split() breaks the text into an array with one element per line.
The linesAfter variable controls how many lines after the keyword are displayed.
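The same split-and-scan idea can be sketched in Python, which may be handy if it is to run as a Jython expression in OpenRefine. The function name lines_around and its before/after parameters are hypothetical, and the sample text is made up. Unlike the regex versions, this one also copes with a keyword on the first or last line.

```python
def lines_around(text, keyword, before=1, after=1):
    """Return the lines containing `keyword`, plus `before` lines above
    and `after` lines below each of them, in original order."""
    lines = text.split("\n")
    keep = set()
    for i, line in enumerate(lines):
        if keyword in line:
            start = max(0, i - before)               # clamp at first line
            stop = min(len(lines), i + after + 1)    # clamp at last line
            keep.update(range(start, stop))
    return [lines[i] for i in sorted(keep)]

# Hypothetical sample text
sample = "\n".join([
    "alpha",
    "before the keyword",
    "Go Uptown, New York City, again",
    "after the keyword",
    "omega",
])

print("\n".join(lines_around(sample, "New York")))
```

Collecting line indices in a set means overlapping context windows from several keyword lines are merged rather than duplicated.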