Add regex matches as new columns to a CSV file

Posted 2019-09-09 21:57

I have a .csv file, and I need to append the regex matches from each line as new columns after the original columns. Here is part of the file:

"Event";"User";"Description"   
"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)"
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)"

Here are the two regex patterns whose extracted results I want to add to each row as new columns (one column per pattern):

(?<=Product\s)\w.*?(?=\s*\(Shop)

(?<=Shop ID:\s)\w.*?(?=\))

The result should look like this (the header row is not important):

"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"  
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

Sorry, I'm very new to batch scripting. Thanks in advance.

3 answers
在下西门庆
#2 · 2019-09-09 22:26

Windows batch does not have a native regex find/replace utility. The only regex utility is FINDSTR, which is extremely limited and non-standard. It can only print entire lines that match the search; it cannot print just the matching portion.

You could use PowerShell.

But I would use JREPL.BAT, a purely script-based utility (hybrid JScript/batch) that works on any Windows machine from XP onward. It uses ECMA regular expressions, so there is no look-behind, but it has plenty of power for this task.

jrepl "Product\s(\S+?)\s*\(Shop ID:\s(.*?)\)\q$" "$&;\q$1\q;\q$2\q" /a /x /f test.csv /o -

The /a switch discards unchanged lines, which effectively removes the header line. The /o - option overwrites the original file with the output. The /x switch enables extended escape sequences, thus enabling \q for ".

Use call jrepl if you put the command in a batch script.

Full documentation is available from the command line via jrepl /?, or jrepl /?? for paged output.

Animai°情兽
#3 · 2019-09-09 22:27

This problem can be solved very simply, without a regex, using this batch file:

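@echo off

rem Skip the header row and split each line on semicolons into three fields.
(for /F "skip=1 tokens=1-3 delims=;" %%a in (input.csv) do (
   rem Split the quoted description on parentheses and spaces, keeping tokens 3 and 6.
   for /F "tokens=3,6 delims=() " %%d in (%%c) do (
      echo %%a;%%b;%%c;"%%d";"%%e"
   )
)) > output.csv
rem Replace the original file with the extended one.
move /Y output.csv input.csv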

Result:

"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

However, if some lines do not follow the format of the example data (lines that a regex could process correctly, but this code could not), then this code may need adjusting. Note that, depending on how the data differs, the problem may not be solvable with a pure batch file.

混吃等死
#4 · 2019-09-09 22:29

You can do it with this GNU sed command:

sed -r 's/^.*Product (.+) \(Shop ID: (.+)\)"$/&;\"\1\";\"\2\"/g' shop.csv
  • it captures the parts between Product, (Shop ID: and )" into \1 and \2
  • the replacement uses & (the whole line) and appends a string made up of \1 and \2
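
For anyone who wants to try this out, here is a minimal sketch of the same substitution, assuming GNU sed is installed and using a temporary sample file built from the two rows in the question (the file and the `result` variable are just for illustration). It uses -E, which GNU sed accepts as an equivalent of -r, and drops the /g flag since the anchored pattern can match at most once per line. The header row contains no "Product", so it should pass through unchanged.

```shell
#!/bin/sh
# Hypothetical sample file holding the rows shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"Event";"User";"Description"
"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)"
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)"
EOF

# & stands for the whole matched line; \1 and \2 are the two captured groups.
# Inside single quotes the double quotes need no escaping.
result=$(sed -E 's/^.*Product (.+) \(Shop ID: (.+)\)"$/&;"\1";"\2"/' "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

To rewrite the file itself instead of printing the result, GNU sed's -i option can be added to the same command. It is worth keeping a backup copy first, because both capture groups are greedy and may grab more than intended on lines that contain several parentheses.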