Add Regex Matches as new columns to the csv file

I have a .csv file, and for each line I need to append regex matches as new columns after the original columns. Here is part of the .csv file:

"Event";"User";"Description"   
"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)"
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)"

Here are the two regex patterns whose matches I want to extract from each row and add as new columns (one column per pattern):

(?<=Product\s)\w.*?(?=\s*\(Shop)

(?<=Shop ID:\s)\w.*?(?=\))

The result should look like this (the header row is not important):

"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"  
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

Sorry, I'm very new to batch scripting. Thanks in advance.

Tags: regex csv batch-file command-line

3 Answers

在下西门庆

#2 · 2019-09-09 22:26

Windows batch does not have a native regex find/replace utility. The only regex utility is FINDSTR, and that is extremely limited and non-standard, and it can only print out entire lines that match the search - it cannot print out just the matching portion.

You could use PowerShell.

But I would use JREPL.BAT, a purely script-based utility (hybrid JScript/batch) that works on any Windows machine from XP onward. It uses ECMA regular expressions, so there is no look-behind, but it has plenty of power for this task: capture groups can do the job instead.

jrepl "Product\s(\S+?)\s*\(Shop ID:\s(.*?)\)\q$" "$&;\q$1\q;\q$2\q" /a /x /f test.csv /o -

The /a switch discards unchanged lines, which effectively removes the header line. The /o - option overwrites the original file with the output. The /x switch enables extended escape sequences, thus enabling \q for ".

Use call jrepl if you put the command in a batch script.

Full documentation is available from the command line via jrepl /?, or jrepl /?? for paged output.


Animai°情兽

#3 · 2019-09-09 22:27

This problem may be solved in a very simple way without a regex with this Batch file:

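@echo off

rem Split each data row (header skipped) on ";" into %%a, %%b and %%c.
rem The inner loop splits the Description on "(", ")" and spaces:
rem token 3 is the product name, token 6 is the shop ID.
(for /F "skip=1 tokens=1-3 delims=;" %%a in (input.csv) do (
   for /F "tokens=3,6 delims=() " %%d in (%%c) do (
      echo %%a;%%b;%%c;"%%d";"%%e"
   )
)) > output.csv
move /Y output.csv input.csv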

Result:

"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)";"Teddy-Bear";"AR832H0823"
"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)";"30142_Pen";"GI8759"

However, if some lines do not follow the format of the example data (lines that a regex could still process correctly, but this code could not), then this code may need adjusting. Note that, depending on how the data differs, the problem may not be solvable with a pure Batch file.


混吃等死

#4 · 2019-09-09 22:29

You can do it with this GNU sed command:

sed -r 's/^.*Product (.+) \(Shop ID: (.+)\)"$/&;\"\1\";\"\2\"/g' shop.csv

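It captures the text between "Product " and " (Shop ID: " into \1, and the text between "(Shop ID: " and the closing )" into \2.
The replacement uses & (the whole matched line) and appends \1 and \2 as two new quoted columns. Lines that do not match, such as the header, are printed unchanged.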
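
If you also want to drop lines that do not match (such as the header row), a small variation might look like the sketch below. It assumes a POSIX shell and a sed that accepts -E (the same extended-regex syntax as -r). The function name add_columns is only an illustrative helper, and the sample rows are copied from the question.

```shell
#!/bin/sh
# Sketch: append the product name and shop ID as two new quoted columns.
# add_columns is a hypothetical helper name; it reads CSV lines on stdin.
add_columns() {
    # -E enables extended regular expressions.
    # -n together with the trailing p flag prints only the lines where the
    # substitution succeeded, so the header row is dropped.
    sed -n -E 's/^.*Product (.+) \(Shop ID: (.+)\)"$/&;"\1";"\2"/p'
}

# Sample rows copied from the question.
printf '%s\n' \
    '"Event";"User";"Description"' \
    '"stock_change";"usertest1@gmail.com";"Change Product Teddy-Bear (Shop ID: AR832H0823)"' \
    '"stock_update";"usertest2@gmail.com";"Update Product 30142_Pen (Shop ID: GI8759)"' |
    add_columns
```

To process a real file, something like add_columns < shop.csv > shop_new.csv should work. Writing to a separate file first avoids truncating the input while it is still being read.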

