Search & Replace Text within CSV fields using Batc

Posted 2019-09-22 14:46

I have a semicolon (";") delimited CSV file with "" as the text qualifier. However, some fields contain ";" or "" characters, which break the rows. How can I use a batch script to replace those characters inside each field of each row, while keeping the field delimiter (;) and the text qualifier ("") the same? (For example, replace ";" inside a field with "|", and double quotes with single quotes.)

Note: we can rely on the ";" sequence between every two fields. Each field starts and ends with a double quote, so that sequence can serve as an imaginary delimiter in the solution.

Here is an example of my CSV rows with corrupted fields:

"Event";"User";"Description"   
"stock_change";"usertest1@gmail.com";"Change Product Teddy;Bear (Shop ID: "AR832H0823")"
"stock_update;change";"usertest2@gmail.com";"Update Product "30142_Pen" (Shop ID: GI8759)"

2 Answers
虎瘦雄心在
#2 · 2019-09-22 15:08
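@ECHO Off
SETLOCAL
REM Adjust these paths to suit your environment.
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q35828741.txt"
SET "outfile=%destdir%\outfile.txt"
REM Start with field1..field3 undefined.
FOR /L %%f IN (1,1,3) DO SET "field%%f="
REM Read each line whole, then let a plain FOR hand over one field at a time.
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO (
 FOR %%b IN (%%a) DO CALL :process %%b
)
)>"%outfile%"

GOTO :EOF

REM Called once per field; the row is written when the third field arrives.
:process
IF NOT DEFINED field1 SET "field1=%~1"&GOTO :EOF
IF NOT DEFINED field2 SET "field2=%~1"&GOTO :EOF
SET "field3=%~1"
REM Swap each colon in field3 for the placeholder '' (restored in the ECHO below).
:repcwp
FOR /f "tokens=1*delims=:" %%f IN ("%field3%") DO (
 SET "field3=%%g"
 IF DEFINED field3 (SET "field3=%%f''%%g"&GOTO repcwp) ELSE (SET "field3=%%~f")
)
REM In every field, turn ; into | and " into '.
set "field1=%field1:;=|%"
set "field1=%field1:"='%"
set "field2=%field2:;=|%"
set "field2=%field2:"='%"
set "field3=%field3:;=|%"
set "field3=%field3:"='%"
REM Write the row, turning the '' placeholder back into a colon.
ECHO "%field1:''=:%";"%field2:''=:%";"%field3:''=:%"
REM Clear the fields for the next row.
FOR /L %%f IN (1,1,3) DO SET "field%%f="
GOTO :eof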

You would need to change the settings of sourcedir and destdir to suit your circumstances.

I used a file named q35828741.txt containing your data for my testing.

Produces the file defined as %outfile%

Process each line of the file, presuming it is well-constructed.

Use a simple for loop to deliver the three fields to the procedure :process. The lines are each of the form "data1"separator"data2"separator"data3"

Within :process, accumulate the data to field1..3
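As a rough illustration of the three-field layout (written in Python, not batch, and not part of the script above), here is a sketch that follows the question's note that ";" sits between every two fields. The name split_fields is made up for this example, and it assumes every row starts and ends with a double quote.

```python
# Illustrative sketch only: Python, not the batch code from the answer.
# Assumes each row is wrapped in double quotes and that the
# three-character sequence ";" occurs only between fields.

def split_fields(line):
    """Split one row on the ";" boundary and drop the outer quotes."""
    row = line.strip()  # the sample header row has trailing spaces
    if len(row) < 2 or row[0] != '"' or row[-1] != '"':
        raise ValueError("row is not wrapped in double quotes: %r" % line)
    return row[1:-1].split('";"')


sample = '"stock_update;change";"usertest2@gmail.com";"Update Product "30142_Pen" (Shop ID: GI8759)"'
for number, field in enumerate(split_fields(sample), start=1):
    print(number, field)
# 1 stock_update;change
# 2 usertest2@gmail.com
# 3 Update Product "30142_Pen" (Shop ID: GI8759)
```

Splitting on the whole ";" sequence rather than on ; alone is what keeps the stray semicolon in the first field from being treated as a delimiter.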

Since the common substring-replace mechanism (%var:from=to%) uses : as part of its own syntax, replace each : with a distinct string ''. This is only done for field3 since it appears from the sample data that it is the only field that may contain colons. If colons may appear in the other fields, it's simply a matter of following the bouncing ball.

Having replaced all the colons, replace the semicolons and rabbit's-ears (double quotes) as required, then in the echo which outputs the data to the destination file, replace any '' with colon.
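To make the order of operations easier to follow, here is a small Python sketch (not batch) that mimics the three steps: hide colons behind '', do the two replacements, then turn '' back into colons. The name round_trip is hypothetical, and plain string replacement stands in for the %var:from=to% syntax, so any cmd-specific quirks are not modelled.

```python
# Illustrative sketch only: Python string replacement standing in for
# the batch substring-replace steps, applied in the same order.

PLACEHOLDER = "''"  # the stand-in string the answer uses for a colon


def round_trip(field):
    """Hide colons, replace ; and ", then restore the colons."""
    protected = field.replace(":", PLACEHOLDER)
    cleaned = protected.replace(";", "|").replace('"', "'")
    return cleaned.replace(PLACEHOLDER, ":")


print(round_trip('Change Product Teddy;Bear (Shop ID: "AR832H0823")'))
# Change Product Teddy|Bear (Shop ID: 'AR832H0823')

# Two adjacent double quotes become '' and are then "restored" to a colon,
# so the approach also assumes the data contains no "" or '' of its own.
print(round_trip('a""b'))
# a:b
```

The second call shows why the placeholder has to be a string that cannot arise from the data or from the other replacements.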

This makes a number of assumptions, including that the data contains no % or other awkward characters and that there are no instances of :: in the data.

Anthone
#3 · 2019-09-22 15:24

I don't understand why you would want to convert teddy;bear to teddy|bear, but... OK.

As requested in comment at https://stackoverflow.com/a/35822437/1012053, you can use the /T option of my JREPL.BAT utility to perform the following find/replace (earlier find/replace take precedence):

  • " at beginning of line, or ";" anywhere, or " at end of line ==> leave as is
  • " any place else ==> convert to '
  • ; any place else ==> convert to |
jrepl "^\q|\q;\q|\q$ \q ;" "$& ' |" /x /t " " /f test.csv /o -
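
For readers without JREPL, the same three rules can be sketched with a regular expression in Python. This only illustrates the "earlier alternative wins" idea and is not a description of how JREPL works internally. The name clean_row is made up here, and trailing whitespace is stripped first because the sample header row ends with spaces.

```python
import re

# Illustrative sketch only. At each position the alternatives are tried
# left to right, so the "leave as is" group is tried before the
# single-character rules.
PATTERN = re.compile(r'(^"|";"|"$)|(")|(;)')


def clean_row(line):
    """Apply the three find/replace rules to one CSV row."""
    def swap(match):
        if match.group(1) is not None:
            return match.group(1)   # edge quote or ";" delimiter: keep
        if match.group(2) is not None:
            return "'"              # any other double quote
        return "|"                  # any other semicolon

    return PATTERN.sub(swap, line.rstrip())


print(clean_row('"stock_change";"usertest1@gmail.com";"Change Product Teddy;Bear (Shop ID: "AR832H0823")"'))
# "stock_change";"usertest1@gmail.com";"Change Product Teddy|Bear (Shop ID: 'AR832H0823')"
```

A field that ends in a double quote right before a delimiter is ambiguous under these rules, so the sketch assumes that case does not occur.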