This question has been asked a lot on stackoverflow, but I can't seem to be able to make it work. Any hints appreciated. Here is a text file (extension .mpl) containing offending text that needs to be removed:
plotsetup('ps', 'plotoutput = "plotfile.eps"', 'plotoptions' = "color=rgb,landscape,noborder");
print(PLOT3D(MESH(Array(1..60, 1..60, 1..3, [[[.85840734641021,0.,-0.],
[HFloat(undefined),HFloat(undefined),HFloat(undefined)],[.857971665313419,.0917163905694189,-.16720239349226],
... more like that ...
[.858407346410207,-3.25992468340355e-015,5.96532373555817e-015]]], datatype = float[8], order = C_order)),SHADING(ZHUE),STYLE(PATCHNOGRID),TRANSPARENCY(.3),LIGHTMODEL(LIGHT_4),ORIENTATION(35.,135.),SCALING(CONSTRAINED),AXESSTYLE(NORMAL)));
I want to remove every instance of:
[HFloat(undefined),HFloat(undefined),HFloat(undefined)],
and there are thousands of such instances! Note: the square brackets and the trailing comma are to be removed as well. There are no spaces in the string, so I have pages and pages of:
[HFloat(undefined),HFloat(undefined),HFloat(undefined)],
[HFloat(undefined),HFloat(undefined),HFloat(undefined)],
[HFloat(undefined),HFloat(undefined),HFloat(undefined)],
I won't list here all my failed attempts. Below is the closest I've come:
@echo off
SetLocal
cd /d %~dp0
if exist testCleaned.mpl del testCleaned.mpl
SetLocal EnableDelayedExpansion
Set OldString=[HFloat(undefined),HFloat(undefined),HFloat(undefined)],
Set NewString=
pause
FOR /F "tokens=* delims= " %%I IN (test.mpl) DO (
set str=%%I
set str=!str:OldString=NewString!
echo !str! >> testCleaned.mpl
endlocal
)
EndLocal
The above was strung together, as it were, from pieces of code found on the web, especially at stackoverflow, e.g. Problem with search and replace batch file
What it does is produce a truncated file, as follows:
plotsetup('ps', 'plotoutput = "plotfile.eps"', 'plotoptions' = "color=rgb,landscape,noborder");
!str!
Please don't hesitate to request clarifications. Apologies if you feel that this question has already been answered. I would very much appreciate if you would copy-paste the relevant code for me, as I have tried for several hours.
Bonus: can this automatic naming be made to work? "%%~nICleaned.mpl"
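The biggest problem with your existing code is that SetLocal EnableDelayedExpansion is misplaced: it should be inside the loop, after set str=%%I. As written, the endlocal inside the loop switches delayed expansion off again after the first line, which would explain the literal !str! in your output.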
Other problems:
- will strip lines beginning with ;
- will strip leading spaces from each line
- will strip blank (empty) lines
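- will print ECHO is off if any line becomes empty or contains only spaces after substitution
- will add an extra space at the end of each line (I didn't notice this until I read jeb's answer)
- !str:OldString=NewString! searches for the literal text OldString rather than the variable's value; it has to be written !str:%OldString%=%NewString%!
Optimization issue: using >> can be relatively slow. It is faster to enclose the whole loop in () and then use > once.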
Below is about the best you can do with Windows batch. I auto-named the output as requested and went one better: the extension of the original name is preserved automatically.
@echo off
SetLocal
cd /d %~dp0
Set "OldString=[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"
Set "NewString="
set file="test.mpl"
for %%F in (%file%) do set outFile="%%~nFCleaned%%~xF"
pause
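rem find /n /v "" prefixes every line with its line number in square brackets,
rem so blank lines are not dropped by FOR /F. The prefix is stripped again below.
(
  for /f "skip=2 delims=" %%a in ('find /n /v "" %file%') do (
    set "ln=%%a"
    setlocal enableDelayedExpansion
    set "ln=!ln:*]=!"
    if defined ln set "ln=!ln:%OldString%=%NewString%!"
    echo(!ln!
    endlocal
  )
)>%outFile%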
Known limitations
- limited to slightly under 8k per line, both before and after substitution
- search string cannot include = or !, nor can it start with * or ~
- replacement string cannot include !
- search part of search and replace is case insensitive
- last line will always end with newline <CR><LF> even if original did not
All but the first limitation could be eliminated, but it would require a lot of code and would be horrifically slow. The solution would require a character by character search of each line. The last limitation would require some awkward test to determine whether the last line was newline terminated, and then the last line would have to be printed using the <nul SET /P "ln=!ln!" trick if no newline is wanted.
Interesting feature (or limitation, depending on perspective)
- Unix style files ending lines with <LF> will be converted to Windows style with lines ending with <CR><LF>
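Outside batch, the forced trailing newline and the <LF> to <CR><LF> conversion can both be avoided by working on raw bytes and leaving each line terminator attached to its line. A minimal Python sketch, assuming the data fits in memory; the function name replace_keep_endings and the sample data are made up for illustration:

```python
def replace_keep_endings(data: bytes, old: bytes, new: bytes = b"") -> bytes:
    """Replace old with new line by line without touching the line terminators."""
    pieces = []
    # keepends=True leaves each terminator (LF, CRLF, or none on the last line)
    # attached to its line, so the output keeps the original line endings.
    for line in data.splitlines(keepends=True):
        pieces.append(line.replace(old, new))
    return b"".join(pieces)


unix_text = b"a,[x],b\n[x],c"  # LF endings, no final newline
print(replace_keep_endings(unix_text, b"[x],"))  # b'a,b\nc'
```

As long as the search string contains no line break, calling data.replace(old, new) on the whole buffer gives the same result; the loop is only there to show that line-oriented processing does not have to normalise the endings.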
There are other solutions using batch that are significantly faster, but they all have more limitations.
Update - I've posted a new pure batch solution that is able to do case sensitive searches and has no restrictions on search or replacement string content. It does have more restrictions on line length, trailing control characters, and line format. Performance is not bad, especially if the number of replacements is low. http://www.dostips.com/forum/viewtopic.php?f=3&t=2710
Addendum
Based on comments below, a batch solution will not work for this particular problem because of line length limitation.
But this code is a good basis for a batch based search and replace utility, as long as you are willing to put up with the limitations and relatively poor performance of batch.
There are much better text processing tools available, though they are not standard with Windows. My favorite is sed within the GNU Utilities for Win32 package. The utilities are free, and do not require any installation.
Here is a sed solution for Windows using GNU utilities
@echo off
setlocal
cd /d %~dp0
Set "OldString=\[HFloat(undefined),HFloat(undefined),HFloat(undefined)\],"
Set "NewString="
set file="test.mpl"
for %%F in (%file%) do set outFile="%%~nFCleaned%%~xF"
pause
sed -e"s/%OldString%/%NewString%/g" <%file% >%outfile%
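For comparison, here is the same substitution written as a regular expression in Python, offered only as a sketch. One difference is worth noting: in sed's basic regular expressions an unescaped ( is a literal character, which is why only the square brackets are escaped above, whereas Python's re module needs the parentheses escaped too. re.escape takes care of that. The sample line is invented.

```python
import re

OLD = "[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"

# re.escape backslash-escapes every regex metacharacter in OLD,
# so the compiled pattern matches the text literally.
PATTERN = re.compile(re.escape(OLD))


def strip_undefined(text: str) -> str:
    """Remove every literal occurrence of OLD from text."""
    return PATTERN.sub("", text)


sample = "[.85,0.,-0.]," + OLD + OLD + "[.857,.09,-.16]"
print(strip_undefined(sample))  # [.85,0.,-0.],[.857,.09,-.16]
```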
Update 2013-02-19
sed may not be an option if you work at a site that has rules forbidding the installation of executables downloaded from the web.
JScript has good regular expression handling, and it is standard on all modern Windows platforms, including XP. It is a good choice for performing search and replace operations on Windows platforms.
I have written a hybrid JScript/Batch search and replace script (REPL.BAT) that is easy to call from a batch script. A small amount of code gives a lot of powerful features; not as powerful as sed, but more than enough to handle this task, as well as many others. It is also quite fast, much faster than any pure batch solution. It also does not have any inherent line length limitations.
Here is a batch script that uses my REPL.BAT utility to accomplish the task.
@echo off
setlocal
cd /d %~dp0
Set "OldString=[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"
Set "NewString="
set file="test.mpl"
for %%F in (%file%) do set outFile="%%~nFCleaned%%~xF"
pause
call repl OldString NewString le <%file% >%outfile%
I use the L option to specify a literal search string instead of a regular expression, and the E option to pass the search and replace strings via environment variables by name, instead of using string literals on the command line.
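To make the two options concrete, here is a rough Python analogue. It is not the REPL.BAT code: the helper name repl_literal_env and its env parameter are assumptions made for this sketch. The search and replacement text are looked up by variable name (the E idea), and the search text is escaped so that it is matched literally (the L idea).

```python
import os
import re


def repl_literal_env(line, search_var, replace_var, env=os.environ):
    """Replace the literal text stored in env[search_var] with env[replace_var]."""
    # E analogue: the arguments are variable names, not the values themselves.
    # An undefined variable is treated as an empty string.
    search = env.get(search_var, "")
    replace = env.get(replace_var, "")
    if not search:
        return line  # nothing to search for (a choice made for this sketch)
    # L analogue: escape regex metacharacters in the search text, and pass a
    # function as the replacement so its text is inserted verbatim.
    return re.sub(re.escape(search), lambda match: replace, line)


env = {"OldString": "[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"}
line = "[1.,2.,3.],[HFloat(undefined),HFloat(undefined),HFloat(undefined)],[4.,5.,6.]"
print(repl_literal_env(line, "OldString", "NewString", env))  # [1.,2.,3.],[4.,5.,6.]
```

Passing names rather than values keeps characters such as brackets, commas and parentheses away from the command line parser, which is the reason given above for using the E option.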
Here is the REPL.BAT utility script that the above code calls. Full documentation is included within the script.
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
::************ Documentation ***********
:::
:::REPL Search Replace [Options [SourceVar]]
:::REPL /?
:::
::: Performs a global search and replace operation on each line of input from
::: stdin and prints the result to stdout.
:::
::: Each parameter may be optionally enclosed by double quotes. The double
::: quotes are not considered part of the argument. The quotes are required
::: if the parameter contains a batch token delimiter like space, tab, comma,
::: semicolon. The quotes should also be used if the argument contains a
::: batch special character like &, |, etc. so that the special character
::: does not need to be escaped with ^.
:::
::: If called with a single argument of /? then prints help documentation
::: to stdout.
:::
::: Search - By default this is a case sensitive JScript (ECMA) regular
::: expression expressed as a string.
:::
::: JScript syntax documentation is available at
::: http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
::: Replace - By default this is the string to be used as a replacement for
::: each found search expression. Full support is provided for
::: substitution patterns available to the JScript replace method.
::: A $ literal can be escaped as $$. An empty replacement string
::: must be represented as "".
:::
::: Replace substitution pattern syntax is documented at
::: http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
::: Options - An optional string of characters used to alter the behavior
::: of REPL. The option characters are case insensitive, and may
::: appear in any order.
:::
::: I - Makes the search case-insensitive.
:::
::: L - The Search is treated as a string literal instead of a
::: regular expression. Also, all $ found in Replace are
::: treated as $ literals.
:::
::: E - Search and Replace represent the name of environment
::: variables that contain the respective values. An undefined
::: variable is treated as an empty string.
:::
::: M - Multi-line mode. The entire contents of stdin is read and
::: processed in one pass instead of line by line. ^ anchors
::: the beginning of a line and $ anchors the end of a line.
:::
::: X - Enables extended substitution pattern syntax with support
::: for the following escape sequences:
:::
::: \\ - Backslash
::: \b - Backspace
::: \f - Formfeed
::: \n - Newline
::: \r - Carriage Return
::: \t - Horizontal Tab
::: \v - Vertical Tab
::: \xnn - Ascii (Latin 1) character expressed as 2 hex digits
::: \unnnn - Unicode character expressed as 4 hex digits
:::
::: Escape sequences are supported even when the L option is used.
:::
::: S - The source is read from an environment variable instead of
::: from stdin. The name of the source environment variable is
::: specified in the next argument after the option string.
:::
::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0
:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b
************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);
if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
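The X option's escape handling is the least obvious part of the script: it first parks every \\ behind a temporary \B marker so that the backslash is not mistaken for the start of another escape, expands the remaining sequences one by one, and finally turns the marker back into a single backslash. The sketch below shows the same decoding idea in Python using a single pass instead of a placeholder. It is an illustration only, and decode_escapes is a made-up name.

```python
import re

# Single-character escapes and the characters they stand for.
_SIMPLE = {"\\": "\\", "b": "\b", "f": "\f", "n": "\n",
           "r": "\r", "t": "\t", "v": "\v"}

# One alternation covers \xnn, \unnnn and the single-character escapes.
_ESCAPE = re.compile(r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|([\\bfnrtv]))")


def decode_escapes(text: str) -> str:
    """Expand backslash escape sequences; unknown sequences are left as they are."""
    def expand(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits:
            return chr(int(hex_digits, 16))
        return _SIMPLE[match.group(3)]
    # Scanning left to right means an escaped backslash is consumed as a unit,
    # so the character after it is never misread as part of another escape.
    return _ESCAPE.sub(expand, text)
```

For example, decode_escapes(r"a\tb") returns the three characters a, tab, b, while decode_escapes(r"\\n") returns a backslash followed by the letter n rather than a newline.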
The Batch file below has the same restrictions on the characters it can process as the previous solutions; these restrictions are inherent to all Batch programs. However, it should run faster if the file is large and only a small number of lines need replacing. Lines that do not contain the search string are not processed, but copied directly to the output file.
@echo off
setlocal EnableDelayedExpansion
set "oldString=[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"
set "newString="
findstr /N ^^ inFile.mpl > numberedFile.tmp
find /C ":" < numberedFile.tmp > lastLine.tmp
set /P lastLine=<lastLine.tmp
del lastLine.tmp
call :ProcessLines < numberedFile.tmp > outFile.mpl
del numberedFile.tmp
goto :EOF
:ProcessLines
set lastProcessedLine=0
for /F "delims=:" %%a in ('findstr /N /C:"%oldString%" inFile.mpl') do (
   call :copyUpToLine %%a
   echo(!line:%oldString%=%newString%!
)
set /A linesToCopy=lastLine-lastProcessedLine
for /L %%i in (1,1,%linesToCopy%) do (
   set /P line=
   echo(!line:*:=!
)
exit /B
:copyUpToLine number
set /A linesToCopy=%1-lastProcessedLine-1
for /L %%i in (1,1,%linesToCopy%) do (
   set /P line=
   echo(!line:*:=!
)
set /P line=
set line=!line:*:=!
set lastProcessedLine=%1
exit /B
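The control flow of the script is easier to follow without the batch plumbing. The Python sketch below mirrors its structure under the assumption that the file is already split into a list of lines: first collect the numbers of the lines that contain the search string (the findstr /N /C: step), then copy everything between two hits verbatim and substitute only in the hit lines. The function name is invented for the illustration.

```python
def replace_by_line_numbers(lines, old, new=""):
    """Substitute only in lines that contain old; copy all other lines as-is."""
    # Step 1: 1-based numbers of the lines that contain the search string.
    hits = [number for number, line in enumerate(lines, 1) if old in line]
    out = []
    last_processed = 0
    for number in hits:
        # Step 2: copy the untouched lines between the previous hit and this one.
        out.extend(lines[last_processed:number - 1])
        # Step 3: perform the substitution on the matching line only.
        out.append(lines[number - 1].replace(old, new))
        last_processed = number
    # Step 4: copy whatever follows the last hit.
    out.extend(lines[last_processed:])
    return out


print(replace_by_line_numbers(["a", "xOLDy", "b", "OLD", "c"], "OLD"))
# ['a', 'xy', 'b', '', 'c']
```

In Python the two-step structure gains little, because str.replace is already cheap on lines without a match; in batch it matters because every line that goes through delayed expansion is slow and subject to the character restrictions mentioned above.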
I would appreciate it if you could run a timing test on this and the other solutions and post the results.
EDIT: I replaced the line set /A lastProcessedLine+=linesToCopy+1 with the equivalent but faster set lastProcessedLine=%1.
I'm no expert on batch files, so I can't offer a direct solution to your problem.
However, to solve your problem, it might be simpler to use an alternative to batch files.
For example, I'd recommend http://www.csscript.net/ (if you know C#). This tool lets you run C# files like batch files, giving you the power to write your script in C# instead of horrible batch file syntax :)
Another alternative would be Python, if you know Python.
But I guess the point is, that this kind of task may be easier in another programming language.
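As an illustration of the Python route, here is a minimal sketch. It assumes the file is ASCII or UTF-8 text and small enough to read into memory; the function name clean_file is made up. It also covers the automatic naming asked for in the question by inserting Cleaned between the base name and the extension.

```python
from pathlib import Path

OLD = "[HFloat(undefined),HFloat(undefined),HFloat(undefined)],"


def clean_file(src, old=OLD, new=""):
    """Write a copy of src with every occurrence of old replaced by new.

    The output is placed next to the input and named <stem>Cleaned<suffix>,
    for example test.mpl -> testCleaned.mpl. Returns the output path.
    """
    src = Path(src)
    dst = src.with_name(src.stem + "Cleaned" + src.suffix)
    # newline="" switches off newline translation, so line endings are kept.
    with open(src, encoding="utf-8", newline="") as handle:
        text = handle.read()
    with open(dst, "w", encoding="utf-8", newline="") as handle:
        handle.write(text.replace(old, new))
    return dst
```

Calling clean_file("test.mpl") would write testCleaned.mpl in the same folder. Because the whole file is handled as one string, the length of individual lines does not matter.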
You defined delims=<space>, which is a bad idea if you want to preserve your lines, as it strips the leading spaces from each line.
You should change this to FOR /F "tokens=* delims=" ...
Your echo !str! >> testCleaned.mpl will always append one extra space to each line; better use echo(!str!>>testCleaned.mpl.
You will also lose all empty lines, and all exclamation marks in all lines.
You could also try the code of Improved BatchSubstitute.bat