I have a file containing two columns of text. Using a batch file, I would like to extract the second column of text, get its string length, and then write the string length and the string text to an output file. The step that challenges me is determining the length of a string that contains special characters. For example, the input file looks like:
escitalopram CN(C)CCC[C@@]1(C2=C(CO1)C=C(C=C2)C#N)C3=CC=C(C=C3)F
ibuprofen CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
keflex CC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)[C@@H](C3=CC=CC=C3)N)SC1)C(=O)O
aspirin CC(=O)OC1=CC=CC=C1C(=O)O
linoleic_acid CCCCC/C=C\C/C=C\CCCCCCCC(=O)O
I can read the file and extract the two tokens using a batch script that takes the file name as argument %1. I have tried a few of the subroutines I found in discussion groups, but I cannot get them to work. The "=" sign, and perhaps the other special characters, cause problems. I am looking for a solution that would produce an output file like the following, ignoring the "@", "/" and "\" characters:
escitalopram 49
ibuprofen 29
keflex 58
aspirin 24
linoleic_acid 25
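As a quick cross-check of the expected output above, here is a small Python sketch (Python rather than batch, only to verify the arithmetic). It counts every character of each formula except "@", "/" and "\". The helper name smiles_length and the SAMPLES table are illustrative and not part of any batch solution.

```python
# Characters the question asks to ignore when counting: '@', '/' and '\'.
IGNORED = set("@/\\")

def smiles_length(smiles: str) -> int:
    """Number of characters in a SMILES string, not counting '@', '/' or '\\'."""
    return sum(1 for ch in smiles if ch not in IGNORED)

# The sample data from the question (raw string keeps the backslashes literal).
SAMPLES = {
    "escitalopram": "CN(C)CCC[C@@]1(C2=C(CO1)C=C(C=C2)C#N)C3=CC=C(C=C3)F",
    "ibuprofen": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    "keflex": "CC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)[C@@H](C3=CC=CC=C3)N)SC1)C(=O)O",
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "linoleic_acid": r"CCCCC/C=C\C/C=C\CCCCCCCC(=O)O",
}

if __name__ == "__main__":
    # Prints one "name length" pair per line, matching the desired output format.
    for name, smiles in SAMPLES.items():
        print(name, smiles_length(smiles))
```

For example, escitalopram's formula is 51 characters long and contains two "@" signs, giving 49.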
My program thus far looks like:
@echo off
setLocal EnableDelayedExpansion enableextensions
set arg1=%1
FOR /F "tokens=1,2 delims= " %%r IN (%1) DO (
set teststring="%%s"
echo "Passing " %%s
call :GetStrLength %%s
echo.%%s
goto :EOF
)
::========================
:GetStrLength
setlocal enableextensions
set s=%1
echo " counting.... " %1
:: Get the length of the quoted string assuming a max of 255
set charCount=0
for /l %%c in (0,1,255) do (
set si=!s:~%%c!
if defined si set /a charCount+=1)
if %charCount% EQU 256 set charCount=0
echo The length of "%s%" is %charCount% characters
endlocal & goto :EOF
Any help would be appreciated.
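You can use a strlen function, but you should pass parameters by reference instead of by value (that is, pass the name of the variable, not its contents).
This function can handle any string, and it always needs exactly 13 loop iterations to determine the length.
Because a batch variable can hold no more than 8191 characters, this is enough: 2^13 = 8192 already exceeds that limit.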
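The strlen function itself is not reproduced here. As a sketch of why a fixed 13 iterations can be enough, assuming the function uses a halving search over offsets, here is the same idea in Python: test for a character at offset 4096, 2048, ..., 1, and advance past it whenever one is present. The name strlen_binary is hypothetical.

```python
STEPS = (4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1)  # 13 steps

def strlen_binary(s: str) -> int:
    """Length of s found with 13 fixed halving steps (valid for up to 8192 chars)."""
    if not s:
        return 0
    length = 1  # s is non-empty, so the character at offset 0 exists
    rest = s
    for step in STEPS:
        # Is there a character at offset `step`? A batch version would test a
        # substring expansion, similar to the !s:~%%c! used in the question.
        if rest[step:step + 1] != "":
            length += step
            rest = rest[step:]  # drop the first `step` characters and continue
    return length
```

Each step either consumes its power of two or skips it, so after the 13 steps the remaining offset has been decomposed into binary, which is why the loop count never depends on the string.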
To get the length of the string, I find the following method quite efficient.
The "=" causes problems because it is not quoted, and the batch parser treats "=" as a token delimiter. When you pass an unquoted string containing "=" as a parameter, the string is broken at each "=" into multiple parameters. It should be possible to fix your code by adding some strategically placed quotes and by using the ~ parameter expansion modifier (as in %~1) to remove enclosing quotes as needed. This is not a general solution, but it should work in your case, because I don't think SMILES strings ever contain the " character. Note that a quoted string that itself contains quotes would have some portion that is effectively unquoted.
Here is your code fixed. I've removed some of the unnecessary code and some of the diagnostic messages.
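To see the splitting effect, the following Python sketch models a simplified version of how cmd breaks a parameter line into %1, %2, and so on. It assumes that space, tab, comma, semicolon and "=" act as delimiters except inside double quotes; it is not cmd's full parser. split_args and tilde are hypothetical helpers, with tilde only roughly mimicking the quote removal of %~1.

```python
# Simplified model (an assumption, not cmd's real parser): these characters
# separate arguments unless they appear inside double quotes.
DELIMS = set(" \t,;=")

def split_args(line: str) -> list[str]:
    """Split a parameter line into arguments the way %1, %2, ... would see them."""
    args, cur, in_quotes = [], [], False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            cur.append(ch)  # quotes stay part of the argument, as with %1
        elif ch in DELIMS and not in_quotes:
            if cur:  # a run of delimiters ends the current argument
                args.append("".join(cur))
                cur = []
        else:
            cur.append(ch)
    if cur:
        args.append("".join(cur))
    return args

def tilde(arg: str) -> str:
    """Rough analogue of %~1: strip one pair of surrounding double quotes."""
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return arg[1:-1]
    return arg

if __name__ == "__main__":
    print(split_args("CC(=O)OC1=CC=CC=C1C(=O)O"))
    print(tilde(split_args('"CC(=O)OC1=CC=CC=C1C(=O)O"')[0]))
```

In this model the unquoted aspirin formula turns into six arguments, so %1 would see only "CC(", while the quoted form arrives as a single argument whose quotes the ~ modifier can remove.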
Below is a fully working script that computes the length of each SMILES string after removing the stereochemistry characters (I'm curious why you want that value). It uses a corrected version of the very fast strlen function in jeb's answer. I added the USEBACKQ option to the initial FOR /F loop in case a user passes a quoted file name that contains spaces.
I used a file named q21817684.txt for my testing. Your data has a trailing space after the formula for keflex and aspirin. I eliminated that for my testing, but adding a step that strips the trailing space at the obvious point should be equivalent.
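For comparison, the whole task (read name/formula pairs, count the formula without "@", "/" and "\", write the name and count) fits in a few lines of Python. This is a sketch, not the batch script described above. str.split() with no argument also discards trailing spaces like the ones after the keflex and aspirin formulas, so no separate stripping step is needed. The script name and command-line layout in the usage comment are hypothetical.

```python
import sys

# Characters excluded from the count: '@', '/' and '\'.
IGNORED = set("@/\\")

def process_lines(lines):
    """Yield 'name length' for each 'name smiles' line; short or blank lines are skipped."""
    for line in lines:
        fields = line.split()  # no-argument split also drops trailing whitespace
        if len(fields) < 2:
            continue
        name, smiles = fields[0], fields[1]
        yield f"{name} {sum(1 for ch in smiles if ch not in IGNORED)}"

if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python smiles_len.py q21817684.txt out.txt
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        for out in process_lines(src):
            dst.write(out + "\n")
```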