batch get string length with special characters

Posted 2019-08-05 08:56

I have a file containing two columns of text. Using a batch file, I would like to extract the second column, get its string length, and then write the string length and the string text to an output file. The step that challenges me is determining the length of a string that contains special characters. For example, the input file looks like:

escitalopram CN(C)CCC[C@@]1(C2=C(CO1)C=C(C=C2)C#N)C3=CC=C(C=C3)F
ibuprofen CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
keflex CC1=C(N2[C@@H]([C@@H](C2=O)NC(=O)[C@@H](C3=CC=CC=C3)N)SC1)C(=O)O 
aspirin CC(=O)OC1=CC=CC=C1C(=O)O 
linoleic_acid CCCCC/C=C\C/C=C\CCCCCCCC(=O)O

I can read the file and extract the two tokens using a batch command line with the file name passed as argument %1. I have tried a few of the subroutines I found in the discussion groups, but I cannot get them to work. The "=" sign, and perhaps the other special characters, cause problems. I am looking for a solution that would produce an output file like the following, where the lengths ignore the "@", "/" and "\" signs:

escitalopram 49
ibuprofen 29 
keflex 58 
aspirin 24
linoleic_acid 25 

My program thus far looks like:

@echo off
setLocal EnableDelayedExpansion enableextensions


set arg1=%1

FOR /F "tokens=1,2 delims= " %%r IN (%1) DO (
set teststring="%%s"
echo "Passing     " %%s
call :GetStrLength %%s
echo.%%s
goto :EOF
)
  ::========================
  :GetStrLength
  setlocal enableextensions

set s=%1
echo " counting.... " %1

:: Get the length of the quoted string assuming a max of 255
set charCount=0
for /l %%c in (0,1,255) do (
  set si=!s:~%%c!
  if defined si set /a charCount+=1)
if %charCount% EQU 256 set charCount=0
echo The length of "%s%" is %charCount% characters
endlocal & goto :EOF

Any help would be appreciated.

4 answers
我命由我不由天
#2 · 2019-08-05 09:20

You can use a strlen function, but you should pass its parameters by reference (byref) instead of by value (byval), that is, pass the variable names rather than their contents.

This function can handle any string, and it always needs exactly 13 loop iterations to determine the length.
Since a batch variable can hold at most 8191 characters, this is enough: the 13 powers of two from 4096 down to 1 sum to 8191.

@echo off
set "myString=Any content"
call :strlen result myString
echo %result%
exit /b

:strlen <resultVar> <stringVar>
(   
    setlocal EnableDelayedExpansion
    set "s=!%~2!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!s:~%%P,1!" NEQ "" ( 
            set /a "len+=%%P"
            set "s=!s:~%%P!"
        )
    )
)
( 
    endlocal
    set "%~1=%len%"
    exit /b
)
Fickle 薄情
#3 · 2019-08-05 09:21

To get the length of the string, I find the following method quite efficient.

@echo off
setLocal EnableDelayedExpansion

set s=%*
set length=0

:count
if defined s (
    if "!s:~0,1!" NEQ "@" if "!s:~0,1!" NEQ "/" if "!s:~0,1!" NEQ "\" set /A length += 1
    set "s=%s:~1%"
    goto count
)

echo %length%
别忘想泡老子
#4 · 2019-08-05 09:29

The = causes problems because it is not quoted, and the batch parser treats = as a token delimiter. When you pass an unquoted string containing = as a parameter, the string is broken at each = into multiple parameters. It should be possible to fix your code with the addition of some strategically placed quotes, as well as use of the ~ parameter expansion modifier to remove enclosing quotes as needed. This is not a general solution, but it should work in your case because I don't think SMILES strings ever contain the " character. Note that a quoted string containing quotes would contain some portion of the string that is effectively not quoted.
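The splitting described above can be seen in isolation with a minimal sketch. The label :show and the sample string C=O are made up for illustration and are not part of the original code:

```batch
@echo off
rem Minimal sketch: how CALL splits an argument at an unquoted equals sign.
rem The label :show and the sample string are illustrative only.
call :show C=O
call :show "C=O"
exit /b

:show
rem Print the raw first and second arguments, then the first argument
rem with its enclosing quotes removed by the tilde modifier.
echo first=[%1] second=[%2] dequoted=[%~1]
exit /b
```

The unquoted call should report C as the first argument and O as the second, while the quoted call keeps the whole string in the first argument and the tilde modifier strips the quotes again.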

Here is your code fixed. I've removed some of the unnecessary code and some of the diagnostic messages.

@echo off
setlocal

FOR /F "tokens=1,2 delims= " %%r IN (%1) DO (
  echo Passing     "%%s"
  call :GetStrLength "%%s"
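)
rem Exit after the loop so every line is processed and execution
rem does not fall through into the subroutine.
goto :EOF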

::========================
:GetStrLength
setlocal enableDelayedExpansion

set "s=%~1"
echo counting.... %1

:: Get the length of the quoted string assuming a max of 255
set charCount=0
for /l %%c in (0,1,255) do (
  set si=!s:~%%c!
  if defined si set /a charCount+=1
)
if %charCount% EQU 256 set charCount=0
echo The length of "%s%" is %charCount% characters
endlocal & goto :EOF

Below is a fully working script that computes the length of each SMILES string after removing the stereochemistry characters. (I'm curious why you want that value.) It uses a corrected version of the very fast strlen function in jeb's answer. I added the USEBACKQ option to the initial FOR /F loop just in case a user passes a quoted file name that contains spaces.

@echo off
setlocal enableDelayedExpansion

for /f "usebackq tokens=1,2 delims= " %%A IN (%1) do (
  set "SMILES=%%B"
  for %%C in (@ / \) do set "SMILES=!SMILES:%%C=!"
  call :strlen len SMILES
  echo %%A !len!
)
exit /b

:strlen <resultVar> <stringVar>
setlocal enableDelayedExpansion
set "s=!%~2!#"
set "len=0"
for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
  if "!s:~%%P,1!" NEQ "" (
    set /a "len+=%%P"
    set "s=!s:~%%P!"
  )
)
endlocal&set "%~1=%len%"
exit /b
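
The script above only echoes its results to the console, while the question asks for an output file. One way to get that, sketched here under the assumption that the script is saved as smiles_len.bat and the data as input.txt (both names are hypothetical), is to redirect the output of the call:

```batch
@echo off
rem Hypothetical file names: smiles_len.bat is the script above,
rem input.txt is the two-column data file from the question.
rem Redirecting the CALL captures every line the script echoes.
call smiles_len.bat "input.txt" >"output.txt"
type "output.txt"
```

Redirecting the whole call keeps the script itself unchanged, so it can still be run interactively without writing a file.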
叼着烟拽天下
#5 · 2019-08-05 09:29
@ECHO OFF
SETLOCAL
FOR /f "tokens=1*delims= " %%a IN (q21817684.txt) DO (
 SET /a count=0
 SET "chemical=%%a"
 SET "formula=%%b"
 CALL :report
)
GOTO :EOF

:report
SET "formula=%formula:@=%"
SET "formula=%formula:\=%"
SET "formula=%formula:/=%"
:reportl
IF DEFINED formula (
 SET "formula=%formula:~1%"
 SET /a count +=1
 GOTO reportl
)
ECHO %chemical% %count%

GOTO :eof

I used a file named q21817684.txt for my testing. Your data has a trailing space after the formula for keflex and aspirin. I eliminated that for my testing, but adding

SET "formula=%formula: =%"

in :report, alongside the other substitutions, should be equivalent.
