PowerShell Replace Characters in a String

Posted 2019-07-07 14:01

Question:

Using PowerShell, but open to other potential solutions.

I have a long string, and I need to replace several sequences of characters, identified by position, with a mask character (a period or a space). I don't know in advance what those characters will be, only that they must be masked. I have written code that iterates through the string with Mid and position numbers, but it is cumbersome, and I'm wondering whether there is a faster or more elegant method.

Example: Given the 2 strings:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
12345678901234567890123456

I want to replace characters 2-4, 9-10, 17-21, and 25 (counting from 1) with a period, yielding:

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

I can do that with a series of Mid calls, but I'd like to know whether there is some sort of faster masking function for this. I have to process millions of rows, and seconds count.
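For reference, a minimal sketch of the position-based masking described above without repeated Mid calls: copy the string into a char array once, overwrite each 1-based range in place, and join the result. The function name ConvertTo-MaskedString and the range format (pairs of 1-based, inclusive start/end positions) are hypothetical, chosen for illustration.

```powershell
# Hypothetical helper: mask 1-based, inclusive character ranges in a string.
function ConvertTo-MaskedString {
    param(
        [string]$Text,
        [int[][]]$Ranges,          # e.g. @(@(2,4), @(9,10)); pass a single range as ,@(2,4)
        [char]$MaskChar = '.'
    )
    $chars = $Text.ToCharArray()   # one copy; all edits below happen in place
    foreach ($r in $Ranges) {
        # Convert to 0-based indices and clamp the end so short rows do not throw.
        $start = $r[0] - 1
        $end   = [Math]::Min($r[1], $chars.Length) - 1
        for ($i = $start; $i -le $end; $i++) { $chars[$i] = $MaskChar }
    }
    -join $chars
}

$ranges = @(@(2,4), @(9,10), @(17,21), @(25,25))
ConvertTo-MaskedString 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' $ranges   # A...EFGH..KLMNOP.....VWX.Z
ConvertTo-MaskedString '12345678901234567890123456' $ranges   # 1...5678..123456.....234.6
```

Clamping the end index is a design choice: a row shorter than the last range is masked up to its own length instead of raising an index error.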

Answer 1:

Try this:

$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
 '12345678901234567890123456') -Replace $regex,$replace

A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6

The -replace operator is slower than String.Replace() for a single operation, but it can operate on an entire array of strings at once, which is faster than calling a string method inside a foreach loop.
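That claim is easy to check on your own data. A rough sketch, assuming synthetic input (100,000 copies of the sample row): it runs -replace once over the whole array and, for comparison, an explicit foreach loop calling the regex object's Replace() method per row (String.Replace() has no pattern support, so it stands in for the "string method" case). It confirms both forms agree and times each with Measure-Command; no timings are shown because they depend on the machine and PowerShell version.

```powershell
# Pattern and replacement from the answer above.
$regex   = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

# Synthetic input (an assumption): 100,000 copies of the sample row.
$rows = @('ABCDEFGHIJKLMNOPQRSTUVWXYZ') * 100000

# Form 1: a single -replace over the whole array; the operator enumerates it.
$arrayResult = $rows -replace $regex, $replace

# Form 2: an explicit foreach loop calling Regex.Replace once per row.
$loopResult = foreach ($row in $rows) { $regex.Replace($row, $replace) }

# Both forms should produce identical output.
$same = ($arrayResult -join "`n") -ceq ($loopResult -join "`n")
"identical results: $same"

# Time each form; absolute numbers vary by machine and PowerShell version.
$t1 = Measure-Command { $null = $rows -replace $regex, $replace }
$t2 = Measure-Command { $null = foreach ($row in $rows) { $regex.Replace($row, $replace) } }
"array -replace: {0:N0} ms" -f $t1.TotalMilliseconds
"foreach loop:   {0:N0} ms" -f $t2.TotalMilliseconds
```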

Here's a sample implementation (requires PowerShell v4 or later, for -PipelineVariable):

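$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'

# Masks one batch of lines ($_) and appends it to a same-named file in c:\mynewfiles.
# $file is the current FileInfo, set by -PipelineVariable below. The output
# folder must already exist, and rerunning appends to the previous output.
filter fix-file {
 $_ -replace $regex,$replace |
 Add-Content "c:\mynewfiles\$($file.Name)"
}

# -ReadCount 1000 passes lines in arrays of 1000, so -replace runs once per batch.
Get-ChildItem c:\myfiles\*.txt -PipelineVariable file |
 Get-Content -ReadCount 1000 | fix-file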
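For very large files, one alternative worth measuring is streaming each file through .NET directly, which avoids opening the output file once per batch. A sketch, assuming PowerShell 5 or later (for ::new); Convert-MaskedFile is a hypothetical name, and RegexOptions.Compiled is an assumption that may or may not pay off on your data.

```powershell
# Hypothetical helper: stream one file through a masking regex, line by line.
function Convert-MaskedFile {
    param(
        [string]$InputPath,      # use full paths: .NET resolves relative paths against
        [string]$OutputPath,     # the process working directory, not the PowerShell location
        [regex]$Regex,
        [string]$Replacement
    )
    # StreamWriter buffers output; it creates or overwrites $OutputPath (UTF-8 by default).
    $writer = [System.IO.StreamWriter]::new($OutputPath)
    try {
        # File.ReadLines yields one line at a time, so memory use stays flat.
        foreach ($line in [System.IO.File]::ReadLines($InputPath)) {
            $writer.WriteLine($Regex.Replace($line, $Replacement))
        }
    }
    finally {
        $writer.Dispose()        # flushes and closes the output file
    }
}

# Same pattern as the answer, built with RegexOptions.Compiled.
$opts    = [System.Text.RegularExpressions.RegexOptions]::Compiled
$regex   = [regex]::new('(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)', $opts)
$replace = '$1...$2..$3.....$4.$5'

# Example usage with the answer's folders (c:\mynewfiles must already exist):
# Get-ChildItem c:\myfiles\*.txt | ForEach-Object {
#     Convert-MaskedFile $_.FullName (Join-Path 'c:\mynewfiles' $_.Name) $regex $replace
# }
```

Unlike the Add-Content version, this overwrites its output, so reruns do not duplicate lines. Rows shorter than 26 characters do not match the pattern and are written through unmasked, as with -replace. Both File.ReadLines and StreamWriter default to UTF-8; pass an explicit encoding if your files use something else.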

If you prefer to describe the positions with a mask string ('-' keeps a character, '.' masks it), you can generate $regex and $replace from it:

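$mask = '-...----..------.....---.-'

# Wrap each run of '-' in a capture group, then turn every '-' into '.' (any character).
$regex = [regex]($mask -replace '(-+)','($1)').replace('-','.')

# Collapse each run of '-' to a single '-', then emit $1, $2, ... for the kept
# runs and copy the masking '.' characters through literally.
$replace =
 ([char[]]($mask -replace '-+','-') |
  foreach {$i=1}{if ($_ -eq '.'){$_} else {'$'+$i++}} {}) -join ''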

$regex.ToString()
$replace

(.)...(....)..(......).....(...).(.)
$1...$2..$3.....$4.$5
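The two generator snippets above can be folded into one reusable function. A sketch with a hypothetical name, New-MaskPattern, that walks the mask run by run. Two deliberate differences from the answer, both assumptions about what you want: the pattern is anchored with ^, so a row longer than the mask is matched only once, at its start; and groups are written as ${n}, so the replacement stays valid even if the mask character is a digit.

```powershell
# Hypothetical helper: '-' keeps a character, any other mask character masks it.
function New-MaskPattern {
    param([string]$Mask, [char]$MaskChar = '.')
    $pattern     = New-Object System.Text.StringBuilder
    $replacement = New-Object System.Text.StringBuilder
    [void]$pattern.Append('^')                    # anchor: match once, at the start
    # '$' is special in .NET replacement strings, so escape it as '$$'.
    $fill  = ([string]$MaskChar).Replace('$', '$$')
    $group = 0
    # '(.)\1*' splits the mask into runs of one repeated character: '-', '...', '----', ...
    foreach ($run in [regex]::Matches($Mask, '(.)\1*')) {
        $len = $run.Length
        if ($run.Value[0] -eq '-') {
            $group++
            [void]$pattern.Append("(.{$len})")                  # keep: capture the run
            [void]$replacement.Append('${' + $group + '}')     # ${n} is safe before digits
        } else {
            [void]$pattern.Append(".{$len}")                    # mask: match and drop the run
            [void]$replacement.Append($fill * $len)
        }
    }
    [pscustomobject]@{
        Regex       = [regex]($pattern.ToString())
        Replacement = $replacement.ToString()
    }
}

$p = New-MaskPattern '-...----..------.....---.-'
@('ABCDEFGHIJKLMNOPQRSTUVWXYZ', '12345678901234567890123456') -replace $p.Regex, $p.Replacement
# A...EFGH..KLMNOP.....VWX.Z
# 1...5678..123456.....234.6
```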


Answer 2:

Here's another approach, driven by a mask string ('-' keeps a character, anything else is replaced with '.'):

C:\PS> $mask ="-...----..------.....---.-"
C:\PS> ([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | % {$i=0}{if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''

A...EFGH..KLMNOP.....VWX.Z

And if you can use PowerShell v4 features :-), try the .ForEach() method:

C:\PS> $i=0;([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ').Foreach({if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
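Both versions above invoke a script block once per character, which may add noticeable overhead over millions of rows. A sketch that applies the same mask rule by indexing the char array directly in a for loop; Invoke-Mask is a hypothetical name. Characters beyond the end of the mask are left unchanged, which is an assumption about the desired behavior.

```powershell
# Hypothetical helper: '-' in the mask keeps a character, anything else masks it.
function Invoke-Mask {
    param([string]$Text, [string]$Mask, [char]$MaskChar = '.')
    $chars = $Text.ToCharArray()
    # Walk only as far as both strings reach; characters past the mask are kept.
    $n = [Math]::Min($chars.Length, $Mask.Length)
    for ($i = 0; $i -lt $n; $i++) {
        if ($Mask[$i] -ne '-') { $chars[$i] = $MaskChar }
    }
    -join $chars
}

$mask = '-...----..------.....---.-'
Invoke-Mask 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' $mask   # A...EFGH..KLMNOP.....VWX.Z
Invoke-Mask '12345678901234567890123456' $mask   # 1...5678..123456.....234.6
```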


Answer 3:

Here's yet another approach, using a format string:

C:\PS> $mask = "{0}...{4}{5}{6}{7}..{10}{11}{12}{13}{14}{15}.....{21}{22}{23}.{25}"
C:\PS> $singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C:\PS> $mask -f $singlecharstrings

A...EFGH..KLMNOP.....VWX.Z
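The format string above is written by hand. A sketch of generating it from the same '-'/'.' mask used in the other answers; New-MaskFormat is a hypothetical name. Because the row's characters are passed as arguments to -f rather than embedded in the format string, braces in the data are harmless; a row shorter than the highest placeholder index would make -f fail with an error.

```powershell
# Hypothetical helper: build a -f format string from a '-'/'.' mask.
function New-MaskFormat {
    param([string]$Mask)
    $sb = New-Object System.Text.StringBuilder
    for ($i = 0; $i -lt $Mask.Length; $i++) {
        if ($Mask[$i] -eq '-') {
            [void]$sb.Append("{$i}")    # placeholder for character $i of the row
        } else {
            [void]$sb.Append('.')       # literal mask character
        }
    }
    $sb.ToString()
}

$format = New-MaskFormat '-...----..------.....---.-'
# One string per character, as in the approach above.
$format -f [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'   # A...EFGH..KLMNOP.....VWX.Z
```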