Using PowerShell, but open to other potential solutions.
I have a long string. I need to replace several sequences of characters, identified by their position in that string, with a mask character (a period or a space). I don't know in advance what the original characters will be, only that they must be replaced. I have written code that walks the string with Mid-style substring calls and position numbers, but that is cumbersome, and I am wondering whether there is a faster or more elegant method.
Example:
Given the 2 strings:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
12345678901234567890123456
I want to replace characters 2-4, 9-10, 17-21, and 25 with ., yielding:
A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6
I can do that with a series of Mid calls, but I wanted to know whether there is some faster masking function for this. I have to do this across millions of rows, and every second counts.
Try this:
$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'
('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'12345678901234567890123456') -Replace $regex,$replace
A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6
The -replace operator is slower than string.replace() for a single operation, but has the advantage of being able to operate on an array of strings, which is faster than the string method plus a foreach loop.
Here's a sample implementation (requires V4, for -PipelineVariable):
$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'
filter fix-file {
    # $file is set by -PipelineVariable in the pipeline below
    $_ -replace $regex,$replace |
        add-content "c:\mynewfiles\$($file.name)"
}
get-childitem c:\myfiles\*.txt -PipelineVariable file |
    get-content -ReadCount 1000 | fix-file
If you want to use the mask method, you can generate $regex and $replace from that:
$mask = '-...----..------.....---.-'
$regex = [regex]($mask -replace '(-+)','($1)').replace('-','.')
$replace =
    ([char[]]($mask -replace '-+','-') |
        foreach {$i=1}{if ($_ -eq '.'){$_} else {'$'+$i++}} {}) -join ''
$regex.ToString()
$replace
(.)...(....)..(......).....(...).(.)
$1...$2..$3.....$4.$5
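To tie the pieces together, here is a minimal end-to-end sketch that builds the pattern and replacement from the mask and applies them to the two sample rows. The variable names ($pattern, $replacement, $rows, $masked) are just illustrative. It assumes every row is the same length as the mask; the generated pattern is not anchored, so a longer row would be matched again in each further chunk of that length.

```powershell
# Mask template: '-' keeps the character, '.' replaces it with a period.
$mask = '-...----..------.....---.-'

# Each run of '-' becomes a capture group holding that many '.' wildcards.
$pattern = ($mask -replace '(-+)', '($1)').Replace('-', '.')

# Each kept run becomes $1, $2, ...; masked positions stay literal periods.
$n = 0
$replacement = -join ([char[]]($mask -replace '-+', '-') | ForEach-Object {
    if ($_ -eq '.') { '.' } else { $n++; '$' + $n }
})

# -replace applies the same pattern to every element of the array.
$rows = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', '12345678901234567890123456'
$masked = $rows -replace $pattern, $replacement
$masked
```

Since the pattern and replacement depend only on the mask, they can be built once and reused for every row.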
Here's another approach:
C:\PS> $mask ="-...----..------.....---.-"
C:\PS> ([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | % {$i=0}{if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
A...EFGH..KLMNOP.....VWX.Z
And if we are going to take advantage of V4 features :-), try this:
C:\PS> $i=0;([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ').Foreach({if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
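A variation on the same mask idea, in case it helps with many rows: the mask can be turned into a list of positions once, and each row then only needs those positions overwritten in a character array. This is a sketch rather than a benchmarked solution; Hide-Position is a made-up helper name, and whether it actually beats the regex version is something you would have to time on your own data.

```powershell
$mask = '-...----..------.....---.-'

# Precompute the zero-based positions to overwrite (every '.' in the mask).
$positions = @(for ($i = 0; $i -lt $mask.Length; $i++) {
    if ($mask[$i] -eq '.') { $i }
})

# Hypothetical helper: overwrites the given positions in a copy of the row.
function Hide-Position {
    param([string]$Row, [int[]]$Positions, [char]$MaskChar = '.')
    $chars = $Row.ToCharArray()
    foreach ($p in $Positions) {
        # Skip positions past the end of a short row.
        if ($p -lt $chars.Length) { $chars[$p] = $MaskChar }
    }
    -join $chars
}

Hide-Position -Row 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' -Positions $positions
```

Passing -MaskChar ' ' would mask with spaces instead of periods.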
Here's yet another approach:
C:\PS> $mask = "{0}...{4}{5}{6}{7}..{10}{11}{12}{13}{14}{15}.....{21}{22}{23}.{25}"
C:\PS> $singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C:\PS> $mask -f $singlecharstrings
A...EFGH..KLMNOP.....VWX.Z
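If typing the format string by hand is error-prone, it could probably be generated from the same '-' and '.' mask used in the other answers. A minimal sketch, assuming the mask character is a period (a mask character of { or } would need escaping in a format string); $format and $pos are just illustrative names:

```powershell
$mask = '-...----..------.....---.-'

# Emit a {n} placeholder for each kept position and a literal '.' otherwise.
$pos = 0
$format = -join ([char[]]$mask | ForEach-Object {
    if ($_ -eq '-') { '{' + $pos + '}' } else { '.' }
    $pos++
})

# -f needs one argument per character, so split the row into one-character strings.
$singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
$result = $format -f $singlecharstrings
$result
```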