I'd like to split a string into an array, but the result is not what I expect when the last "value" is empty. Please see my example. Is this a bug or a feature? Is there any way to use this function without workarounds?
var
  arr: TArray<string>;
begin
  arr := 'a;b;c'.Split([';']);         // length of array = 3, which is OK
  arr := 'a;b;c;'.Split([';']);        // length of array = 3, but I expect 4
  arr := 'a;b;;c'.Split([';']);        // length of array = 4, since the empty value is inside
  arr := ('a;b;c;'+' ').Split([';']);  // length of array = 4 (primitive workaround with a space)
end;
This behaviour can't be changed. There's no way for you to customise how this split function works. I suspect that you'll need to provide your own split implementation. Michael Erikkson helpfully points out in a comment that System.StrUtils.SplitString behaves in the manner that you desire.
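As a rough illustration of that suggestion, the sketch below calls SplitString on the input from the question. It assumes a Delphi version with unit scope names (XE2 or later); the program name SplitStringDemo and the variable names are made up for this example. Note that the second argument is treated as a set of delimiter characters, not as one multi-character separator.

```delphi
program SplitStringDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Types, System.StrUtils;

var
  Parts: TStringDynArray; // return type of SplitString, declared in System.Types
  i: Integer;
begin
  // Input taken from the question: note the trailing separator.
  Parts := SplitString('a;b;c;', ';');
  Writeln('Item count: ', Length(Parts));
  // Show each item in quotes so that empty items are visible.
  for i := 0 to High(Parts) do
    Writeln(i, ': "', Parts[i], '"');
end.
```

If SplitString behaves as described above, the trailing empty item is kept and the reported item count is 4.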
The design seems to me to be poor. For instance
Length('a;'.Split([';'])) = 1
and yet
Length(';a'.Split([';'])) = 2
This asymmetry is a clear indication of poor design. It's astonishing that testing did not identify this.
The fact that the design is so clearly suspect means that it may be worth submitting a bug report. I'd expect it to be denied since any change would impact existing code. But you never know.
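A bug report is easier to act on with a small reproduction attached. The following is a minimal sketch of one; SplitProbe and Report are hypothetical names chosen for this example. The counts it prints may differ between Delphi versions, so no expected output is listed here.

```delphi
program SplitProbe;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Prints the input and the number of items that string.Split returns for it.
procedure Report(const Input: string);
begin
  Writeln('"', Input, '" -> ', Length(Input.Split([';'])));
end;

begin
  Report('a;');      // separator at the end
  Report(';a');      // separator at the start
  Report('a;b;c;');  // the case from the question
  Report('a;b;;c');  // empty item in the middle
end.
```

Comparing the first two lines of output shows directly whether leading and trailing separators are treated alike on the version being tested.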
My recommendations:
- Use your own split implementation that performs as you require.
- Submit a bug report.
Whilst System.StrUtils.SplitString does what you want, its performance is not great. That very likely does not matter, in which case you should use it. However, if performance matters, then I offer this:
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics, System.StrUtils;

// Splits s on Separator, keeping empty items (including a trailing one).
function MySplit(const s: string; Separator: char): TArray<string>;
var
  i, ItemIndex: Integer;
  len: Integer;
  SeparatorCount: Integer;
  Start: Integer;
begin
  len := Length(s);
  if len=0 then begin
    Result := nil;
    exit;
  end;

  // First pass: count the separators so the result is allocated only once.
  SeparatorCount := 0;
  for i := 1 to len do begin
    if s[i]=Separator then begin
      inc(SeparatorCount);
    end;
  end;

  SetLength(Result, SeparatorCount+1);
  // Second pass: copy out the text between consecutive separators.
  ItemIndex := 0;
  Start := 1;
  for i := 1 to len do begin
    if s[i]=Separator then begin
      Result[ItemIndex] := Copy(s, Start, i-Start);
      inc(ItemIndex);
      Start := i+1;
    end;
  end;
  // The final item runs from the last separator to the end of the string.
  Result[ItemIndex] := Copy(s, Start, len-Start+1);
end;
const
  InputString = 'asdkjhasd,we1324,wqweqw,qweqlkjh,asdqwe,qweqwe,asdasdqw';

var
  i: Integer;
  Stopwatch: TStopwatch;

const
  Count = 3000000;

begin
  // Time each implementation over the same input and iteration count.
  Stopwatch := TStopwatch.StartNew;
  for i := 1 to Count do begin
    InputString.Split([',']);
  end;
  Writeln('string.Split: ', Stopwatch.ElapsedMilliseconds);

  Stopwatch := TStopwatch.StartNew;
  for i := 1 to Count do begin
    System.StrUtils.SplitString(InputString, ',');
  end;
  Writeln('StrUtils.SplitString: ', Stopwatch.ElapsedMilliseconds);

  Stopwatch := TStopwatch.StartNew;
  for i := 1 to Count do begin
    MySplit(InputString, ',');
  end;
  Writeln('MySplit: ', Stopwatch.ElapsedMilliseconds);
end.
The output of a 32-bit release build with XE7 on my E5530, in milliseconds, is:
string.Split: 2798
StrUtils.SplitString: 7167
MySplit: 1428
The following is very similar to the accepted answer but i) it is a helper method and ii) it accepts an array of separators.
The method takes about 30% longer than David's for these reasons, but may be useful anyway.
program ImprovedSplit;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TStringHelperEx = record helper for string
  public
    function SplitEx(const Separator: array of Char): TArray<string>;
  end;

var
  TestString: string;
  StringArray: TArray<string>;

{ TStringHelperEx }
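function TStringHelperEx.SplitEx(const Separator: array of Char): TArray<string>;
var
  Str: string;
  Buf, Token: PChar;
  i, cnt: Integer;
  sep: Char;
begin
  Result := nil;
  // An empty input yields an empty array. Checking here also avoids
  // indexing into an empty string.
  if Self = '' then
    Exit;
  cnt := 0;
  Str := Self;
  // Work on a private copy so that the original string is never modified.
  UniqueString(Str);
  Buf := PChar(Str);
  // Overwrite every separator with #0 and count the replacements.
  // The loop stops before the terminating #0 of the buffer.
  for sep in Separator do begin
    for i := 0 to Length(Str) - 1 do begin
      if Buf[i] = sep then begin
        Buf[i] := #0;
        inc(cnt);
      end;
    end;
  end;
  SetLength(Result, cnt + 1);
  // Walk the buffer, reading one null-terminated token at a time.
  Token := Buf;
  for i := 0 to cnt do begin
    Result[i] := StrPas(Token);
    Token := Token + Length(Token) + 1;
  end;
end;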
begin
  try
    TestString := '';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 0, 'Failed test for Empty String');

    TestString := 'a';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 1, 'Failed test for Single String');

    TestString := ';';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 2, 'Failed test for Single Separator');

    TestString := 'a;';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 2, 'Failed test for Single String + Single End-Separator');

    TestString := ';a';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 2, 'Failed test for Single String + Single Start-Separator');

    TestString := 'a;b;c';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 3, 'Failed test for Simple Case');

    TestString := ';a;b;c;';
    StringArray := TestString.SplitEx([';']);
    Assert(Length(StringArray) = 5, 'Failed test for Start and End Separator');

    TestString := '0;1;2;3;4;5;6;7;8;9;0;1;2;3;4;5;6;7;8;9;0;1;2;3;4;5;6;7;8;9;0;1;2;3;4;5;6;7;8;9';
    StringArray := TestString.SplitEx([';', ',']);
    Assert(Length(StringArray) = 40, 'Failed test for Larger Array');

    TestString := '0;1;2;3;4;5;6;7;8;9;0;1;2;3;4;5;6;7;8;9;0,1,2,3,4,5,6,7,8,9,0;1;2;3;4;5;6;7;8;9';
    StringArray := TestString.SplitEx([';', ',']);
    Assert(Length(StringArray) = 40, 'Failed test for Array of Separators');

    Writeln('No Errors');
  except
    on E: Exception do
      Writeln(E.ClassName, ': ', E.Message);
  end;
  Writeln('Press ENTER to continue');
  Readln(TestString);
end.