I found a Windows API function that performs "natural comparison" of strings. It is defined as follows:
int StrCmpLogicalW(
    LPCWSTR psz1,
    LPCWSTR psz2
);
To use it in Delphi, I declared it this way:
interface
  function StrCmpLogicalW(psz1, psz2: PWideChar): integer; stdcall;
implementation
  function StrCmpLogicalW; external 'shlwapi.dll' name 'StrCmpLogicalW';
Because it compares Unicode strings, I'm not sure how to call it when I want to compare ANSI strings. It seems to be enough to cast the strings to WideString and then to PWideChar; however, I have no idea whether this approach is correct:
function AnsiNaturalCompareText(const S1, S2: string): integer;
begin
  Result := StrCmpLogicalW(PWideChar(WideString(S1)), PWideChar(WideString(S2)));
end;
I know very little about character encoding, which is the reason for my question. Is this function OK, or should I first convert both strings somehow?
Use System.StringToOleStr, which is a handy wrapper around MultiByteToWideChar; see Gabr's answer.
But then, Ian Boyd's solution looks and is much nicer!
The easier way to accomplish the task would be to declare your function with WideString parameters instead of PWideChar.
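A minimal sketch of what such a declaration might look like, assuming the same shlwapi.dll import as in the question with only the parameter types changed. The demo program and its variable names are illustrative and not taken from the original answer. One caveat to keep in mind: an empty WideString is stored as a nil pointer, so this form passes NULL for empty strings, whereas the PWideChar(...) cast used in the question always yields a pointer to a terminator.

```pascal
program NaturalCompareDemo;

{$APPTYPE CONSOLE}

// Same import as in the question, but with WideString parameters.
// A non-empty WideString is passed as a pointer to its first WideChar,
// which is what the API expects for an LPCWSTR argument.
function StrCmpLogicalW(psz1, psz2: WideString): Integer; stdcall;
  external 'shlwapi.dll' name 'StrCmpLogicalW';

var
  A, B: AnsiString;
begin
  A := 'file10.txt';
  B := 'file9.txt';
  // The compiler converts both AnsiStrings to temporary WideStrings
  // before the call; no explicit cast is written here.
  if StrCmpLogicalW(A, B) > 0 then
    Writeln('file10.txt sorts after file9.txt')
  else
    Writeln('file10.txt does not sort after file9.txt');
end.
```

In Unicode versions of Delphi the implicit AnsiString conversion typically produces a compiler warning, which can be silenced with an explicit WideString(...) cast.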
This works because a WideString variable is a pointer to a WideChar (in the same way an AnsiString variable is a pointer to an AnsiChar). And this way Delphi will automatically "up-convert" an AnsiString to a WideString for you.
Update
And since we're now in the world of UnicodeString, you would declare the parameters as UnicodeString instead, because a UnicodeString variable is still a pointer to a \0\0-terminated string of WideChars. So if you call the function with AnsiString arguments, that is, when you try to pass an AnsiString into a function that takes a UnicodeString, the compiler will automatically call MultiByteToWideChar for you in the generated code.
CompareString supports numeric sorting in Windows 7
Starting in Windows 7, Microsoft added SORT_DIGITSASNUMBERS to CompareString.
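As a rough illustration of that flag, the sketch below calls CompareStringW with SORT_DIGITSASNUMBERS. The wrapper name CompareDigitsAsNumbers is hypothetical, the constant value $00000008 is taken from the Windows SDK headers and is declared locally in case the Delphi version at hand does not define it, and the call is expected to fail on systems older than Windows 7.

```pascal
program DigitsAsNumbersDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  // Value from the Windows SDK headers; declared here as an assumption
  // that the installed Windows unit may not provide it.
  SORT_DIGITSASNUMBERS = $00000008;

// Hypothetical wrapper: returns <0, 0 or >0 like a C-style compare.
function CompareDigitsAsNumbers(const S1, S2: WideString): Integer;
var
  R: Integer;
begin
  R := CompareStringW(LOCALE_USER_DEFAULT, SORT_DIGITSASNUMBERS,
    PWideChar(S1), Length(S1), PWideChar(S2), Length(S2));
  // CompareStringW returns 0 on failure, e.g. when the flag is not supported.
  if R = 0 then
    RaiseLastOSError;
  // Success values are 1 (less), 2 (equal), 3 (greater); subtracting 2
  // maps them onto the usual -1 / 0 / 1 convention.
  Result := R - 2;
end;

begin
  if CompareDigitsAsNumbers('file9.txt', 'file10.txt') < 0 then
    Writeln('file9.txt sorts before file10.txt')
  else
    Writeln('file9.txt does not sort before file10.txt');
end.
```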
None of this helps answer the actual question, which deals with when you have to convert or cast strings.
Keep in mind that casting a string to a WideString will convert it using the default system code page, which may or may not be what you need. Typically, you'd want to use the current user's locale.
See WCharFromChar in System.pas. You can change DefaultSystemCodePage by calling SetMultiByteConversionCodePage.
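If the implicit conversion's choice of code page is a concern, the conversion can also be done explicitly. The following is a sketch only: AnsiToWide is a hypothetical helper, not an RTL function, and it assumes the Windows unit's MultiByteToWideChar binding with a code page supplied by the caller (CP_ACP here, which selects the system default ANSI code page).

```pascal
program ExplicitCodePageDemo;

{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: converts an ANSI string to a WideString using
// the code page given by the caller instead of the RTL default.
function AnsiToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many WideChars the converted text needs.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the allocated buffer.
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;

var
  W: WideString;
begin
  W := AnsiToWide('file10.txt', CP_ACP);
  Writeln('Converted length: ', Length(W));
end.
```

The resulting WideString can then be passed to StrCmpLogicalW through a PWideChar cast, as in the question.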
There might be an ANSI variant of your function too (I haven't checked). Most Wide APIs are available in an ANSI version as well; just change the W suffix to an A and you're set. Windows does the back-and-forth conversion transparently for you in that case.
PS: Here's an article describing the lack of StrCmpLogicalA: http://blogs.msdn.com/joshpoley/archive/2008/04/28/strcmplogicala.aspx