How can a text file be converted from ANSI to UTF-8?

Posted 2019-02-16 21:45

Question:

I have written a program in Delphi 7 that searches for *.srt files on a hard drive. The program lists the path and name of each file in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.

Answer 1:

The UTF8Encode function takes a WideString as its parameter and returns a UTF-8 string.

Sample:

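procedure ConvertANSIFileToUTF8File(AInputFileName, AOutputFileName: TFileName);
var
  Strings: TStrings;
begin
  Strings := TStringList.Create;
  try
    // In Delphi 7 the file is loaded as raw ANSI bytes.
    Strings.LoadFromFile(AInputFileName);
    // Text is widened to a WideString using the system ANSI code page,
    // then UTF8Encode turns it into UTF-8 bytes.
    Strings.Text := UTF8Encode(Strings.Text);
    // The UTF-8 bytes are written as-is; no BOM is added.
    Strings.SaveToFile(AOutputFileName);
  finally
    Strings.Free;
  end;
end;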


Answer 2:

Take a look at GpTextStream, which looks like it works with Delphi 7. It can read and write Unicode files in older versions of Delphi (although it also works with Delphi 2009) and should help with your conversion.



Answer 3:

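var
  Latin1Encoding: TEncoding;
begin
  // TEncoding requires Delphi 2009 or later; code page 28591 is ISO-8859-1 (Latin-1).
  Latin1Encoding := TEncoding.GetEncoding(28591);
  try
    MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
  finally
    // An encoding obtained from GetEncoding is owned by the caller.
    Latin1Encoding.Free;
  end;
end;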
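
The snippet above writes Latin-1. For the direction asked about in the question (ANSI in, UTF-8 out), a minimal sketch for Delphi 2009 or later might look like the following. It assumes that TEncoding.Default is the system ANSI code page, which is the case on Windows; the procedure name ResaveAnsiAsUtf8 is made up for illustration. It does not compile in Delphi 7, which has no TEncoding.

```pascal
program ResaveAsUtf8;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: re-save an ANSI text file as UTF-8 (Delphi 2009+ only).
procedure ResaveAnsiAsUtf8(const AInputFileName, AOutputFileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // Decode the file using the system ANSI code page.
    Lines.LoadFromFile(AInputFileName, TEncoding.Default);
    // Encode as UTF-8; TEncoding.UTF8 writes its preamble (BOM) by default.
    Lines.SaveToFile(AOutputFileName, TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end;

begin
  if ParamCount = 2 then
    ResaveAnsiAsUtf8(ParamStr(1), ParamStr(2))
  else
    Writeln('Usage: ResaveAsUtf8 <input file> <output file>');
end.
```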


Answer 4:

Please read the whole answer before you start coding.


The proper answer to the question, and it is not an easy one, basically consists of three steps:

  1. You have to determine the ANSI code page used on your computer. You can do this by calling the GetACP() function from the Windows API. (Important: retrieve the code page as soon as possible after retrieving the file names, because the user can change it.)
  2. You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (in practice a WideString) containing the file name list.
  3. You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
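
The three steps above could be sketched for Delphi 7 roughly as follows. The helper name AnsiToUtf8 and the sample string are assumptions made for illustration; the API calls are the ones named in the list. The length printed depends on the active code page, so no particular output is claimed.

```pascal
program AnsiToUtf8Steps;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: convert bytes in the given ANSI code page to UTF-8.
function AnsiToUtf8(const S: AnsiString; CodePage: UINT): UTF8String;
var
  W: WideString;
  WideLen: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // Step 2a: ask how many UTF-16 code units the converted text needs.
  WideLen := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if WideLen = 0 then
    RaiseLastOSError;
  SetLength(W, WideLen);
  // Step 2b: perform the ANSI -> UTF-16 conversion.
  if MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(W), WideLen) = 0 then
    RaiseLastOSError;
  // Step 3: UTF-16 -> UTF-8.
  Result := UTF8Encode(W);
end;

var
  CodePage: UINT;
  Utf8: UTF8String;
begin
  // Step 1: read the active ANSI code page once and reuse it.
  CodePage := GetACP;
  // Sample input: byte $E9 is an accented e in Windows-1252;
  // other code pages map it to a different character.
  Utf8 := AnsiToUtf8('caf'#233, CodePage);
  Writeln('UTF-8 byte count: ', Length(Utf8));
end.
```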

However, this solution only returns a UTF-8 version of the input ANSI string. That is probably not the best way to solve your problem, because the file names may already have been corrupted when the ANSI functions returned them, so correct file names are not guaranteed.


The proper solution to your problem is much more complicated:

If you want to be sure that your file name list is completely clean, you have to make sure it never gets converted to ANSI at all. You can do this by explicitly using the "W" versions of the file handling APIs. In that case, of course, you cannot use TFileStream and other ANSI file handling objects; you have to call the Windows API directly.
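
As a rough illustration of the "W" approach for the file search itself, the Delphi 7 sketch below collects *.srt names from a single folder with FindFirstFileW and FindNextFileW, so the names stay in UTF-16 throughout. The helper ListSrtFilesW, the TWideStringArray type, and the folder path are hypothetical, and recursion into subfolders is left out.

```pascal
program ListSrtWide;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  TWideStringArray = array of WideString; // hypothetical helper type

// Hypothetical helper: list *.srt files in one folder without an ANSI round trip.
function ListSrtFilesW(const Dir: WideString): TWideStringArray;
var
  Mask: WideString;
  FindData: TWin32FindDataW;
  H: THandle;
begin
  Result := nil;
  Mask := Dir + '\*.srt';
  H := FindFirstFileW(PWideChar(Mask), FindData);
  if H = INVALID_HANDLE_VALUE then
    Exit; // nothing matched, or the folder is not accessible
  try
    repeat
      // Skip directories that happen to match the mask.
      if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
      begin
        SetLength(Result, Length(Result) + 1);
        // cFileName is a zero-terminated UTF-16 buffer.
        Result[High(Result)] := PWideChar(@FindData.cFileName[0]);
      end;
    until not FindNextFileW(H, FindData);
  finally
    Windows.FindClose(H);
  end;
end;

var
  Files: TWideStringArray;
begin
  Files := ListSrtFilesW('C:\Subtitles'); // hypothetical folder
  Writeln(Length(Files), ' file(s) found');
end.
```

Note that a standard Delphi 7 memo is an ANSI control, so displaying these names there would still convert them; the array keeps the exact names for later file access.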

It is not that hard, but if you already have a complex framework built on e.g. TFileStream, it could be a bit of a pain. In this case the best solution is to create a TStream descendant that uses the appropriate APIs.
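
One possible shape for such a descendant, sketched for Delphi 7: derive from THandleStream and open the handle with CreateFileW. The class name TWideFileReadStream and the file path are hypothetical, and only read access is covered. The Integer cast reflects the Delphi 7 declaration of THandleStream.Create; later versions declare the parameter differently.

```pascal
program WideStreamDemo;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical class: a read-only file stream opened through CreateFileW.
  TWideFileReadStream = class(THandleStream)
  public
    constructor Create(const FileName: WideString);
    destructor Destroy; override;
  end;

constructor TWideFileReadStream.Create(const FileName: WideString);
begin
  // Delphi 7 declares THandleStream.Create(AHandle: Integer), hence the cast.
  inherited Create(Integer(CreateFileW(PWideChar(FileName), GENERIC_READ,
    FILE_SHARE_READ, nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)));
  if THandle(Handle) = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
end;

destructor TWideFileReadStream.Destroy;
begin
  // Close the handle only if the open succeeded.
  if THandle(Handle) <> INVALID_HANDLE_VALUE then
    CloseHandle(THandle(Handle));
  inherited Destroy;
end;

var
  Stream: TWideFileReadStream;
begin
  try
    Stream := TWideFileReadStream.Create('C:\Subtitles\example.srt'); // hypothetical path
    try
      Writeln('Size in bytes: ', Stream.Size);
    finally
      Stream.Free;
    end;
  except
    on E: Exception do
      Writeln('Could not open file: ', E.Message);
  end;
end.
```

Because the class is a TStream, existing code that accepts a TStream (for example TStrings.LoadFromStream) can keep working unchanged.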

I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)



Answer 5:

Did you mean ASCII?

UTF-8 is backward compatible with ASCII, so a file that contains only ASCII characters is already valid UTF-8 and needs no conversion. http://en.wikipedia.org/wiki/UTF-8
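
If it is unclear whether a given file is pure ASCII, one way to check is to scan it for bytes of $80 and above. The sketch below is only an illustration; the function name FileIsPureAscii is hypothetical, and the file is opened through the regular (ANSI in Delphi 7) TFileStream.

```pascal
program IsAsciiDemo;
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: True when every byte in the file is below $80.
function FileIsPureAscii(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Buffer: array[0..4095] of Byte;
  Count, I: Integer;
begin
  Result := True;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      Count := Stream.Read(Buffer, SizeOf(Buffer));
      for I := 0 to Count - 1 do
        if Buffer[I] >= $80 then
        begin
          Result := False;
          Exit; // the finally block still frees the stream
        end;
    until Count = 0;
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount <> 1 then
    Writeln('Usage: IsAsciiDemo <file>')
  else if FileIsPureAscii(ParamStr(1)) then
    Writeln('Pure ASCII: already valid UTF-8')
  else
    Writeln('Contains non-ASCII bytes: conversion may be needed');
end.
```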