I apologize for this silly question. I am maintaining legacy VB6 code, and I have a function that works, but I can't figure out why it works, or why the code fails without it.
Basically, this function reads a UTF-8 text file and displays its contents in a DHTMLEdit component. It reads the entire file into a string, converts that string from wide characters (UTF-16) to a multibyte string using the ANSI code page, then converts it back to wide characters using the UTF-8 code page.
With this round trip in place, the component correctly displays a page that contains Hebrew, Arabic, Thai and Chinese, all at the same time. Without it, the text looks as if it had been converted down to ASCII, with various punctuation marks where the letters should be.
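The round trip can be reproduced outside VB6. Below is a minimal Python sketch, assuming (I have not confirmed this) that reading the file with Line Input turns every byte of the file into one character through the ANSI code page. Latin-1 is used as a stand-in for a single-byte code page that maps every byte value; a real ANSI code page such as 1252 may treat a few bytes differently. The function names are hypothetical and only mirror the VB6 ones.

```python
# Hypothetical illustration; Latin-1 stands in for the system ANSI code page.
ANSI = "latin-1"


def read_as_ansi(raw: bytes) -> str:
    """Mimic the assumed file read: each byte becomes one character."""
    return raw.decode(ANSI)


def decode_string(text: str, from_code_page: str = "utf-8") -> str:
    """Mirror DecodeString: narrow to ANSI bytes, then widen via from_code_page."""
    return text.encode(ANSI).decode(from_code_page)


original = "שלום مرحبا สวัสดี 你好"
raw = original.encode("utf-8")     # the bytes as stored in the UTF-8 file
garbled = read_as_ansi(raw)        # the string as it looks without the fix
restored = decode_string(garbled)  # the round trip recovers the text

print(garbled != original)   # True
print(restored == original)  # True
```

Under that assumption, the string that comes out of the file read is not text at all but the file's raw bytes, one per character, and the two conversions simply undo and redo the decoding with the right code page.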
What I don't understand is:
- Since the original file is UTF-8 and VB6 strings are UTF-16, why is this even needed? Why doesn't VB6 read the string correctly from the file without all these conversions?
- If the function converts from wide characters to multibyte using CodePage = 0 (CP_ACP, the system ANSI code page), wouldn't that eliminate any characters that are not supported by the current code page? I don't even have Chinese, Thai and Arabic installed on this machine. And yet this is the only way I can get the DHTMLEdit control to display the text correctly.
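To make the second question concrete, here is a small Python sketch of the difference between narrowing real Chinese text and narrowing a string that only holds misread bytes. It rests on the same unconfirmed assumption that the file read yields one character per byte, and again uses Latin-1 as a stand-in for the ANSI code page.

```python
# Hypothetical illustration; Latin-1 stands in for the system ANSI code page.
ANSI = "latin-1"

real_text = "你好"

# Narrowing genuine Chinese text loses it: every character becomes '?'.
lossy = real_text.encode(ANSI, errors="replace")

# A string built by misreading the UTF-8 bytes holds only code points
# below 256, so narrowing it gives back the original bytes unchanged.
misread = real_text.encode("utf-8").decode(ANSI)
lossless = misread.encode(ANSI)

print(lossy)                                   # b'??'
print(all(ord(ch) < 256 for ch in misread))    # True
print(lossless == real_text.encode("utf-8"))   # True
```

If the assumption holds, the conversion to ANSI would never see a Chinese, Thai or Arabic character in the first place, which would explain why nothing is lost.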
[code]
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
Private Declare Function GetACP Lib "kernel32" () As Long
Private Const CP_UTF8 As Long = 65001 ' UTF-8 code page identifier, passed to DecodeString below
...
Open filePath For Input As #lFilePtr
Dim sInput As String
Dim sResult As String
Do While Not EOF(lFilePtr)
    Line Input #lFilePtr, sInput ' Line Input drops the line terminator
    sResult = sResult & sInput
Loop
Close #lFilePtr
txtBody.DOM.Body.innerText = DecodeString(sResult, CP_UTF8)
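' Re-interprets strSource: narrows it to bytes with the ANSI code page (0 = CP_ACP),
' then widens those bytes again using FromCodePage.
Public Function DecodeString(ByVal strSource As String, Optional FromCodePage As Long = -1) As String
    Dim strTemp As String
    If strSource = vbNullString Then Exit Function
    strTemp = UnicodeToAnsi(strSource, 0)
    DecodeString = AnsiToUnicode(strTemp, FromCodePage)
End Function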
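' Widens the byte string packed inside strSource using codePage (-1 = system ANSI code page).
Public Function AnsiToUnicode(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    ' First call: ask for the required length in characters, terminator included.
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, 0&, 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: convert into the buffer.
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer))
    ' A return value of 0 means the call failed; return an empty string in that case.
    If cwch > 0 Then AnsiToUnicode = Left$(strBuffer, cwch - 1)
End Function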
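' Narrows strSource to bytes using codePage (-1 = system ANSI code page).
' The bytes are packed at the start of the returned string's buffer, followed by zero padding.
Public Function UnicodeToAnsi(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long
    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    ' First call: ask for the required length in bytes, terminator included.
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, 0&, 0&, ByVal 0&, ByVal 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    ' Second call: convert into the buffer.
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer), ByVal 0&, ByVal 0&)
    ' A return value of 0 means the call failed; return an empty string in that case.
    If cwch > 0 Then UnicodeToAnsi = Left$(strBuffer, cwch - 1)
End Function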
[/code]