VB6/VBScript change file encoding to ansi

2019-01-09 12:58发布

问题:

I am looking for a way to convert a textfile with UTF8 encoding to ANSI encoding.

How can i go around and achieve this in Visual Basic (VB6) and or vbscript?

回答1:

If your files aren't truly enormous (e.g. even merely 40MB can be painfully slow) you can do this using the following code in VB6, VBA, or VBScript:

Option Explicit

Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText

    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub

UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName

Note that it will handle UTF-8 input files either with or without a BOM.

Using strong typing and early binding will improve performance a hair in VB6, and you won't need to declare those Const values. This isn't an option in script though.

For VB6 programs that need to process very large files you might be better off using VB6 native I/O against Byte arrays and use an API call to convert the data in chunks. This adds the extra messiness of finding the character boundaries though (UTF-8 uses a variable number of bytes per character). You'd need to scan each data block you read to find a safe ending point for an API translation.

I'd look at MultiByteToWideChar() and WideCharToMultiByte() to get started.

Note that UTF-8 often "arrives" with LF line delimiters instead of CRLF.



回答2:

I'm using these helper functions

Private Function pvReadFile(sFile)
    Const ForReading = 1
    Dim sPrefix

    With CreateObject("Scripting.FileSystemObject")
        sPrefix = .OpenTextFile(sFile, ForReading, False, False).Read(3)
    End With
    If Left(sPrefix, 3) <> Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
        With CreateObject("Scripting.FileSystemObject")
            pvReadFile = .OpenTextFile(sFile, ForReading, False, Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE)).ReadAll()
        End With
    Else
        With CreateObject("ADODB.Stream")
            .Open
            If Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE) Then
                .Charset = "Unicode"
            ElseIf Left(sPrefix, 3) = Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
                .Charset = "UTF-8"
            Else
                .Charset = "_autodetect"
            End If
            .LoadFromFile sFile
            pvReadFile = .ReadText
        End With
    End If
End Function

Private Function pvWriteFile(sFile, sText, lType)
    Const adSaveCreateOverWrite = 2

    With CreateObject("ADODB.Stream")
        .Open
        If lType = 2 Then
            .Charset = "Unicode"
        ElseIf lType = 3 Then
            .Charset = "UTF-8"
        Else
            .Charset = "_autodetect"
        End If
        .WriteText sText
        .SaveToFile sFile, adSaveCreateOverWrite
    End With
End Function

I found out that "native" FileSystemObject reading of ANSI and UTF-16/UCS-2 files is much faster that ADODB.Stream hack.



回答3:

I'm using this script to convert any character set or code page (that i'm aware of).

This script can also handle large files (over one gigabytes), because it streams one line at a time.

' - ConvertCharset.vbs -
'
' Inspired by: 
' http://www.vbforums.com/showthread.php?533879-Generate-text-files-in-IBM-850-encoding
' http://stackoverflow.com/questions/5182102/vb6-vbscript-change-file-encoding-to-ansii/5186170#5186170
' http://stackoverflow.com/questions/13130214/how-to-convert-a-batch-file-stored-in-utf-8-to-something-that-works-via-another
' 
' Start Main
Dim objArguments
Dim strSyntaxtext, strInputCharset, strOutputCharset, strInputFile, strOutputFile 
Dim intReadPosition, intWritePosition
Dim arrCharsets

Const adReadAll = -1
Const adReadLine = -2
Const adSaveCreateOverWrite = 2
Const adSaveCreateNotExist = 1
Const adTypeBinary = 1
Const adTypeText = 2
Const adWriteChar = 0
Const adWriteLine = 1

strSyntaxtext = strSyntaxtext & "Converts the charset of the input text file to output file." & vbCrLf
strSyntaxtext = strSyntaxtext & "Syntax: "  & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf
strSyntaxtext = strSyntaxtext & "              /OutputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /InputFile:\\path\to\inputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /OutputFile:\\path\to\outputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              [/ShowAllCharSets]" & vbCrLf & vbCrLf 
strSyntaxtext = strSyntaxtext & "Example:" & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:ibm850 /OutputCharset:utf-8 /InputFile:my_dos.txt /OutputFile:my_utf-8.txt" & vbCrLf

Set objArgumentsNamed = WScript.Arguments.Named
If objArgumentsNamed.Count = 0  Then 
   WScript.Echo strSyntaxtext
   WScript.Quit(99)
End If

arrCharsets = Split("big5,big5-hkscs,euc-jp,euc-kr,gb18030,gb2312,gbk,ibm-thai," &_
                    "ibm00858,ibm01140,ibm01141,ibm01142,ibm01143,ibm01144," &_
                    "ibm01145,ibm01146,ibm01147,ibm01148,ibm01149,ibm037," &_
                    "ibm1026,ibm273,ibm277,ibm278,ibm280,ibm284,ibm285,ibm297," &_
                    "ibm420,ibm424,ibm437,ibm500,ibm775,ibm850,ibm852,ibm855," &_
                    "ibm857,ibm860,ibm861,ibm862,ibm863,ibm864,ibm865,ibm866," &_
                    "ibm869,ibm870,ibm871,iso-2022-jp,iso-2022-kr,iso-8859-1," &_
                    "iso-8859-13,iso-8859-15,iso-8859-2,iso-8859-3,iso-8859-4," &_
                    "iso-8859-5,iso-8859-6,iso-8859-7,iso-8859-8,iso-8859-9," &_
                    "koi8-r,koi8-u,shift_jis,tis-620,us-ascii,utf-16,utf-16be," &_
                    "utf-16le,utf-7,utf-8,windows-1250,windows-1251,windows-1252," &_
                    "windows-1253,windows-1254,windows-1255,windows-1256," &_
                    "windows-1257,windows-1258,unicode", ",")

Set objFileSystem = CreateObject("Scripting.FileSystemObject")

For Each objArgumentNamed in objArgumentsNamed
   Select Case Lcase(objArgumentNamed)
      Case "inputcharset"
         strInputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strInputCharset) Then 
            WScript.Echo "The InputCharset (" & strInputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(1)
         End If
      Case "outputcharset"
         strOutputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strOutputCharset) Then 
            WScript.Echo "The strOutputCharset (" & strOutputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(2)
         End If
      Case "inputfile"
         strInputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If Not objFileSystem.FileExists(strInputFile) Then  
            WScript.Echo "The InputFile (" & strInputFile  & ") does not exist, quitting."  & vbCrLf
            WScript.Quit(3)
         End If
      Case "outputfile"
         strOutputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If objFileSystem.FileExists(strOutputFile) Then  
            WScript.Echo "The OutputFile  (" & strOutputFile & ") exists, quitting."  & vbCrLf
            WScript.Quit(4)
         End If
      Case "showallcharsets"
         x = ShowCharsets()
      Case Else
         WScript.Echo "Unknown parameter, quitting: /" & objArgumentNamed & ":" & objArgumentsNamed(objArgumentNamed)
         WScript.Echo strSyntaxtext
   End Select 
Next

If Len(strInputCharset) > 0 And Len(strOutputCharset) > 0 And Len(strInputFile) > 0 And Len(strOutputFile) Then 
   Set objInputStream = CreateObject("ADODB.Stream")
   Set objOutputStream = CreateObject("ADODB.Stream")

   With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      intWritePosition = 0
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
   End With
   objOutputStream.SaveToFile strOutputFile , adSaveCreateNotExist
   objOutputStream.Close
   WScript.Echo "The " & objFileSystem.GetFileName(strInputFile) & " was converted to "  & objFileSystem.GetFileName(strOutputFile) & " OK."
End If
' End Main

' Start Functions 

Function IsCharset(strMyCharset)
IsCharset = False
For Each strCharset in arrCharsets
   If strCharset = strMyCharset Then 
      IsCharset = True
      Exit For
   End If
Next
End Function 

Function ShowCharsets()
strDisplayCharsets = ""
intCounter = 0
For Each strcharset in arrCharsets
   intCounter = intCounter + Len(strcharset) + 1
   strDisplayCharsets = strDisplayCharsets & strcharset & ","
   If intCounter > 67 Then 
      intCounter = 0
      strDisplayCharsets = strDisplayCharsets & vbCrLf 
   End If
Next
strDisplayCharsets = Mid(strDisplayCharsets, 1, Len(strDisplayCharsets)-1)
WScript.Echo strDisplayCharsets 
End Function 
' End Functions 


回答4:

@Bob77's answer did not work for me, so I converted @Ciove's answer to a simple sub routine and it works fine.

' Usage: 
' EncodeFile strInFile, "UTF-8", strOutFile, "Windows-1254", 2
Sub EncodeFile(strInputFile, strInputCharset, strOutputFile, strOutputCharset, intOverwriteMode)

    '5th parameter may take the following values:
    'Const adSaveCreateOverWrite = 2
    'Const adSaveCreateNotExist = 1

    Const adReadLine = -2
    Const adTypeBinary = 1
    Const adTypeText = 2
    Const adWriteLine = 1

    Set objInputStream = CreateObject("ADODB.Stream")
    Set objOutputStream = CreateObject("ADODB.Stream")

    With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
    End With
    objOutputStream.SaveToFile strOutputFile, intOverwriteMode
    objOutputStream.Close
End Sub