VB6 / VBScript中更改文件编码为ANSIVB6 / VBScript中更改文件编码为AN

2019-05-12 05:06发布

我正在寻找一种方法来转换与UTF8编码ANSI编码的文本文件里。

我怎么能去走一走,在Visual Basic(VB6)实现这一点,或VBScript?

Answer 1:

如果你的文件是不是真正巨大的(例如,偶数仅仅40MB可以是非常慢的),你可以使用VB6,VBA,或VBScript下面的代码做到这一点:

Option Explicit

Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0

Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText

    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub

UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName

请注意,它会处理UTF-8输入文件不管有或没有BOM。

使用强类型和早期绑定将提高性能在VB6头发,你就不会需要申报的常量值。 这不是在脚本中的一个选项,虽然。

对于那些需要处理非常大的文件VB6程序,你可能会更好使用VB6本地I / O对字节数组,并使用一个API调用来将数据转换成块。 这增加了找到字符边界,但(UTF-8使用可变数量的每字符的字节)的额外杂乱。 你需要扫描你阅读找到一个API翻译安全终点的每个数据块。

我想看看的MultiByteToWideChar()和调用WideCharToMultiByte()开始。

需要注意的是UTF-8往往“到达”与LF行分隔符,而不是CRLF。



Answer 2:

我使用这些辅助功能

Private Function pvReadFile(sFile)
    Const ForReading = 1
    Dim sPrefix

    With CreateObject("Scripting.FileSystemObject")
        sPrefix = .OpenTextFile(sFile, ForReading, False, False).Read(3)
    End With
    If Left(sPrefix, 3) <> Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
        With CreateObject("Scripting.FileSystemObject")
            pvReadFile = .OpenTextFile(sFile, ForReading, False, Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE)).ReadAll()
        End With
    Else
        With CreateObject("ADODB.Stream")
            .Open
            If Left(sPrefix, 2) = Chr(&HFF) & Chr(&HFE) Then
                .Charset = "Unicode"
            ElseIf Left(sPrefix, 3) = Chr(&HEF) & Chr(&HBB) & Chr(&HBF) Then
                .Charset = "UTF-8"
            Else
                .Charset = "_autodetect"
            End If
            .LoadFromFile sFile
            pvReadFile = .ReadText
        End With
    End If
End Function

Private Function pvWriteFile(sFile, sText, lType)
    Const adSaveCreateOverWrite = 2

    With CreateObject("ADODB.Stream")
        .Open
        If lType = 2 Then
            .Charset = "Unicode"
        ElseIf lType = 3 Then
            .Charset = "UTF-8"
        Else
            .Charset = "_autodetect"
        End If
        .WriteText sText
        .SaveToFile sFile, adSaveCreateOverWrite
    End With
End Function

我发现ANSI的是“原生” FileSystemObject的阅读和UTF-16 / UCS-2文件更快的是ADODB.Stream黑客。



Answer 3:

我使用这个脚本转换任何字符集或代码页(即我所知道的)。

该脚本可以处理大型文件 (超过一个千兆字节),因为它流在一次一行。

' - ConvertCharset.vbs -
'
' Inspired by: 
' http://www.vbforums.com/showthread.php?533879-Generate-text-files-in-IBM-850-encoding
' http://stackoverflow.com/questions/5182102/vb6-vbscript-change-file-encoding-to-ansii/5186170#5186170
' http://stackoverflow.com/questions/13130214/how-to-convert-a-batch-file-stored-in-utf-8-to-something-that-works-via-another
' 
' Start Main
Dim objArguments
Dim strSyntaxtext, strInputCharset, strOutputCharset, strInputFile, strOutputFile 
Dim intReadPosition, intWritePosition
Dim arrCharsets

Const adReadAll = -1
Const adReadLine = -2
Const adSaveCreateOverWrite = 2
Const adSaveCreateNotExist = 1
Const adTypeBinary = 1
Const adTypeText = 2
Const adWriteChar = 0
Const adWriteLine = 1

strSyntaxtext = strSyntaxtext & "Converts the charset of the input text file to output file." & vbCrLf
strSyntaxtext = strSyntaxtext & "Syntax: "  & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf
strSyntaxtext = strSyntaxtext & "              /OutputCharset:utf-8|windows-1252|ibm850|..." & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /InputFile:\\path\to\inputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              /OutputFile:\\path\to\outputfile.ext" & vbCrLf 
strSyntaxtext = strSyntaxtext & "              [/ShowAllCharSets]" & vbCrLf & vbCrLf 
strSyntaxtext = strSyntaxtext & "Example:" & vbCrLf
strSyntaxtext = strSyntaxtext & WScript.ScriptName & " /InputCharset:ibm850 /OutputCharset:utf-8 /InputFile:my_dos.txt /OutputFile:my_utf-8.txt" & vbCrLf

Set objArgumentsNamed = WScript.Arguments.Named
If objArgumentsNamed.Count = 0  Then 
   WScript.Echo strSyntaxtext
   WScript.Quit(99)
End If

arrCharsets = Split("big5,big5-hkscs,euc-jp,euc-kr,gb18030,gb2312,gbk,ibm-thai," &_
                    "ibm00858,ibm01140,ibm01141,ibm01142,ibm01143,ibm01144," &_
                    "ibm01145,ibm01146,ibm01147,ibm01148,ibm01149,ibm037," &_
                    "ibm1026,ibm273,ibm277,ibm278,ibm280,ibm284,ibm285,ibm297," &_
                    "ibm420,ibm424,ibm437,ibm500,ibm775,ibm850,ibm852,ibm855," &_
                    "ibm857,ibm860,ibm861,ibm862,ibm863,ibm864,ibm865,ibm866," &_
                    "ibm869,ibm870,ibm871,iso-2022-jp,iso-2022-kr,iso-8859-1," &_
                    "iso-8859-13,iso-8859-15,iso-8859-2,iso-8859-3,iso-8859-4," &_
                    "iso-8859-5,iso-8859-6,iso-8859-7,iso-8859-8,iso-8859-9," &_
                    "koi8-r,koi8-u,shift_jis,tis-620,us-ascii,utf-16,utf-16be," &_
                    "utf-16le,utf-7,utf-8,windows-1250,windows-1251,windows-1252," &_
                    "windows-1253,windows-1254,windows-1255,windows-1256," &_
                    "windows-1257,windows-1258,unicode", ",")

Set objFileSystem = CreateObject("Scripting.FileSystemObject")

For Each objArgumentNamed in objArgumentsNamed
   Select Case Lcase(objArgumentNamed)
      Case "inputcharset"
         strInputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strInputCharset) Then 
            WScript.Echo "The InputCharset (" & strInputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(1)
         End If
      Case "outputcharset"
         strOutputCharset = LCase(objArgumentsNamed(objArgumentNamed))
         If Not IsCharset(strOutputCharset) Then 
            WScript.Echo "The strOutputCharset (" & strOutputCharset & ") is not valid, quitting. The valid charsets are:"  & vbCrLf
            x = ShowCharsets()
            WScript.Quit(2)
         End If
      Case "inputfile"
         strInputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If Not objFileSystem.FileExists(strInputFile) Then  
            WScript.Echo "The InputFile (" & strInputFile  & ") does not exist, quitting."  & vbCrLf
            WScript.Quit(3)
         End If
      Case "outputfile"
         strOutputFile = LCase(objArgumentsNamed(objArgumentNamed))
         If objFileSystem.FileExists(strOutputFile) Then  
            WScript.Echo "The OutputFile  (" & strOutputFile & ") exists, quitting."  & vbCrLf
            WScript.Quit(4)
         End If
      Case "showallcharsets"
         x = ShowCharsets()
      Case Else
         WScript.Echo "Unknown parameter, quitting: /" & objArgumentNamed & ":" & objArgumentsNamed(objArgumentNamed)
         WScript.Echo strSyntaxtext
   End Select 
Next

If Len(strInputCharset) > 0 And Len(strOutputCharset) > 0 And Len(strInputFile) > 0 And Len(strOutputFile) Then 
   Set objInputStream = CreateObject("ADODB.Stream")
   Set objOutputStream = CreateObject("ADODB.Stream")

   With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      intWritePosition = 0
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
   End With
   objOutputStream.SaveToFile strOutputFile , adSaveCreateNotExist
   objOutputStream.Close
   WScript.Echo "The " & objFileSystem.GetFileName(strInputFile) & " was converted to "  & objFileSystem.GetFileName(strOutputFile) & " OK."
End If
' End Main

' Start Functions 

Function IsCharset(strMyCharset)
IsCharset = False
For Each strCharset in arrCharsets
   If strCharset = strMyCharset Then 
      IsCharset = True
      Exit For
   End If
Next
End Function 

Function ShowCharsets()
strDisplayCharsets = ""
intCounter = 0
For Each strcharset in arrCharsets
   intCounter = intCounter + Len(strcharset) + 1
   strDisplayCharsets = strDisplayCharsets & strcharset & ","
   If intCounter > 67 Then 
      intCounter = 0
      strDisplayCharsets = strDisplayCharsets & vbCrLf 
   End If
Next
strDisplayCharsets = Mid(strDisplayCharsets, 1, Len(strDisplayCharsets)-1)
WScript.Echo strDisplayCharsets 
End Function 
' End Functions 


Answer 4:

@ Bob77的回答并没有为我工作,所以我转换@ Ciove的回答一个简单的子程序,它工作正常。

' Usage: 
' EncodeFile strInFile, "UTF-8", strOutFile, "Windows-1254", 2
Sub EncodeFile(strInputFile, strInputCharset, strOutputFile, strOutputCharset, intOverwriteMode)

    '5th parameter may take the following values:
    'Const adSaveCreateOverWrite = 2
    'Const adSaveCreateNotExist = 1

    Const adReadLine = -2
    Const adTypeBinary = 1
    Const adTypeText = 2
    Const adWriteLine = 1

    Set objInputStream = CreateObject("ADODB.Stream")
    Set objOutputStream = CreateObject("ADODB.Stream")

    With objInputStream
      .Open
      .Type = adTypeBinary
      .LoadFromFile strInputFile
      .Type = adTypeText
      .Charset = strInputCharset
      objOutputStream.Open
      objOutputStream.Charset = strOutputCharset
      Do While .EOS <> True
         strText = .ReadText(adReadLine)
         objOutputStream.WriteText strText, adWriteLine
      Loop
      .Close
    End With
    objOutputStream.SaveToFile strOutputFile, intOverwriteMode
    objOutputStream.Close
End Sub


文章来源: VB6/VBScript change file encoding to ansi