How to save worksheets as CSV files in UTF-8 forma

2019-04-15 09:35发布

问题:

I have the following Excel VBA script which I use in Excel for Mac 2011 to export all the worksheets of an Excel file to .csv files.

Sub save_all_csv()
    On Error Resume Next
    Dim ExcelFileName As String
    ExcelFileName = ThisWorkbook.Name
    For Each objWorksheet In ThisWorkbook.Worksheets
        ExcelFileNameWithoutExtension = RemoveExtension(ExcelFileName)
        CsvFileName = ExcelFileNameWithoutExtension & "__" & objWorksheet.Name & ".csv"
        Application.DisplayAlerts = False
        objWorksheet.SaveAs Filename:="data:storage:original_excel:" & CsvFileName, FileFormat:=xlCSV, CreateBackup:=False
        Application.DisplayAlerts = True
    Next
    Application.DisplayAlerts = False
    Application.Quit
End Sub
Function RemoveExtension(strFileName As String)
    strFileName = Replace(strFileName, ".xlsx", "")
    strFileName = Replace(strFileName, ".xls", "")
    RemoveExtension = strFileName
End Function

The problem is that they save as Western Mac OS Roman which I can't get to convert to UTF-8 via PHP, so I would like to have VBA save these files in UTF-8 format in the first place.

I've found some solutions for saving text from VBA via a Stream object and with CreateObject but it's apparently not possibile on the Mac to use CreateObject to write directly to files.

How can I save worksheets as CSV files in UTF-8 format with Excel for Mac 2011?

回答1:

I had the same problem, eventually just wrote a routine to encode to utf-8. This code works on my MacBook Pro in Mac:Excel 2011. Just use the byte array created by this function to do binary file output. In my case I am processing Thai script, in order to automate use of Mac's excellent voice synth.

Private Function UTF8Encode(b() As Byte) As Byte()
    ' Function to convert a Unicode Byte array into a byte array that can be written to create a UTF8 Encoded file.
    ' Note the function supports the one, two and three byte UTF8 forms.
    ' Note: the MS VBA documentation is confusing. It says the String types only supports single byte charset
    '           however, thankfully, it does in fact contain 2 byte Unicode values.
    ' Wrote this routine as last resort, tried many ways to get unicode chars to a file or to a shell script call
    ' but this was the only way could get to work.
    ' RT Perkin
    ' 30/10/2015

    Dim b1, b2, b3 As Byte            ' UTF8 encoded bytes
    Dim u1, u2 As Byte                  ' Unicode input bytes
    Dim out As New Collection      ' Collection to build output array
    Dim i, j As Integer
    Dim unicode As Long

    If UBound(b) <= 0 Then
        Exit Function
    End If

    For i = 0 To UBound(b) Step 2
        u1 = b(i)
        u2 = b(i + 1)
        unicode = u2 * 256 + u1

        If unicode < &H80 Then
            ' Boils down to ASCII, one byte UTF-8
            out.Add (u1)
        ElseIf unicode < &H800 Then
            ' Two byte UTF-8
            ' Code path not tested
            b1 = &H80 Or (&H3F And u1)
            b2 = &HC0 Or (Int(u1 / 64)) Or ((&H7 And u2) * 4)
            out.Add (b2) ' Add most significant byte first
            out.Add (b1)
        ElseIf unicode < &H10000 Then
            ' Three byte UTF-8
            ' Thai chars are in this range
            b1 = &H80 Or (&H3F And u1)
            b2 = &H80 Or (Int(u1 / 64)) Or ((&HF And u2) * 4)
            b3 = &HE0 Or (Int(u2 / 16))
            out.Add (b3) ' Add most significant byte first
            out.Add (b2)
            out.Add (b1)
        Else
            ' This case wont arise as VBA strings are 2 byte. Which makes some Unicode codepoints uncodeable.
        End If

    Next

    Dim outBytes() As Byte
    ReDim outBytes(1 To out.Count)
    For j = 1 To out.Count
        outBytes(j) = CByte(out.Item(j))
    Next

    UTF8Encode = outBytes

End Function


回答2:

I think you're right that you'll have to do the file creation yourself, and I'm quite sure you can't use CreateObject to make streams. However, you may want to look into using Open, Write etc (lots of examples around, but here's one). I'd make a very small experimental one first, just to check what encoding that ends up with, though.

Chris