Import text with foreign characters

Posted 2020-07-30 03:32

I have code which imports many text documents, containing foreign/special characters, into an Excel workbook:

Sub loadfiles()

    Dim fpath As String
    Dim fname As String
    Dim i As Long

    Application.ScreenUpdating = False

    fpath = "...\data\"
    fname = Dir(fpath & "*.txt")
    For i = 1 To 10
        Application.StatusBar = "Progress: " & i & " of 10000"
        ' File name goes in column A; the imported text starts in column B
        Sheet1.Select
        Range("A" & i).Value = fname
        With ActiveSheet.QueryTables.Add(Connection:="TEXT;" _
          & fpath & fname, Destination:=Range("B" & i))
            .Name = "a"
            .FieldNames = True
            .RowNumbers = False
            .FillAdjacentFormulas = False
            .PreserveFormatting = False
            .RefreshOnFileOpen = False
            .RefreshStyle = xlInsertDeleteCells
            .SaveData = True
            .AdjustColumnWidth = False
            .RefreshPeriod = 0
            .TextFilePromptOnRefresh = False
            .TextFilePlatform = 437   ' code page used to decode the file
            .TextFileStartRow = 1
            .TextFileParseType = xlDelimited
            .TextFileTextQualifier = xlTextQualifierDoubleQuote
            .TextFileConsecutiveDelimiter = False
            .TextFileTabDelimiter = False
            .TextFileColumnDataTypes = _
             Array(xlTextFormat, xlSkipColumn, xlGeneralFormat)
            .Refresh BackgroundQuery:=False
        End With
        fname = Dir   ' advance to the next matching file
    Next i
    Application.StatusBar = False
    Application.ScreenUpdating = True
    MsgBox "Done"
End Sub

Is there any way to import text without losing original characters?

Tags: vba excel
2 answers
Bombasti
#2 · 2020-07-30 04:31

Try adding

.QueryType = xlTextImport

and changing

.TextFilePlatform = xlMSDOS

.PreserveFormatting = True
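
A minimal sketch of where the `.TextFilePlatform` and `.PreserveFormatting` changes could sit in the question's import loop. It assumes the source files are UTF-8, in which case the code page number 65001 is a commonly used value for `.TextFilePlatform` instead of `xlMSDOS`; if the files use another encoding, a different code page would be needed. The procedure name `ImportTextFiles` and its parameters are hypothetical.

```vba
Option Explicit

' Hypothetical helper: imports every *.txt file in folderPath into ws,
' file name in column A and imported text starting in column B,
' one row per file as in the question.
' codePage is an ASSUMPTION about the files (65001 = UTF-8).
Public Sub ImportTextFiles(ByVal ws As Worksheet, ByVal folderPath As String, _
                           Optional ByVal codePage As Long = 65001)
    Dim fileName As String
    Dim rowIndex As Long

    fileName = Dir(folderPath & "*.txt")
    rowIndex = 1
    Do While Len(fileName) > 0
        ws.Range("A" & rowIndex).Value = fileName
        With ws.QueryTables.Add(Connection:="TEXT;" & folderPath & fileName, _
                                Destination:=ws.Range("B" & rowIndex))
            .TextFilePlatform = codePage      ' decode with the assumed code page
            .PreserveFormatting = True
            .TextFileParseType = xlDelimited
            .TextFileTextQualifier = xlTextQualifierDoubleQuote
            .TextFileTabDelimiter = False
            .Refresh BackgroundQuery:=False
        End With
        fileName = Dir                        ' next matching file, "" when done
        rowIndex = rowIndex + 1
    Loop
End Sub
```

Looping with `Do While Len(fileName) > 0` stops when `Dir` runs out of files, so no fixed file count is needed.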

孤傲高冷的网名
#3 · 2020-07-30 04:37

Instead of VBA, a quick manual approach could be as follows.

1. Save the file you want to import as a CSV.

2. Open Excel.

3. Import the data using Data > Import External Data > Import Data.

4. Select the file type "csv" and browse to your file.

5. In the import wizard, change "File origin" to "Japanese (Shift-JIS)" (or whichever encoding matches your file).

6. Change the delimiter to comma.

7. Select where to import to and click Finish.

This way the special characters should show correctly.
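
The wizard steps above can be roughly sketched in VBA with a text query table. This is only an illustration: the procedure name `ImportCsvWithCodePage` and its parameters are hypothetical, and the default code page 932 (Japanese Shift-JIS) is an assumption that should be replaced with whatever encoding the file really uses.

```vba
' Hypothetical helper mirroring the manual import steps for one CSV file.
' codePage plays the role of the wizard's "File origin" setting;
' 932 (Shift-JIS) is an ASSUMED default, not a universal answer.
Public Sub ImportCsvWithCodePage(ByVal csvPath As String, ByVal target As Range, _
                                 Optional ByVal codePage As Long = 932)
    With target.Worksheet.QueryTables.Add(Connection:="TEXT;" & csvPath, _
                                          Destination:=target)
        .TextFilePlatform = codePage          ' step 5: file origin / encoding
        .TextFileParseType = xlDelimited
        .TextFileCommaDelimiter = True        ' step 6: comma as delimiter
        .TextFileTabDelimiter = False
        .TextFileTextQualifier = xlTextQualifierDoubleQuote
        .Refresh BackgroundQuery:=False       ' step 7: run the import now
    End With
End Sub
```

A call might look like `ImportCsvWithCodePage "C:\temp\sample.csv", Sheet1.Range("A1")`, where the path is a made-up example.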

I understand that if \uFEFF is added at the beginning of a CSV file (for example, one generated in Java), Excel is able to open it correctly. The UTF-8 byte order mark will clue Excel 2007+ in to the fact that you're using UTF-8. UTF-8 is a variable-width encoding: it needs only 1 byte to encode ASCII characters, but other code points use multiple bytes.
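
As a rough VBA counterpart to writing \uFEFF from Java, the sketch below copies a file and puts the three UTF-8 BOM bytes (0xEF, 0xBB, 0xBF) in front of it. The name `PrependUtf8Bom` is hypothetical, and the code assumes the source file is already valid UTF-8 without a BOM; it does not check either condition.

```vba
' Hypothetical helper: writes targetPath as BOM + the raw bytes of sourcePath.
' ASSUMES sourcePath is UTF-8 text that does not already start with a BOM.
Public Sub PrependUtf8Bom(ByVal sourcePath As String, ByVal targetPath As String)
    Dim bom(0 To 2) As Byte
    Dim content() As Byte
    Dim fileNo As Integer
    Dim size As Long

    bom(0) = &HEF: bom(1) = &HBB: bom(2) = &HBF

    ' Read the whole source file as raw bytes.
    fileNo = FreeFile
    Open sourcePath For Binary Access Read As #fileNo
    size = LOF(fileNo)
    If size > 0 Then
        ReDim content(0 To size - 1)
        Get #fileNo, , content
    End If
    Close #fileNo

    ' Binary mode does not truncate, so remove any existing target first.
    If Len(Dir(targetPath)) > 0 Then Kill targetPath

    ' Write the BOM followed by the original bytes.
    fileNo = FreeFile
    Open targetPath For Binary Access Write As #fileNo
    Put #fileNo, , bom
    If size > 0 Then Put #fileNo, , content
    Close #fileNo
End Sub
```

Because the source is fully read before the target is deleted, the same path can be passed for both arguments to rewrite a file in place.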

> A correctly formatted UTF8 file can have a Byte Order Mark as its first three octets. These are the hex values 0xEF, 0xBB, 0xBF. These octets serve to mark the file as UTF8 (since they are not relevant as "byte order" information). If this BOM does not exist, the consumer/reader is left to infer the encoding type of the text. Readers that are not UTF8 capable will read the bytes as some other encoding such as Windows-1252 and display the characters ï»¿ at the start of the file.

> There is a known bug where Excel, upon opening UTF8 csv files via file association, assumes that they are in a single-byte encoding, disregarding the presence of the UTF8 BOM. This cannot be fixed by any system default codepage or language setting. The BOM will not clue in Excel - it just won't work. (A minority report claims that the BOM sometimes triggers the "Import Text" wizard.) This bug appears to exist in Excel 2003 and earlier. Most reports (amidst the answers here) say that this is fixed in Excel 2007 and newer. Note that you can always correctly open UTF8 csv files in Excel using the "Import Text" wizard, which allows you to specify the encoding of the file you're opening. Of course this is much less convenient.

> Readers of this answer are most likely in a situation where they don't particularly support Excel < 2007, but are sending raw UTF8 text to Excel, which is misinterpreting it and sprinkling your text with Ã and other similar Windows-1252 characters. Adding the UTF8 BOM is probably your best and quickest fix.

Microsoft Excel mangles Diacritics in .csv files?
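
To check whether a given file actually starts with the BOM octets described in the quote, something like the following could be used. The function name `HasUtf8Bom` is hypothetical; it only inspects the first three bytes and says nothing about whether the rest of the file is valid UTF-8.

```vba
' Hypothetical helper: True when the file begins with EF BB BF.
Public Function HasUtf8Bom(ByVal filePath As String) As Boolean
    Dim head(0 To 2) As Byte
    Dim fileNo As Integer

    fileNo = FreeFile
    Open filePath For Binary Access Read As #fileNo
    If LOF(fileNo) >= 3 Then
        Get #fileNo, , head               ' read the first three bytes
        HasUtf8Bom = (head(0) = &HEF And head(1) = &HBB And head(2) = &HBF)
    End If
    Close #fileNo
End Function
```

One possible use is to pick the import code page per file, for example 65001 when `HasUtf8Bom` returns True and some fallback code page otherwise; the choice of fallback is an assumption that depends on where the files come from.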
