Excel to CSV with UTF8 encoding

2018-12-31 09:02发布

I have an Excel file that has some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV it mangles the "special" Spanish characters that aren't ASCII characters. It also seems to do this with the left and right quotes and long dashes that appear to be coming from the original user creating the Excel file in Mac.

Since CSV is just a text file I'm sure it can handle a UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.

30条回答
不再属于我。
2楼-- · 2018-12-31 09:20

Save Dialog > Tools Button > Web Options > Encoding Tab

查看更多
谁念西风独自凉
3楼-- · 2018-12-31 09:23

Microsoft Excel has an option to export spreadsheet using Unicode encoding. See following screenshot.

enter image description here

查看更多
柔情千种
4楼-- · 2018-12-31 09:24

For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.

Install catdoc:

apt-get install catdoc

Do the conversion:

xls2csv -d utf-8 file.xls > file-utf-8.csv 

This is blazing fast.

Note that it's important that you include the -d utf-8 flag, otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.

Note that xls2csv also only works with .xls files, it does not work with .xlsx files.

查看更多
残风、尘缘若梦
5楼-- · 2018-12-31 09:24

easiest way: no need Open office and google docs

  1. Save your file as "Unicode text file";
  2. now you have an unicode text file
  3. open it with "notepad" and "Save as" it with selecting "utf-8" or other code page that you want
  4. rename file extension from "txt" to "csv"

dont open it with Ms-office anyway!!! Now you have a tab delimited CSV file.

查看更多
浮光初槿花落
6楼-- · 2018-12-31 09:25

I have written a small Python script that can export worksheets in UTF-8.

You just have to provide the Excel file as first parameter followed by the sheets that you would like to export. If you do not provide the sheets, the script will export all worksheets that are present in the Excel file.

#!/usr/bin/env python

# export data sheets from xlsx to csv

from openpyxl import load_workbook
import csv
from os import sys

reload(sys)
sys.setdefaultencoding('utf-8')

def get_all_sheets(excel_file):
    sheets = []
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    all_worksheets = workbook.get_sheet_names()
    for worksheet_name in all_worksheets:
        sheets.append(worksheet_name)
    return sheets

def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file,use_iterators=True,data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")

        try:
            worksheet = workbook.get_sheet_by_name(worksheet_name)
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)

        your_csv_file = open(''.join([worksheet_name,'.csv']), 'wb')
        wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
        for row in worksheet.iter_rows():
            lrow = []
            for cell in row:
                lrow.append(cell.value)
            wr.writerow(lrow)
        print(" ... done")
    your_csv_file.close()

if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlxs file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    sheets = []
    if len(sys.argv) == 3:
        sheets = list(sys.argv[2].split(','))
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert(sheets != None and len(sheets) > 0)
    csv_from_excel(sys.argv[1], sheets)
查看更多
永恒的永恒
7楼-- · 2018-12-31 09:28

A simple workaround is to use Google Spreadsheet. Paste (values only if you have complex formulas) or import the sheet then download CSV. I just tried a few characters and it works rather well.

NOTE: Google Sheets does have limitations when importing. See here.

NOTE: Be careful of sensitive data with Google Sheets.

EDIT: Another alternative - basically they use VB macro or addins to force the save as UTF8. I have not tried any of these solutions but they sound reasonable.

查看更多
登录 后发表回答