What's the best way to export UTF8 data into E

2020-01-29 05:32发布

So we have this web app where we support UTF8 data. Hooray UTF8. And we can export the user-supplied data into CSV no problem - it's still in UTF8 at that point. The problem is when you open a typical UTF8 CSV up in Excel, it reads it as ANSII encoded text, and accordingly tries to read two-byte chars like ø and ü as two separate characters and you end up with fail.

So I've done a bit of digging (the Intervals folks have a interesting post about it here), and there are some limited if ridiculously annoying options out there. Among them:

  • supplying a UTF-16 Little Endian TSV file which Excel will interpret correctly, but which won't support multi-line data
  • supplying the data in an HTML table with an Excel mime-type or file extension (not sure if this option supports UTF8)
  • there are some three or four ways to get XML data into the various recent versions of excel, and those would support UTF8, in theory. SpreadsheetML, using custom XSLT, or generating the new Excel XML format via templating.

It looks like no matter what, I'm probably going to want to continue offering a plain-old CSV file for the folks who aren't using it for Excel anyway, and a separate download option for Excel.

What's the simplest way of generating that Just-For-Excel file that will correctly support UTF8, my dear Stack Overflowers? If that simplest option only supports the latest version of Excel, that's still of interest.

I'm doing this on a Rails stack, but curious how the .Net-ers and folks on any frameworks handle this. I work in a few different environments myself and this is definitely an issue that will becoming up again.

Update 2010-10-22: We had been using the Ruport gem in our time-tracking system Tempo to provide the CSV exports when I first posted this question. One of my coworkers, Erik Hollensbee, threw together a quick filter for Ruport to provide us with actual Excel XSL output, and I figured I'd share that here for any other ruby-ists:

require 'rubygems'
require 'ruport'
require 'spreadsheet'
require 'stringio'

Spreadsheet.client_encoding = "UTF-8"

include Ruport::Data

class Ruport::Formatter::Excel < Ruport::Formatter
  renders :excel, :for => Ruport::Controller::Table

  def output
    retval = StringIO.new

    if options.workbook
      book = options.workbook
    else
      book = Spreadsheet::Workbook.new
    end

    if options.worksheet_name
      book_args = { :name => options.worksheet_name }
    else
      book_args = { }
    end

    sheet = book.create_worksheet(book_args)

    offset = 0

    if options.show_table_headers
      sheet.row(0).default_format = Spreadsheet::Format.new(
        options.format_options || 
        { 
          :color => :blue,
          :weight => :bold,
          :size => 18
        }
      )
      sheet.row(0).replace data.column_names
      offset = 1
    end

    data.data.each_with_index do |row, i|
      sheet.row(i+offset).replace row.attributes.map { |x| row.data[x] }
    end

    book.write retval
    retval.seek(0)
    return retval.read
  end
end

8条回答
别忘想泡老子
2楼-- · 2020-01-29 06:25

You're forgetting creating an OleDB datasource and Excel Interop, but there are issues with those as well.

I recommend the SpreadsheetML option. It works pretty well, odds are your platform has some decent tools for building xml files, and it's fully supported as far back as OfficeXP. Office2000 is not supported, but personal experience is that it works in a limited way.

查看更多
一纸荒年 Trace。
3楼-- · 2020-01-29 06:25

Try OpenOffice Calc - it's much more Unicode friendly - both Importing and Exporting CSV files with UTF-8 encoding.

查看更多
登录 后发表回答