Programmatically extract data from an Excel spread

2020-05-27 11:28发布

Is there a simple way, using some common Unix scripting language (Perl/Python/Ruby) or command line utility, to convert an Excel Spreadsheet file to CSV? Specifically, this one:

http://www.econ.yale.edu/~shiller/data/ie_data.xls

And specifically the third sheet of that spreadsheet (the first two being charts).

10条回答
小情绪 Triste *
2楼-- · 2020-05-27 12:02

Maybe xlrd will do the Job (in Python)

edit: I should really learn to read questions. But writing csv shouldn't be a huge problem so maybe you can actually use it.

查看更多
Deceive 欺骗
3楼-- · 2020-05-27 12:02

You can use pyexcelerator in python.

This code (included in the examples folder of pyexcelerator as xls2csv.py) extracts all sheets from the spreadsheets and outputs them to stdout as CSV.

You can easily change the code to do what you want.

The cool thing about pyexcelerator is that you can also use it to write/create excel xls files, without having excel installed.

#!/usr/bin/env python
# -*- coding: windows-1251 -*-
# Copyright (C) 2005 Kiseliov Roman

__rev_id__ = """$Id: xls2csv.py,v 1.1 2005/05/19 09:27:42 rvk Exp $"""


from pyExcelerator import *
import sys

me, args = sys.argv[0], sys.argv[1:]


if args:
    for arg in args:
        print >>sys.stderr, 'extracting data from', arg
        for sheet_name, values in parse_xls(arg, 'cp1251'): # parse_xls(arg) -- default encoding
            matrix = [[]]
            print 'Sheet = "%s"' % sheet_name.encode('cp866', 'backslashreplace')
            print '----------------'
            for row_idx, col_idx in sorted(values.keys()):
                v = values[(row_idx, col_idx)]
                if isinstance(v, unicode):
                    v = v.encode('cp866', 'backslashreplace')
                else:
                    v = str(v)
                last_row, last_col = len(matrix), len(matrix[-1])
                while last_row < row_idx:
                    matrix.extend([[]])
                    last_row = len(matrix)

                while last_col < col_idx:
                    matrix[-1].extend([''])
                    last_col = len(matrix[-1])

                matrix[-1].extend([v])

            for row in matrix:
                csv_row = ','.join(row)
                print csv_row

else:
    print 'usage: %s (inputfile)+' % me
查看更多
爱情/是我丢掉的垃圾
4楼-- · 2020-05-27 12:03

In Ruby, here is the code I use: (requires the excellent parseexcel gem) require 'parseexcel'

def excelGetSheet(worksheet)
    sheet=Array.new
    worksheet.each { |row|
      if row != nil   # empty row?
        cells=Array.new
        j=0
        row.each { |cell|
          cells << cell.to_s('latin1')  unless cell == nil
          j=j+1
        }
        sheet << cells
      end
    }
    return sheet
end

workbook = Spreadsheet::ParseExcel.parse("MyExcelFile.xls")
sheet1 = excelGetSheet(workbook.worksheet(0))

puts sheet1.inspect
查看更多
混吃等死
5楼-- · 2020-05-27 12:05

Options exist for all three languages. The question is - which one are you most familiar with. This is the language you should use, for sure. And if you're not familiar with either, this application is not really a great example of picking between the languages.

Opinionated P.S: if you don't know any of the languages, just learn Python and use xlrd.

查看更多
登录 后发表回答