Parsing Excel Data in Apple Swift

Posted 2019-01-22 09:46

My current workflow involves using AppleScript to essentially delimit Excel data and format it into plain text files. We're pushing towards an all-Swift environment, but I haven't yet found any sort of kits for parsing my Excel data into Swift.

The only thing I can think of is to use C or something and wrap it, but that's not ideal. Any better suggestions for parsing this data for use in Swift?

The goal is to eliminate AppleScript, but I'm not sure if that will be possible while still interacting with Excel files. Scripting Excel via AppleScript seems to be the only method.

EDIT: I don't have the option of eliminating Excel from this workflow. This is how the data will be coming to the application, thus I have to include it.

Being able to streamline the process of parsing this data and then processing it will be paramount. I know AppleScript has been good in the past with helping me to process it; however, it's getting a little too closed-off for me.

I've been looking at writing something in Swift/Cocoa, but that still might require the data to be extracted with an AppleScript, right?

A big plus for pushing Swift is the readability. I don't know Objective-C all that well, and Swift would be an easier transition, I feel.

My workflow on PC has been using the COM object, which, as has been said, isn't available in the Mac Excel app. I'm only looking for data extraction at this point. Some previous apps did processing within Excel, but I'm looking to make this very self-contained, with all processing inside the app I'm developing. Once the data is extracted from the .XLS or .XLSX files, I'll be doing some text editing via RegEx and perhaps a little number crunching. Nothing too crazy. As of now, it will run on the client side, but I'm looking to extend this to a server process.

5 answers
唯我独甜
#2 · 2019-01-22 10:21

In Mac OS X 10.6 Snow Leopard, Apple introduced the AppleScriptObjC framework, which makes it very easy to interact between Cocoa and AppleScript. AppleScript code and an Objective-C-like syntax can be used in the same source file. It's much more convenient than Scripting Bridge and NSAppleScript.

AppleScriptObjC cannot be used directly in Swift because the command loadAppleScriptObjectiveCScripts of NSBundle is not bridged to Swift.

However, you can use an Objective-C bridge class, for example:

ASObjC.h

@import Foundation;
@import AppleScriptObjC;

@interface NSObject (Excel)
- (void)openExcelDocument:(NSString *)filePath;
- (NSArray *)valueOfUsedRange;

@end

@interface ASObjC : NSObject

+ (ASObjC *)sharedASObjC;

@property id Excel;

@end

ASObjC.m

#import "ASObjC.h"

@implementation ASObjC

+ (void)initialize
{
    if (self == [ASObjC class]) {
        [[NSBundle mainBundle] loadAppleScriptObjectiveCScripts];
    }
}

+ (ASObjC *)sharedASObjC
{
    static id sharedInstance = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sharedInstance = [[ASObjC alloc] init];
    });

    return sharedInstance;
}

- (instancetype)init
{
    self = [super init];
    if (self) {
        _Excel = NSClassFromString(@"ASExcel");
    }
    return self;
}

@end

Create an AppleScript source file from the AppleScriptObjC template

ASExcel.applescript

script ASExcel
  property parent: class "NSObject"

  on openExcelDocument:filePath
    set asFilePath to filePath as text
    tell application "Microsoft Excel"
      set sourceBook to open workbook workbook file name asFilePath
      repeat
        try
          get workbooks
          return
        end try
        delay 0.5
      end repeat
    end tell
  end openExcelDocument:

  on valueOfUsedRange()
    tell application "Microsoft Excel"
      tell active sheet
        set activeRange to used range
        return value of activeRange
      end tell
    end tell
  end valueOfUsedRange

end script

Link to the AppleScriptObjC framework if necessary.
Create the Bridging Header and import ASObjC.h

Then you can call AppleScriptObjC from Swift with

 ASObjC.sharedASObjC().Excel.openExcelDocument("Macintosh HD:Users:MyUser:Path:To:ExcelFile.xlsx")

or

let excelData = ASObjC.sharedASObjC().Excel.valueOfUsedRange() as! Array<[String]>
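
The forced cast above assumes every cell is a string. A used range can also hold numbers or other values, in which case `as!` would trap at runtime. Here is a minimal sketch of a more defensive conversion, assuming the bridge hands back an array of row arrays; the helper names `stringRows` and `tabDelimited` are made up for illustration and are not part of AppleScriptObjC:

```swift
import Foundation

/// Converts an untyped 2-D value (rows of cells) into rows of strings.
/// Returns an empty array if the value is not an array of arrays.
func stringRows(from raw: Any?) -> [[String]] {
    guard let rows = raw as? [[Any]] else { return [] }
    return rows.map { row in
        row.map { cell in
            // Strings pass through unchanged; any other value falls back
            // to its default textual description.
            (cell as? String) ?? String(describing: cell)
        }
    }
}

/// Joins rows into tab-delimited text, one line per row.
func tabDelimited(_ rows: [[String]]) -> String {
    return rows.map { $0.joined(separator: "\t") }.joined(separator: "\n")
}

// Sample data standing in for the value returned by valueOfUsedRange().
let sample: [[Any]] = [["Name", "Qty"], ["Widget", 3], ["Gadget", 1.5]]
let text = tabDelimited(stringRows(from: sample))
print(text)
```

The resulting string can then be written to a plain text file or passed on to whatever processing comes next.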
啃猪蹄的小仙女
#3 · 2019-01-22 10:21

There is no need to export Excel files to CSV for Swift as you can use an existing open-source library for parsing XLSX files. If you use CocoaPods or Swift Package Manager for integrating 3rd-party libraries, CoreXLSX supports those. After the library is integrated, you can use it like this:

import CoreXLSX

guard let file = XLSXFile(filepath: "./file.xlsx") else {
  fatalError("XLSX file corrupted or does not exist")
}

for path in try file.parseWorksheetPaths() {
  let ws = try file.parseWorksheet(at: path)
  for row in ws.sheetData.rows {
    for c in row.cells {
      print(c)
    }
  }
}

This will open file.xlsx and print all cells within that file. You can also filter cells by references and access only cell data that you need for your automation.
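
To illustrate what filtering by reference means, here is a small sketch in plain Swift that does not use the CoreXLSX API at all. It assumes each cell is available as an A1-style reference plus a string value; the `splitReference` helper and the sample tuples are hypothetical:

```swift
/// Splits an A1-style reference such as "AB12" into its column letters
/// and row number. Returns nil if the reference is malformed.
func splitReference(_ reference: String) -> (column: String, row: Int)? {
    let letters = reference.prefix(while: { $0.isLetter })
    let digits = reference.dropFirst(letters.count)
    guard !letters.isEmpty, let row = Int(digits) else { return nil }
    return (String(letters).uppercased(), row)
}

// Hypothetical (reference, value) pairs standing in for parsed cells.
let cells = [("A1", "Name"), ("B1", "Qty"), ("A2", "Widget"), ("B2", "3")]

// Keep only the values found in column A.
let columnA = cells
    .filter { splitReference($0.0)?.column == "A" }
    .map { $0.1 }
print(columnA)
```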

神经病院院长
#4 · 2019-01-22 10:24

It's somewhat unclear if you're trying to eliminate Excel as a dependency (which is not unreasonable: it costs money and not everyone has it) or AppleScript as a language (totally understandable, but a bad practical move as Apple's alternatives for application automation all suck).

There are third-party Excel-parsing libraries available for other languages, e.g. I've used Python's openpyxl (for .xlsx files) and xlrd (for .xls) libraries successfully in my own projects. And I see through the magicks of Googles that someone's written an ObjC framework, DHlibxls, which [assuming no dynamic trickery] should be usable directly from Swift, but I've not used it myself so can't tell you anything more.

祖国的老花朵
#5 · 2019-01-22 10:34

1. Export to plaintext CSV

If all you're trying to do is extract data from Excel to use elsewhere, as opposed to capturing Excel formulas and formatting, then you probably should not try to read the .xls file. XLS is a complex format. It's good for Excel, not for general data interchange.

Similarly, you probably don't need to use AppleScript or anything else to integrate with Excel, if all you want to do is save the data as plaintext. Excel already knows how to save data as plaintext. Just use Excel's "Save As" command. (That's what it's called on the Mac. I don't know about PCs.)

The question is then what plaintext format to use. One obvious choice for this is a plaintext comma-separated value file (CSV) because it's a simple de facto standard (as opposed to a complex official standard like XML). This will make it easy to consume in Swift, or in any other language.

2. Export in UTF-8 encoding if possible, otherwise as UTF-16

So how do you do that exactly? Plaintext is wonderfully simple, but one subtlety that you need to keep track of is the text encoding. A text encoding is a way of representing characters in a plaintext file. Unfortunately, you cannot reliably tell the encoding of a file just by inspecting the file, so you need to choose an encoding when you save it and remember to use that encoding when you read it. If you mess this up, accented characters, typographer's quotation marks, dashes, and other non-ASCII characters will get mangled. So what text encoding should you use? The short answer is, you should always use UTF-8 if possible.

But if you're working with an older version of Excel, then you may not be able to use UTF-8. In that case, you should use UTF-16. In particular, UTF-16 is, I believe, the only export option in Excel 2011 for Mac which produces a predictable result which will not depend in surprising ways on obscure locale settings or Microsoft-specific encodings.

So if you're on Excel 2011 for Mac, for instance, choose "UTF-16 Unicode Text" from Excel's Save As command.

This will cause Excel to save the file so that every row is a line of text, and every column is separated by a tab character. (So technically, this is a tab-separated values file, rather than a comma-separated values file.)

3. Import with Swift

Now you have a plaintext file, which you know was saved in a UTF-8 (or UTF-16) encoding. So now you can read it and parse it in Swift.
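
A minimal sketch of the reading step, passing the encoding explicitly rather than relying on detection. The `readExport` helper and the temporary file name are made up for illustration; the temporary file only stands in for a real export from Excel:

```swift
import Foundation

/// Reads an exported text file using the encoding chosen at export time.
func readExport(at url: URL, encoding: String.Encoding) throws -> String {
    return try String(contentsOf: url, encoding: encoding)
}

// Write a small UTF-16 sample to a temporary file, then read it back.
let url = FileManager.default.temporaryDirectory
    .appendingPathComponent("export-sample.txt")
let original = "Name\tCity\nZoë\tMálaga\n"
try original.write(to: url, atomically: true, encoding: .utf16)

let contents = try readExport(at: url, encoding: .utf16)
print(contents == original)
```

For a UTF-8 export, pass `.utf8` instead of `.utf16`.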

If your Excel data is complicated, you may need a full-featured CSV parser. The best choice is probably CHCSVParser.

Using CHCSVParser, you can parse the file with the following code:

// Path of the tab-separated file exported from Excel
NSString * const inputFilePath = @"/path/to/exported/file.txt";
unichar tabCharacter = '\t';
NSArray *rows = [NSArray arrayWithContentsOfCSVFile:inputFilePath options:CHCSVParserOptionsSanitizesFields
                                          delimiter:tabCharacter];

(You could also call it from Swift, of course.)

On the other hand, if your data is relatively simple (for instance, it has no escaped characters), then you might not need to use an external library at all. You can write some Swift code that parses tab-separated values just by reading in the file as a string, splitting on newlines, and then splitting on tabs.

This function will take a String representing TSV data and return an array of dictionaries:

import Foundation

/**
Reads a multiline, tab-separated String and returns an Array<NSDictionary>, taking column names from the first line or an explicit parameter
*/
func JSONObjectFromTSV(_ tsvInputString: String, columnNames optionalColumnNames: [String]? = nil) -> Array<NSDictionary>
{
  let lines = tsvInputString.components(separatedBy: "\n")
  guard lines.isEmpty == false else { return [] }

  // Use the explicit column names if given, otherwise the first line
  let columnNames = optionalColumnNames ?? lines[0].components(separatedBy: "\t")
  // Data starts at line 0 when names are explicit, at line 1 when the first line is a header
  var lineIndex = (optionalColumnNames != nil) ? 0 : 1
  let columnCount = columnNames.count
  var result = Array<NSDictionary>()

  for line in lines[lineIndex ..< lines.count] {
    let fieldValues = line.components(separatedBy: "\t")
    if fieldValues.count != columnCount {
      //      NSLog("WARNING: header has %u columns but line %u has %u columns. Ignoring this line", columnCount, lineIndex,fieldValues.count)
    }
    else
    {
      result.append(NSDictionary(objects: fieldValues, forKeys: columnNames as [NSString]))
    }
    lineIndex = lineIndex + 1
  }
  return result
}

So you only need to read the file into a string and pass it to this function. That snippet comes from this gist for a tsv-to-json converter. And if you need to know more about which text encodings Microsoft products produce, and which ones Cocoa can auto-detect, then this repo on text encoding contains the research on export specimens which led to the conclusion that UTF-16 is the way to go for old Microsoft products on the Mac.
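
One caveat with splitting on "\n" alone: if the export uses "\r\n" or bare "\r" line endings, stray carriage returns end up in the last field or the lines are not split at all. As a rough sketch (not the code from the gist), here is a variant that returns plain Swift dictionaries and tolerates any of those line endings. The name `rowsFromTSV` is hypothetical, and unlike the function above it always takes the column names from the first line:

```swift
/// Parses tab-separated text into one dictionary per data row,
/// using the first line as column names.
func rowsFromTSV(_ text: String) -> [[String: String]] {
    // Character.isNewline matches "\n", "\r", and "\r\n" alike.
    // Blank lines are dropped because split omits empty subsequences.
    let lines = text.split(whereSeparator: { $0.isNewline })
    guard let header = lines.first else { return [] }
    let columns = header
        .split(separator: "\t", omittingEmptySubsequences: false)
        .map(String.init)

    return lines.dropFirst().compactMap { line -> [String: String]? in
        let fields = line
            .split(separator: "\t", omittingEmptySubsequences: false)
            .map(String.init)
        // Skip rows whose field count does not match the header.
        guard fields.count == columns.count else { return nil }
        // If a column name repeats, the first occurrence wins.
        return Dictionary(zip(columns, fields), uniquingKeysWith: { first, _ in first })
    }
}

let sample = "name\tqty\r\nWidget\t3\r\nGadget\t\r\nbroken line\r\n"
let rows = rowsFromTSV(sample)
print(rows)
```

Empty cells are kept as empty strings, and rows with the wrong number of fields are skipped silently, which mirrors the behaviour of the function above.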

(I realize I'm linking to my own repos here. Apologies?)

Melony?
#6 · 2019-01-22 10:36

You can use ScriptingBridge or NSAppleScript to interact with scriptable applications.

ScriptingBridge can generate a header file from the application's AppleScript dictionary.

NSAppleScript can execute any AppleScript for you by passing a String.
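
A minimal sketch of the NSAppleScript route (macOS only). The `runAppleScript` wrapper and the `AppleScriptError` type are made up for illustration, and the script here is a trivial one that needs no application; an Excel `tell` block could be passed in as the source string in the same way:

```swift
import Foundation

/// Error carrying the info dictionary NSAppleScript reports on failure.
struct AppleScriptError: Error {
    let info: NSDictionary
}

/// Compiles and runs AppleScript source, returning the result descriptor.
func runAppleScript(_ source: String) throws -> NSAppleEventDescriptor {
    guard let script = NSAppleScript(source: source) else {
        throw AppleScriptError(info: [:])
    }
    var errorInfo: NSDictionary?
    let result = script.executeAndReturnError(&errorInfo)
    // A non-nil info dictionary means compilation or execution failed.
    if let errorInfo = errorInfo {
        throw AppleScriptError(info: errorInfo)
    }
    return result
}

let descriptor = try runAppleScript("return 1 + 2")
print(descriptor.int32Value)
```

The returned NSAppleEventDescriptor still has to be unpacked by hand (for example via `stringValue` or by walking list items), which is part of why the other answers treat this route as less convenient.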
