Moving data from MS Word to MS Excel

Posted 2019-09-01 03:48

Question:

I have transcripts in MS Word that I want to read into R, a statistics program. The problem is that these documents contain special characters rather than plain text. My current process is to substitute the characters out in MS Word, save the document as a .txt file, read that into MS Excel (the import wizard makes one column for people and one for dialogue), convert it to .csv, and read the result into R. This works but is time-consuming.

I found out how to read text with special characters straight into R (which generally wants plain text), but this requires that the data be in an Excel file. That is desirable because once the special characters are in R, it is simple to substitute them all out at once. The problem is that I can't get the MS Word document into Excel directly. I have to save it as a text file first (which I don't mind doing) and then read it in, and that turns the special characters into boxes and question marks.

I need to get the MS Word document into Excel as a data frame with two columns (person, dialogue) without destroying the special characters (“, ”, —, ’, ‘, …, etc.).

I can do this by replacing the characters in Word with Replace, but again, if I could get the document into Excel, doing the substitution in R would be much easier.
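The bulk substitution described here can be sketched as follows. This is a minimal illustration in Python rather than R, and it assumes the text has already been read in as proper Unicode. The `REPLACEMENTS` mapping and the `to_plain` helper are hypothetical names chosen for the example, and the plain-text stand-ins are only one possible choice.

```python
# Hypothetical mapping from special characters to plain-text stand-ins.
# Extend it with whatever other characters the transcripts contain.
REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts single-character keys and string values of any length.
TABLE = str.maketrans(REPLACEMENTS)


def to_plain(text):
    """Replace every mapped special character in one pass."""
    return text.translate(TABLE)


sample = "\u201cWell\u2014it\u2019s fine\u2026\u201d"
print(to_plain(sample))
```

Because the whole mapping is applied in a single pass, adding another character only means adding one more entry to the dictionary.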

Here is a sample MS Word document showing what my data looks like (tab-separated columns):

https://dl.dropbox.com/u/61803503/TEST.doc

I'm using Excel and Word 2010 on a Windows 7 machine.

Answer 1:

One way: use Edit->Copy in Word and Edit->Paste in Excel. If you do that, a simple tabular structure should be preserved, along with any Unicode characters. I'm not so sure about non-Unicode content such as Wingdings, and I haven't tried doing it with VBA either.
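For comparison, the same two-column result can be sketched outside Excel. The example below is in Python, not part of the original answer, and assumes the transcript text is already available as a Unicode string with one tab between the person and the dialogue on each line. The function names `parse_transcript` and `to_csv` and the sample speakers are made up for illustration.

```python
import csv
import io


def parse_transcript(text):
    """Split tab-separated lines into (person, dialogue) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the dialogue survive.
        person, _, dialogue = line.partition("\t")
        rows.append((person.strip(), dialogue.strip()))
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a person/dialogue header."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["person", "dialogue"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample containing curly quotes, an apostrophe and an ellipsis.
sample = "Sam\t\u201cHello there\u201d\nPat\tIt\u2019s late\u2026\n"
rows = parse_transcript(sample)
print(to_csv(rows))
```

The special characters pass through untouched because everything stays a Unicode string. When writing the result to disk, opening the file with an explicit encoding such as `encoding="utf-8"` should keep them intact, provided the program that reads the file uses the same encoding.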