I always convert Excel files to CSV before importing them into R, as follows.
myDataFrame <- read.csv("mydatafile.csv", stringsAsFactors=FALSE)
But I ran into a serious problem when converting an xlsx file written in Chinese. Most of the characters (not all of them) show up as '??' because of encoding.
So I decided to use the xlsx package to import the file directly. But the problem is that the Excel file is larger than 10 MB.
It gave me an error message because of the JVM's memory limit. (I assume that xlsx uses Java internally.)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.lang.OutOfMemoryError: GC overhead limit exceeded
How can I import a Chinese Excel file into R? I tried 'Save as..' CSV, opened the result in Notepad, and saved it with the 'UTF-8' option, but the result was the same (it still shows '??').
FYI, I can see all the Chinese characters in the original Excel file.
Your question is a mixed one. Let's assume that you have converted the xlsx file into csv. If you haven't, please refer to other threads like this one. I think this step is best carried out in some external tool rather than in R.
Now that we've got a csv, two problems remain: size and encoding. For encoding, as you have mentioned in the comment, you can use the encoding options of several R functions like read.csv. Note that in read.csv it is fileEncoding= that actually re-encodes the input while reading; encoding= only marks the resulting strings as Latin-1 or UTF-8. For Chinese files coming out of Excel, the encoding is most probably "GB18030". If you cannot decide, the open file dialog of LibreOffice Calc may give you some clue.
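For the encoding step, a minimal sketch might look like the following. It assumes the file really is GB18030 and that R is running in a UTF-8 locale. The temporary file and its two-column contents are made up here so that the example runs on its own; with a real file you would only need the read.csv line.

```r
# Build a tiny GB18030-encoded CSV so the example is self-contained.
# (Hypothetical data; replace tmp with the path to your own file.)
tmp <- tempfile(fileext = ".csv")
csv_text <- "name,value\n\u4e2d\u6587,1\n"
gb_bytes <- iconv(csv_text, from = "UTF-8", to = "GB18030", toRaw = TRUE)[[1]]
writeBin(gb_bytes, tmp)

# fileEncoding tells R how the bytes on disk are encoded,
# so they are converted to the session's native encoding while reading.
df <- read.csv(tmp, fileEncoding = "GB18030", stringsAsFactors = FALSE)
print(df)
```

If the characters still come out wrong, the guess about the source encoding is probably off, and trying "GBK" or "UTF-8" in the same argument is a cheap next step.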
If the file is large, you may first convert the encoding with the Linux command iconv, and then process it further in R.
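If the iconv command is not at hand (on Windows, say), roughly the same conversion can be done inside R by streaming the file through two connections. This is a swapped-in alternative rather than the command-line tool itself: convert_encoding and chunk_lines are names invented for this sketch, and it assumes a UTF-8 locale, because the text passes through the session's native encoding on the way.

```r
# Hypothetical helper: re-encode a text file chunk by chunk,
# so the whole file never has to sit in memory at once.
convert_encoding <- function(src, dest, from = "GB18030", to = "UTF-8",
                             chunk_lines = 10000L) {
  input <- file(src, open = "rt", encoding = from)
  on.exit(close(input), add = TRUE)
  output <- file(dest, open = "wt", encoding = to)
  on.exit(close(output), add = TRUE)
  repeat {
    lines <- readLines(input, n = chunk_lines)
    if (length(lines) == 0L) break   # end of file reached
    writeLines(lines, output)
  }
  invisible(dest)
}

# Demo on a small made-up GB18030 file.
src <- tempfile(fileext = ".csv")
dest <- tempfile(fileext = ".csv")
demo_text <- "name,value\n\u4e2d\u6587,1\n"
writeBin(iconv(demo_text, from = "UTF-8", to = "GB18030", toRaw = TRUE)[[1]], src)
convert_encoding(src, dest)
converted <- read.csv(dest, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
```

After the conversion, the UTF-8 copy can be read with fileEncoding = "UTF-8" as in the last line.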
Now for the size part. A 50 MB or even 500 MB csv can easily be handled by read.csv, although not necessarily fast, provided that you have enough memory. If the file is larger than 1 GB, there are two options:
The first one is simpler; the second one can handle really large files.
Hope it helps.