I'm developing a part of an application that's responsible for exporting some data into CSV files. The application always uses UTF-8 because of its multilingual nature at all levels. But opening such CSV files (containing e.g. diacritics, cyrillic letters, Greek letters) in Excel does not achieve the expected results showing something like Г„/Г¤, Г–/Г¶
. And I don't know how to force Excel understand that the open CSV file is encoded in UTF-8. I also tried specifying UTF-8 BOM EF BB BF
, but Excel ignores that.
Is there any workaround?
P.S. Which tools may potentially behave like Excel does?
UPDATE
I have to say that I've confused the community with the formulation of the question. When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user, in a fluent and transparent way. However, I used a wrong formulation asking for doing it automatically. That is very confusing and it clashes with VBA macro automation. There are two answers for this questions that I appreciate the most: the very first answer by Alex https://stackoverflow.com/a/6002338/166589, and I've accepted this answer; and the second one by Mark https://stackoverflow.com/a/6488070/166589 that have appeared a little later. From the usability point of view, Excel seemed to have lack of a good user-friendly UTF-8 CSV support, so I consider both answers are correct, and I have accepted Alex's answer first because it really stated that Excel was not able to do that transparently. That is what I confused with automatically here. Mark's answer promotes a more complicated way for more advanced users to achieve the expected result. Both answers are great, but Alex's one fits my not clearly specified question a little better.
UPDATE 2
Five months later after the last edit, I've noticed that Alex's answer has disappeared for some reason. I really hope it wasn't a technical issue and I hope there is no more discussion on which answer is greater now. So I'm accepting Mark's answer as the best one.
This is my working solution:
The key is Origin:=65001
As I posted on http://thinkinginsoftware.blogspot.com/2017/12/correctly-generate-csv-that-excel-can.html:
Tell the software developer in charge of generating the CSV to correct it. As a quick workaround you can use gsed to insert the UTF-8 BOM at the beginning of the string:
This command inserts the UTF-4 BOM if not present. Therefore it is an idempotent command. Now you should be able to double click the file and open it in Excel.
It is incredible that there are so many answers but none answers the question:
The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual how to configure Excel. Apart from that: this manual will apply to one Excel version but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.
So the question is how to make Excel show UTF8 data with a simple double click?
Well at least in Excel 2007 this is not possible if you use CSV files because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:
I make the same experience: Writing russian or greek data into a UTF8 CSV file with BOM results in garbage in Excel:
Content of UTF8 CSV file:
Result in Excel 2007:
A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in control panel if comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but on anther computer not. "CSV" means "Comma Separated Values" but for example on a german Windows by default semicolon must be used as separator while comma does not work. (Here it should be named SSV = Semicolon Separated Values) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem to the UTF-8 problem.
Excel exists since decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.
However, if you put the same values into a HTML file and save that file as UTF8 file with BOM with the file extension XLS you will get the correct result.
Content of UTF8 XLS file:
Result in Excel 2007:
You can even use colors in HTML which Excel will show correctly.
Result in Excel 2007:
In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:
This code even allows to specify the name of the worksheet (here "MySuperSheet")
Result in Excel 2007:
Yes, this is possible. As previously noted by multiple users, there seems to be a problem with excel reading the correct Byte Order Mark when the file is encoded in UTF-8. With UTF-16 it does not seem to have a problem, so it is endemic to UTF-8. The solution I use for this is adding the BOM, TWICE. For this I execute the following sed command twice:
, where the wildcard can be replaced with any file name. However, this leads to a mutation of the sep= at the beginning of the .csv file. The .csv file will then open normally in excel, but with an extra row with "sep=" in the first cell. The "sep=" can also be removed in the source .csv itself, but when opening the file with VBA the delimiter should be specified:
Format 6 is the .csv format. Set Local to true, in case there are dates in the file. If Local is not set to true the dates will be Americanized, which in some cases will corrupt the .csv format.
I tried everything I could find on this thread and similar, nothing worked fully. However, importing to google sheets and simply downloading as csv worked like a charm. Try it out if you come to my frustration point.
Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:
This way the special characters should show correctly.