System.Directory.getDirectoryContents unicode supp

2019-06-26 06:09发布

问题:

The following code prints something like °Ð½Ð´Ð¸Ñ-ÐÑпаниÑ

getDirectoryContents "path/to/directory/that/contains/files/with/nonASCII/names"
  >>= mapM_ putStrLn

Looks like it is a ghc bug and it is fixed already in repository. But what to do until everybody upgrade ghc?

The last time I encountered such the problem (it was few years ago, btw), I used utf8-string package to convert strings, but I don't remember how I did it, and ghc unicode support was changed visibly last years.

So, what is the best (or at least working) way to get directory contents with full unicode support?

ghc version 7.0.4 locale en_US.UTF-8

回答1:

Here's a simple workaround using decodeString and encodeString from utf8-string.

import System.Directory
import qualified Codec.Binary.UTF8.String as UTF8

main = do
   getDirectoryContents "." >>= mapM_ (putStrLn . UTF8.decodeString)
   putStrLn "------------"
   readFile (UTF8.encodeString "brøken-file-nåme.txt") >>= putStrLn

Output:

.
..
brøken-file-nåme.txt
Broken.hs
------------
hello


回答2:

I would recommend looking at system-filepath, which provides an abstract datatype for representing filepaths. I've used it extensively for some internal code and it works wonderfully.