How do you get getLine to accept Unicode characters?

Posted 2019-05-14 08:38

When running the following code

do line <- getLine
   putStrLn line

or,

getLine >>= putStrLn

After running

 getLine >>= putStrLn

and entering

µ

one encounters this output:

I already tried running chcp 65001 beforehand, which does not help, and the encoding of stdin is UTF-8.

An examination without putStrLn shows:

 getLine
µ
"\NUL"

My environment:
Windows 10 Version 10.0.17134 Build 17134
Lenovo ideapad 510-15IKB
BIOS Version LENOVO 3JCN30WW
GHCi v 8.2.2

How can this be solved?

EDIT: Specifically, the following sequence of actions causes this:

  1. Open cmd
  2. Type chcp 65001
  3. Type ghci
  4. Type getLine >>= putStrLn
  5. Type µ

However, the following does not:

  1. Search for ghci
  2. Open ghci.exe at %PROGRAMS%\Haskell Platform\8.2.2\bin
  3. Repeat 4-5.

NOTE: %PROGRAMS% is not a real environment variable.

EDIT: As requested, the output of GHC.IO.Encoding.getLocaleEncoding:

UTF-8

Also, the output of System.IO.hGetEncoding stdin:

Just UTF-8

(when using chcp 65001)
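
For anyone who wants to reproduce these two checks outside GHCi, a minimal sketch along the following lines should print the same information. The helper name encodingReport and the label strings are my own and not part of any library; the stdout line is an extra that the question did not report.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdin, stdout)

-- | Collect the encodings the GHC runtime has picked, as printable lines.
-- 'encodingReport' is a hypothetical helper name used only for this sketch.
encodingReport :: IO [String]
encodingReport = do
  locale <- getLocaleEncoding      -- encoding derived from the locale / code page
  inEnc  <- hGetEncoding stdin     -- Nothing would mean the handle is in binary mode
  outEnc <- hGetEncoding stdout
  pure
    [ "locale: " ++ show locale
    , "stdin:  " ++ show inEnc
    , "stdout: " ++ show outEnc
    ]

main :: IO ()
main = encodingReport >>= mapM_ putStrLn
```

Running it once from a cmd window after chcp 65001 and once from a freshly opened console makes the two scenarios described above easier to compare.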

EDIT: The character is U+00B5. I am using a German keyboard, system locale Germany, language setting English, Keyboard language ENG with German layout.

1 answer
SAY GOODBYE
#2 · 2019-05-14 08:50

Console input/output in GHC is utterly broken on Windows and has been for some time now. Here is the top-level ticket that tracks all the issues related to IO on Windows: https://ghc.haskell.org/trac/ghc/ticket/11394

I believe these two tickets best describe the behavior that you are experiencing:

The only workaround right now is to call the Windows API directly for console input and output, which is a pain of its own.

EDIT

So, just for the hell of it I decided to endure some of that pain. :)

Here is the output of the code below:

====
Input: µ
Output: µ
====

This is by no means a fully correct or a safe solution, but it does work:

module Main where

import Control.Monad
import System.IO
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.String
import Foreign.C.Types
import Foreign.Storable

import System.Win32
import System.Win32.Types
import Graphics.Win32.Misc

foreign import ccall unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

foreign import ccall unsafe "windows.h ReadConsoleW"
  c_ReadConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

-- | Read at most maxLen UTF-16 code units from a handle, which should be a console stdin
hwGetStrN :: Int -> Handle -> IO String
hwGetStrN maxLen hdl = do
  -- Allocate a zero-filled wide-character buffer for ReadConsoleW to fill.
  withCWStringLen (Prelude.replicate maxLen '\NUL') $ \(cstr, len) -> do
    lpNumberOfCharsReadForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsReadForeignPtr $ \lpNumberOfCharsRead -> do
        -- The BOOL result is ignored here, so a failed call goes unnoticed.
        _ <- c_ReadConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsRead nullPtr
        numRead <- peek lpNumberOfCharsRead
        peekCWStringLen (cstr, fromIntegral numRead)

-- | Write a string to a handle, which should be a console stdout or stderr.
hwPutStr :: Handle -> String -> IO ()
hwPutStr hdl str = do
  void $ withCWStringLen str $ \(cstr, len) -> do
    lpNumberOfCharsWrittenForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsWrittenForeignPtr $ \ lpNumberOfCharsWritten ->
      c_WriteConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsWritten nullPtr

main :: IO ()
main = do
  hwPutStr stdout "====\nInput: "
  str <- hwGetStrN 10 stdin
  hwPutStr stdout "Output: "
  hwPutStr stdout str
  hwPutStr stdout "====\n"

EDIT 2

@dfeuer asked me to list the things that are unsafe, incorrect or incomplete in that answer. I only really code on Linux, so I am not a Windows programmer, but here are the things that come to mind that would need to change before that code could be used in a real program:

  • Most importantly, the code works only with console handles. Whether a handle is a console can be determined with the GetConsoleMode API call.
  • For other kinds of handles, e.g. pipes or file handles, the code above will do nothing. Those have their own encoding issues, but that is a totally separate matter.
  • API call failures aren't accounted for. We would have to check the returned BOOL to see whether a call succeeded and, whenever it did not, use GetLastError to report the error back to the user.
  • The functions implemented above are very limited: there are no checks on how much they have actually read from or written to the buffer. For that reason hwGetStrN can handle only n characters, so a recursive call would be required to get behavior similar to hGetLine.
  • All the sanity checks are missing, e.g. DWORD is Word32, so the fromIntegral len call is susceptible to integer overflow, which is both incorrect and unsafe.
  • FFI calls must use stdcall on 32-bit Windows but ccall on x86_64, so some CPP is necessary.
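
Two of the points above, the overflow-prone length conversion and the need to loop until a whole line has arrived, can be sketched without touching the Windows API at all. The following is only an illustration under my own assumptions: toDWORD, readLineWith, stripEOL and popChunk are hypothetical names, and the readChunk argument stands in for something like hwGetStrN 10 stdin. It also assumes that a console read in line mode ends the line with "\r\n" and that an empty chunk means no more input.

```haskell
import Data.Bits (toIntegralSized)
import Data.IORef (atomicModifyIORef', newIORef)
import Data.Word (Word32)

-- | Convert a buffer length to the 32-bit count a DWORD parameter expects,
-- returning Nothing instead of silently wrapping on overflow or negatives.
toDWORD :: Int -> Maybe Word32
toDWORD = toIntegralSized

-- | Keep requesting chunks until one ends with a newline or comes back empty.
-- 'readChunk' is a stand-in for a fixed-size console read (an assumption).
readLineWith :: IO String -> IO String
readLineWith readChunk = go []
  where
    go acc = do
      chunk <- readChunk
      let acc' = chunk : acc
      if null chunk || last chunk == '\n'
        then pure (concat (reverse acc'))
        else go acc'

-- | Drop one trailing "\r\n" or "\n", similar to what hGetLine users expect.
stripEOL :: String -> String
stripEOL s = case reverse s of
  '\n' : '\r' : rest -> reverse rest
  '\n' : rest        -> reverse rest
  _                  -> s

-- | Pop the next simulated chunk; an empty string signals end of input.
popChunk :: [String] -> ([String], String)
popChunk []         = ([], "")
popChunk (c : rest) = (rest, c)

main :: IO ()
main = do
  -- Simulate a console that hands one line back in three pieces.
  chunks <- newIORef ["Hello, ", "wor", "ld\r\n"]
  line <- readLineWith (atomicModifyIORef' chunks popChunk)
  putStrLn (stripEOL line)
  print (toDWORD 10)
  print (toDWORD (-1))
```

In a real program the simulated reader would be replaced by the actual console read, and a Nothing from toDWORD would be reported as an error rather than passed on to the API.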