When running the following code
do line <- getLine
   putStrLn line
or
getLine >>= putStrLn
and entering µ after getLine >>= putStrLn, one encounters this output:
⠀
Now, I already tried chcp 65001 beforehand, which doesn't work, and the encoding of stdin is utf8.
An examination without putStrLn shows:
getLine
µ
'\NIL'
My environment:
Windows 10 Version 10.0.17134 Build 17134
Lenovo ideapad 510-15IKB
BIOS Version LENOVO 3JCN30WW
GHCi v 8.2.2
How can this be solved?
EDIT: Specifically, the following sequence of actions causes this:
1. Open cmd
2. Type chcp 65001
3. Type ghci
4. Type getLine >>= putStrLn
5. Type µ
However, the following does not:
1. Search for ghci
2. Open ghci.exe at %PROGRAMS%\Haskell Platform\8.2.2\bin
3. Repeat steps 4-5.
NOTE: %PROGRAMS% is not a real environment variable.
EDIT: As requested, the output of GHC.IO.Encoding.getLocaleEncoding:
UTF-8
Also, the output of System.IO.hGetEncoding stdin:
Just UTF-8
(when using chcp 65001)
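The same two values can also be printed from a compiled program instead of GHCi. The following is only a minimal sketch for reproducing the check, not part of the original session; stdinIsUtf8 is a hypothetical helper name introduced for illustration.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdin, utf8)

-- | Hypothetical helper: True when stdin is in text mode with UTF-8.
-- TextEncoding has no Eq instance, so the encoding names are compared.
stdinIsUtf8 :: IO Bool
stdinIsUtf8 = do
  enc <- hGetEncoding stdin
  pure (fmap show enc == Just (show utf8))

main :: IO ()
main = do
  -- Encoding GHC derives from the locale / active code page.
  getLocaleEncoding >>= print
  -- Encoding attached to stdin; Nothing would mean binary mode.
  hGetEncoding stdin >>= print
  stdinIsUtf8 >>= print
```

The printed values depend on the console the program is started from, so running it once from cmd after chcp 65001 and once from ghci.exe's own window should show whether the two setups differ.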
EDIT: The character is U+00B5. I am using a German keyboard, system locale Germany, language setting English, Keyboard language ENG with German layout.
Console input/output in GHC is utterly broken on Windows and has been for some time now. Here is the top-level ticket that tracks all the issues related to IO on Windows:
https://ghc.haskell.org/trac/ghc/ticket/11394
I believe these two tickets best describe the behavior that you are experiencing:
- https://ghc.haskell.org/trac/ghc/ticket/10542
- https://ghc.haskell.org/trac/ghc/ticket/4471
The only workaround right now is to call the Windows API directly for console input/output, which is a pain of its own.
EDIT
So, just for the hell of it I decided to endure some of that pain. :)
Here is the output of the code below:
====
Input: µ
Output: µ
====
This is by no means a fully correct or a safe solution, but it does work:
module Main where

import Control.Monad
import System.IO
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.String
import Foreign.C.Types
import Foreign.Storable
import System.Win32
import System.Win32.Types
import Graphics.Win32.Misc

foreign import ccall unsafe "windows.h WriteConsoleW"
  c_WriteConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

foreign import ccall unsafe "windows.h ReadConsoleW"
  c_ReadConsoleW :: HANDLE -> LPWSTR -> DWORD -> LPDWORD -> LPVOID -> IO BOOL

-- | Read n characters from a handle, which should be a console stdin
hwGetStrN :: Int -> Handle -> IO String
hwGetStrN maxLen hdl = do
  withCWStringLen (Prelude.replicate maxLen '\NUL') $ \(cstr, len) -> do
    lpNumberOfCharsWrittenForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsWrittenForeignPtr $ \lpNumberOfCharsRead -> do
        c_ReadConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsRead nullPtr
        numWritten <- peek lpNumberOfCharsRead
        peekCWStringLen (cstr, fromIntegral numWritten)

-- | Write a string to a handle, which should be a console stdout or stderr.
hwPutStr :: Handle -> String -> IO ()
hwPutStr hdl str = do
  void $ withCWStringLen str $ \(cstr, len) -> do
    lpNumberOfCharsWrittenForeignPtr <- mallocForeignPtr
    withHandleToHANDLE hdl $ \winHANDLE ->
      withForeignPtr lpNumberOfCharsWrittenForeignPtr $ \lpNumberOfCharsWritten ->
        c_WriteConsoleW winHANDLE cstr (fromIntegral len) lpNumberOfCharsWritten nullPtr

main :: IO ()
main = do
  hwPutStr stdout "====\nInput: "
  str <- hwGetStrN 10 stdin
  hwPutStr stdout "Output: "
  hwPutStr stdout str
  hwPutStr stdout "====\n"
EDIT 2
@dfeuer asked me to list things that are unsafe, incorrect or incomplete with that answer. I only really code on Linux, so I am not a Windows programmer, but here are the things that pop into my mind that would need to be changed before that code could be used in a real program:
- The most important point is that the code works only with console handles; whether a handle is one can be determined with the GetConsoleMode API call.
- For other types of handles, e.g. pipes or file handles, the code above will do nothing. Those have their own issues with encoding, but that is a totally separate matter.
- API call failures aren't accounted for. We'd have to check whether a call was successful by looking at the returned BOOL and, whenever it wasn't, use GetLastError to report the error back to the user.
- The functions implemented above are very limited: there are no checks on how much they have actually read from or written to the buffer. For that reason hwGetStrN can only handle n characters, so a recursive call would be required in order to get behavior similar to hGetLine.
- Do all the sanity checks, e.g. DWORD is Word32, so the fromIntegral len call is susceptible to integer overflow, which is both incorrect and unsafe.
- FFI calls must use stdcall on a 32-bit OS and ccall on x86_64, so some CPP is necessary.
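Two of the points above, the unchecked length conversion and the fixed-size read, can be sketched without touching the Windows API. This is a minimal, hedged illustration rather than a drop-in fix: toDWORD, getLineWith and fakeReader are hypothetical names that do not appear in the code above, Word32 stands in for DWORD as described in the list, and the loop assumes the chunk reader never returns data past a newline (assumed here to hold for a console read in line-input mode).

```haskell
import Data.Bits (toIntegralSized)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word32)

-- | Hypothetical helper: convert a buffer length to the Word32 behind DWORD,
-- returning Nothing instead of silently wrapping around on overflow.
toDWORD :: Int -> Maybe Word32
toDWORD = toIntegralSized

-- | Hypothetical helper: keep calling a chunk reader until a newline shows up
-- or the reader returns nothing, then drop the line terminator.
-- Assumption: the reader never returns data past the newline; anything after
-- it within a chunk would be discarded.
getLineWith :: Int -> (Int -> IO String) -> IO String
getLineWith chunkSize readChunk = dropTrailingCR <$> go
  where
    go = do
      chunk <- readChunk chunkSize
      case break (== '\n') chunk of
        (line, _ : _) -> pure line          -- newline found: the line is complete
        ([], [])      -> pure []            -- nothing read: treat as end of input
        (part, [])    -> (part ++) <$> go   -- buffer filled up: read some more
    dropTrailingCR s
      | not (null s) && last s == '\r' = init s
      | otherwise = s

-- | Stand-in for hwGetStrN that serves characters from an in-memory string,
-- so the loop can be tried without a Windows console.
fakeReader :: IORef String -> Int -> IO String
fakeReader ref n = do
  (chunk, rest) <- splitAt n <$> readIORef ref
  writeIORef ref rest
  pure chunk

main :: IO ()
main = do
  print (toDWORD 10)    -- Just 10
  print (toDWORD (-1))  -- Nothing
  ref <- newIORef "hello console\r\nnext"
  line <- getLineWith 4 (fakeReader ref)
  print line            -- "hello console"
```

With the code above, the reader argument would be something like \n -> hwGetStrN n stdin. The console check, the GetLastError reporting and the calling-convention switch are left out because they need the Windows API itself.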