I need to decode PowerShell stdout, captured from a call made in Python, into a Python string.
My ultimate goal is to get the names of the network adapters in Windows as a list of strings. My current function looks like this and works well on Windows 10 with the English language:
import subprocess

def get_interfaces():
    ps = subprocess.Popen(['powershell', 'Get-NetAdapter', '|', 'select Name', '|', 'fl'], stdout=subprocess.PIPE)
    # communicate() returns (stdout, stderr); stderr is None here because only stdout is piped
    stdout, stderr = ps.communicate(timeout=10)
    interfaces = []
    for i in stdout.split(b'\r\n'):
        if not i.strip():
            continue
        if i.find(b':') < 0:
            continue
        # split on the first colon only, so a colon inside the value cannot break the unpacking
        name, value = [j.strip() for j in i.split(b':', 1)]
        if name == b'Name':
            interfaces.append(value.decode('ascii'))  # This fails for other users
    return interfaces
Other users have different languages, so value.decode('ascii') fails for some of them. E.g. one user reported that changing it to decode('ISO 8859-2') works well for him (so it is not UTF-8). How can I know which encoding to use to decode the stdout bytes returned by the call to PowerShell?
UPDATE
After some experiments I am even more confused. The code page in my console, as returned by chcp, is 437. I changed a network adapter name to one containing non-ASCII and non-cp437 characters. In interactive PowerShell, running Get-NetAdapter | select Name | fl displayed the name correctly, even its non-cp437 characters. When I called PowerShell from Python, the non-ASCII characters were converted to their closest ASCII characters (e.g. ā to a, ž to z) and .decode('ascii') worked nicely. Could this behaviour (and correspondingly the solution) be Windows version dependent? I am on Windows 10, but users could be on older Windows versions, down to Windows 7.
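The output character encoding may depend on the specific commands being run. For example, when a child Python script started inside the PowerShell session prints ✌ (U+270C), the parent Python script receives the character successfully.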
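A minimal sketch of such an experiment, not necessarily the exact code used for the demonstration. It assumes that `powershell` and the `py` launcher are on PATH; the helper names `build_command`, `decode_child_output` and `run_demo` are made up for illustration.

```python
import subprocess

# Assumption for the demo: an encoding that differs from the ANSI/OEM code pages.
ENCODING = "utf-32"


def build_command(encoding=ENCODING):
    """Build a PowerShell command that sets PYTHONIOENCODING for the session
    and then runs a child Python process that prints U+270C."""
    return (
        '$env:PYTHONIOENCODING = "{}"; '
        'py -3 -c "print(chr(0x270c))"'
    ).format(encoding)


def decode_child_output(data, encoding=ENCODING):
    """Decode the raw bytes captured from the PowerShell process."""
    return data.decode(encoding)


def run_demo():
    """Run the command (Windows only) and return the decoded child output."""
    data = subprocess.check_output(["powershell", "-Command", build_command()])
    return decode_child_output(data)


if __name__ == "__main__":
    print(ascii(run_demo()))
```

The parent has to decode with the same encoding the child was told to use; nothing in the byte stream announces it.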
The character encoding of the child script is set using the PYTHONIOENCODING envvar inside the PowerShell session. I've chosen utf-32 for the output encoding so that it would be different from the Windows ANSI and OEM code pages for the demonstration.
Notice that the stdout encoding of the parent Python script is the OEM code page (cp437 in this case) -- the script is run from the Windows console. If you redirect the output of the parent Python script to a file/pipe then the ANSI code page (e.g., cp1252) is used by default in Python 3.
To decode powershell output that might contain characters undecodable in the current OEM code page, you could set [Console]::OutputEncoding temporarily (inspired by @eryksun's comments).
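Applied to the function from the question, the idea could look roughly like the sketch below. It assumes `powershell` is on PATH and that setting `[Console]::OutputEncoding` inside the command is enough for the adapter names to arrive as UTF-8; `PS_COMMAND` and `parse_names` are illustrative names, not part of any API.

```python
import subprocess

# The encoding is changed only inside this short-lived PowerShell process,
# so the user's console settings are left untouched.
PS_COMMAND = (
    "[Console]::OutputEncoding = New-Object System.Text.UTF8Encoding; "
    "Get-NetAdapter | select Name | fl"
)


def parse_names(data, encoding="utf-8"):
    """Extract adapter names from 'Name : value' lines in the raw output."""
    text = data.decode(encoding).lstrip("\ufeff")  # defensive: drop a BOM if one is emitted
    names = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip() == "Name":
            names.append(value.strip())
    return names


def get_interfaces(timeout=10):
    """Return the network adapter names as a list of str (Windows only)."""
    data = subprocess.check_output(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND], timeout=timeout
    )
    return parse_names(data)
```

Keeping the parsing in a separate function means it can be checked against captured bytes without a Windows machine.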
Both fl and tee use [Console]::OutputEncoding for stdout (the default behavior is as if | Write-Output is appended to the pipelines). tee uses utf-16 to save text to a file. The output shows that ✌ (U+270C) is decoded successfully.
$OutputEncoding is used to decode bytes in the middle of a pipeline. The output is correct: b'\xf0\x9f\x98\x8a'.decode('utf-8') == u'\U0001f60a'. With the default $OutputEncoding (ascii) we would get b'????\r\n' instead.
Note:
- b'\n' is replaced with b'\r\n' despite using a binary API such as os.read/os.write (msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY) has no effect here)
- b'\r\n' is appended if there is no newline in the output, i.e. the newline is appended to the piped output.
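Given these newline rewrites, it may be simplest not to compare piped bytes exactly and to split the decoded text into lines instead. A small sketch, assuming the output is UTF-8 as configured above (`split_piped_lines` is a hypothetical helper):

```python
def split_piped_lines(data, encoding="utf-8"):
    """Decode piped PowerShell output and return its lines."""
    # str.splitlines() handles both CRLF and LF line endings and does not
    # produce an extra empty item for a trailing newline. It also splits on
    # a few other separators (e.g. form feed), which is fine for names.
    return data.decode(encoding).splitlines()
```

For example, both `b'\xf0\x9f\x98\x8a\r\n'` and `b'\xf0\x9f\x98\x8a'` yield the same single-item list.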
If we ignore lone surrogates then setting UTF8Encoding allows passing all Unicode characters through pipes, including non-BMP characters. Text mode could be used in Python if $env:PYTHONIOENCODING = "utf-8:ignore" is configured.
If stdout is not redirected then the Unicode API is used to print characters to the console -- any [BMP] Unicode character can be displayed if the console (TrueType) font supports it.
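On the lone-surrogate caveat: the `ignore` part of `utf-8:ignore` is an error handler, and its effect can be reproduced with plain `str.encode`. The sketch below (with a hypothetical helper `encode_for_pipe`) only illustrates the handler; it does not start PowerShell.

```python
def encode_for_pipe(text, errors="ignore"):
    """Encode text as UTF-8 the way a stream configured with
    PYTHONIOENCODING=utf-8:<errors> would."""
    return text.encode("utf-8", errors)


if __name__ == "__main__":
    # A non-BMP character is encoded normally.
    print(encode_for_pipe("\U0001f60a"))
    # A lone surrogate cannot be encoded; 'ignore' silently drops it.
    print(encode_for_pipe("a\ud800b"))
    # With the default 'strict' handler the same string raises an error.
    try:
        encode_for_pipe("a\ud800b", errors="strict")
    except UnicodeEncodeError as exc:
        print("strict failed:", exc.reason)
```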
The conversion of non-ASCII characters to their closest ASCII characters might be due to System.Text.InternalDecoderBestFitFallback set for [Console]::OutputEncoding -- if a Unicode character can't be encoded in a given encoding then it is passed to the fallback (either a best fit char or '?' is used instead of the original character).
As for Windows versions: if we ignore bugs in cp65001 and a list of new encodings that are supported in later versions then the behavior should be the same.
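To get a feel for what a best-fit fallback does, here is a rough Python approximation. It is not the .NET implementation and does not use its mapping tables; it assumes that stripping combining marks after NFKD normalization is close enough for illustration, and `best_fit_ascii` is a made-up name.

```python
import unicodedata


def best_fit_ascii(text):
    """Approximate a best-fit fallback: use the unaccented base character
    when one exists, otherwise fall back to '?'."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFKD", ch)
        # Drop combining marks such as the macron in U+0101 or the caron in U+017E.
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if base and all(ord(c) < 128 for c in base):
            out.append(base)
        else:
            out.append("?")
    return "".join(out)


if __name__ == "__main__":
    print(best_fit_ascii("\u0101\u017e"))  # accented letters map to their base letters
    print(best_fit_ascii("\u270c"))        # no ASCII look-alike, so '?' is used
```

By contrast, Python's own `'replace'` handler never picks a look-alike character: `'\u0101'.encode('ascii', 'replace')` gives `b'?'`.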
It's a Python 2 bug already marked as wontfix: https://bugs.python.org/issue19264
You must use Python 3 if you want to make it work under Windows.