I just wanted to confirm whether the default data type for strings is Unicode when creating an ndarray. I could not find any reference that states this clearly. Maybe it is too obvious and doesn't need stating.
When dtype is specified:
>>> import numpy as np
>>> g = np.array([['a', 'b'],['c', 'd']], dtype='S')
>>> g
array([[b'a', b'b'],
       [b'c', b'd']],
      dtype='|S1')
Without specifying the dtype:
>>> g = np.array([['a', 'b'],['c', 'd']])
>>> g
array([['a', 'b'],
       ['c', 'd']],
      dtype='<U1')
Also, what does the literal b indicate when dtype is specified? As per the documentation, it indicates bool, which doesn't seem to be the case here.
Can someone please clarify?
b'...' means it's a byte string, and the default dtype for arrays of strings depends on the kind of strings. Unicode strings (Python 3 strings are Unicode) get the dtype U, and Python 2 str or Python 3 bytes have the dtype S. You can find the explanation of dtypes in the NumPy documentation.
However, in your first case you actually forced NumPy to convert the strings to bytes because you specified dtype='S'.
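If you want to check this yourself, here is a minimal sketch, assuming Python 3 and a reasonably recent NumPy. The variable names are only for illustration. It also shows that the b prefix in the output is Python's bytes literal and is unrelated to the 'b' kind code that the dtype documentation uses for booleans.

```python
import numpy as np

# Python 3 str input: NumPy infers a Unicode dtype (kind 'U').
from_str = np.array([['a', 'b'], ['c', 'd']])

# Python 3 bytes input: NumPy infers a byte-string dtype (kind 'S').
from_bytes = np.array([[b'a', b'b'], [b'c', b'd']])

# str input combined with dtype='S': NumPy converts each element to bytes.
forced = np.array([['a', 'b'], ['c', 'd']], dtype='S')

print(from_str.dtype.kind)    # U
print(from_bytes.dtype.kind)  # S
print(forced.dtype.kind)      # S

# The elements of an 'S' array are bytes objects, hence the b'...' repr.
print(isinstance(forced[0, 0], bytes))  # True

# The 'b' kind code from the dtype documentation describes boolean arrays.
print(np.dtype(bool).kind)  # b
```

So the default follows the input type: str gives U, bytes gives S, and an explicit dtype overrides the inference.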