Question:
I have a Python question:
how can I print a matrix nicely with headers, like this:
        T   C   G   C   A
  [ 0  -2  -4  -6  -8 -10]
T [-2   1  -1  -3  -5  -7]
C [-4  -1   2   0  -2  -4]
C [-6  -3   0   1   1  -1]
A [-8  -5  -2  -1   0   2]
I've tried printing it with numpy.matrix(mat),
but all I got was:
[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]
I also did not manage to add the headers.
Thanks!!!
Update
Thank you all.
I've managed to install pandas, but I have two new problems.
Here is my code:
import pandas as pd
col1 = [' ', 'T', 'C', 'G', 'C', 'A']
col2 = [' ', 'T', 'C', 'C', 'A']
df = pd.DataFrame(mat,index = col2, columns = col1)
print df
But I get this error:
    df = pd.DataFrame(mat,index = col2, columns = col1)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 163, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 224, in _init_ndarray
    return BlockManager([block], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 237, in __init__
    self._verify_integrity()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 313, in _verify_integrity
    union_items = _union_block_items(self.blocks)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 906, in _union_block_items
    raise Exception('item names overlap')
Exception: item names overlap
When I change the letters, it works:
T B G C A
0 -2 -4 -6 -8 -10
T -2 1 -1 -3 -5 -7
C -4 -1 2 0 -2 -4
C -6 -3 0 1 1 -1
A -8 -5 -2 -1 0 2
But as you can see, the layout of the matrix is not quite right.
How can I fix those problems?
Answer 1:
NumPy does not provide this functionality out of the box.
(a) pandas
You may want to look into pandas. A printed pandas.DataFrame
usually looks quite nice.
import numpy as np
import pandas as pd
cols = ["T", "C", "S", "W", "Q"]
a = np.random.randint(0,11,size=(5,5))
df = pd.DataFrame(a, columns=cols, index=cols)
print(df)
This will produce:
   T  C   S  W  Q
T  9  5  10  0  0
C  3  8   0  7  2
S  0  2   6  5  8
W  4  4  10  1  5
Q  3  8   7  1  4
(b) pure Python
If pandas is not available, you can use the following function, which needs only NumPy and plain Python string handling.
import numpy as np

def print_array(a, cols, rows):
    # One label is required per column and per row.
    if (len(cols) != a.shape[1]) or (len(rows) != a.shape[0]):
        print("Shapes do not match")
        return
    s = a.__repr__()
    s = s.split("array(")[1]
    # Drop the 6-space indent (the width of "array(") on continuation lines.
    s = s.replace("      ", "")
    s = s.replace("[[", " [")
    s = s.replace("]])", "]")
    # Comma positions in the first row mark where each column ends.
    pos = [i for i, ltr in enumerate(s.splitlines()[0]) if ltr == ","]
    pos[-1] = pos[-1]-1
    empty = " " * len(s.splitlines()[0])
    s = s.replace("],", "]")
    s = s.replace(",", "")
    # Prefix every row with its label.
    lines = []
    for i, l in enumerate(s.splitlines()):
        lines.append(rows[i] + l)
    s = "\n".join(lines)
    # Place each column label above the last character of its column.
    empty = list(empty)
    for i, p in enumerate(pos):
        empty[p-i] = cols[i]
    s = "".join(empty) + "\n" + s
    print(s)

c = [" ", "T", "C", "G", "C", "A"]
r = [" ", "T", "C", "C", "A" ]
a = np.random.randint(-4,15,size=(5,6))
print_array(a, c, r)
giving you
       T  C  G  C  A
  [ 2  5 -3  7  1  9]
T [-3 10  3 -4  8  3]
C [ 6 11 -2  2  5  1]
C [ 4  6 14 11 10  0]
A [11 -4 -3 -4 14 14]
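If you would rather not depend on the exact text of NumPy's repr, the same layout can be built with ordinary string formatting. What follows is only a minimal Python 3 sketch, not part of the answer above: the name format_matrix and its parameters are hypothetical, and it assumes one label per row and per column (use " " for a blank label). It pads every cell to one shared width so the header letters sit above the last character of each column.

```python
def format_matrix(rows, col_labels, row_labels):
    """Return a labelled, column-aligned text rendering of a 2-D table.

    rows: a sequence of equal-length rows (nested lists or a 2-D NumPy array).
    col_labels / row_labels: one label per column / per row (hypothetical names).
    """
    rows = [list(r) for r in rows]
    if len(rows) != len(row_labels) or any(len(r) != len(col_labels) for r in rows):
        raise ValueError("labels do not match the matrix shape")

    cells = [[str(v) for v in r] for r in rows]
    # One shared width for every cell and column label keeps the columns aligned.
    width = max((len(s) for r in cells for s in r), default=0)
    width = max(width, max((len(str(c)) for c in col_labels), default=0))
    label_width = max((len(str(r)) for r in row_labels), default=0)

    # The header is shifted right by the row-label width plus the " [" that opens each row.
    header = " " * (label_width + 2) + " ".join(str(c).rjust(width) for c in col_labels)
    lines = [header.rstrip()]
    for label, r in zip(row_labels, cells):
        body = " ".join(s.rjust(width) for s in r)
        lines.append(str(label).ljust(label_width) + " [" + body + "]")
    return "\n".join(lines)


# The matrix from the question, written as a plain nested list.
mat = [
    [0, -2, -4, -6, -8, -10],
    [-2, 1, -1, -3, -5, -7],
    [-4, -1, 2, 0, -2, -4],
    [-6, -3, 0, 1, 1, -1],
    [-8, -5, -2, -1, 0, 2],
]
print(format_matrix(mat, " TCGCA", " TCCA"))
```

Because the function returns a string instead of printing, the result can also be written to a file or logged. Repeated labels such as the two C's are no problem here, since the labels are only ever used as text.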
Answer 2:
Consider a sample array -
In [334]: arr = np.random.randint(0,25,(5,6))
In [335]: arr
Out[335]:
array([[24,  8,  6, 10,  5, 11],
       [11,  5, 19,  6, 10,  5],
       [ 6,  2,  0, 12,  6, 17],
       [13, 20, 14, 10, 18,  9],
       [ 9,  4,  4, 24, 24,  8]])
We can use pandas dataframe, like so -
import pandas as pd
In [336]: print pd.DataFrame(arr,columns=list(' TCGCA'),index=list(' TCCA'))
        T   C   G   C   A
   24   8   6  10   5  11
T  11   5  19   6  10   5
C   6   2   0  12   6  17
C  13  20  14  10  18   9
A   9   4   4  24  24   8
Note that a pandas DataFrame expects a label for every row and every column. To leave the first row and the first column unlabelled, we pass label strings whose first character is a space: ' TCGCA' and ' TCCA'.
Answer 3:
Here's a quick way to add labels with plain Python and NumPy.
Define a function that writes lines. Here it just prints the lines, but it could be set up to print to a file, or to collect all the lines in a list and return that.
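def pp(arr,lbl):
    # Header row: the column labels, offset to clear the row-label column.
    print(' ',' '.join(lbl))
    # One line per row (the original hard-coded 4 rows).
    for i in range(arr.shape[0]):
        print('%s %s'%(lbl[i], arr[i]))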
In [65]: arr=np.arange(16).reshape(4,4)
The default display for a 2D array:
In [66]: print(arr)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
In [67]: lbl=list('ABCD')
In [68]: pp(arr,lbl)
  A B C D
A [0 1 2 3]
B [4 5 6 7]
C [ 8  9 10 11]
D [12 13 14 15]
Spacing is off because numpy is formatting each line separately, applying a different element width for each row. But it's a start.
It looks better with a random sample:
In [69]: arr = np.random.randint(0,25,(4,4))
In [70]: arr
Out[70]:
array([[24, 12, 12,  6],
       [22, 16, 18,  6],
       [21, 16,  0, 23],
       [ 2,  2, 19,  6]])
In [71]: pp(arr,lbl)
  A B C D
A [24 12 12  6]
B [22 16 18  6]
C [21 16  0 23]
D [ 2  2 19  6]
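As mentioned at the start of this answer, the function does not have to print. Here is a rough Python 3 sketch of the "collect the lines" and "print to file" variants; the names labelled_lines and out are made up for this illustration, and the demo uses a nested list so that it runs without NumPy (a 2D NumPy array can be passed in the same way, and each row is then formatted by NumPy as in pp above).

```python
import io
import sys


def labelled_lines(arr, lbl):
    """Yield the header and one labelled line per row, without printing.

    labelled_lines is a hypothetical helper; arr is any sequence of rows.
    """
    yield "  " + " ".join(lbl)
    for label, row in zip(lbl, arr):
        # str(row) is used as-is, so a NumPy row keeps NumPy's own formatting.
        yield "%s %s" % (label, row)


def pp(arr, lbl, out=None):
    """Write the labelled lines to a file-like object (stdout by default)."""
    if out is None:
        out = sys.stdout
    for line in labelled_lines(arr, lbl):
        out.write(line + "\n")


arr = [[0, 1], [2, 3]]  # stand-in data; a 2D NumPy array works the same way
lbl = list("AB")

lines = list(labelled_lines(arr, lbl))  # collect the lines in a list
pp(arr, lbl)                            # print to the console
buf = io.StringIO()
pp(arr, lbl, out=buf)                   # or write to any file-like object
```

Splitting line generation from output keeps the formatting logic in one place, so an open file, a StringIO buffer, or a plain list can all be fed from the same generator.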