How to print numpy matrix nicely with text headers

Posted 2020-07-17 04:43

Question:

I have a question about Python:

How can I print a matrix nicely with headers, like this:

      T  C  G  C  A
  [0 -2 -4 -6 -8 -10]
T [-2  1 -1 -3 -5 -7]
C [-4 -1  2  0 -2 -4]
C [-6 -3  0  1  1 -1]
A [-8 -5 -2 -1  0  2]

I tried printing it with numpy.matrix(mat), but all I got was:

[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]

I also did not manage to add the headers.

Thanks!!!

Update

Thank you all. I managed to install pandas, but I have two new problems. Here is my code:

import pandas as pd
col1 = [' ', 'T', 'C', 'G', 'C', 'A']
col2 = [' ', 'T', 'C', 'C', 'A']
df = pd.DataFrame(mat,index = col2, columns = col1)
print df

But I get this error:

    df = pd.DataFrame(mat,index = col2, columns = col1)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 163, in __init__
    copy=copy)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 224, in _init_ndarray
    return BlockManager([block], [columns, index])
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 237, in __init__
    self._verify_integrity()
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 313, in _verify_integrity
    union_items = _union_block_items(self.blocks)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 906, in _union_block_items
    raise Exception('item names overlap')
Exception: item names overlap

When I change the letters, it works:

       T   B   G   C   A  
   0   -2  -4  -6  -8  -10
T  -2  1   -1  -3  -5  -7 
C  -4  -1  2   0   -2  -4 
C  -6  -3  0   1   1   -1 
A  -8  -5  -2  -1  0   2  

But as you can see, the matrix is not laid out well. How can I fix these problems?

Answer 1:

NumPy does not provide this functionality out of the box.

(a) pandas

You may look into pandas. Printing a pandas.DataFrame usually looks quite nice.

import numpy as np
import pandas as pd
cols = ["T", "C", "S", "W", "Q"]
a = np.random.randint(0,11,size=(5,5))
df = pd.DataFrame(a, columns=cols, index=cols)
print(df)

will produce

   T  C   S  W  Q
T  9  5  10  0  0
C  3  8   0  7  2
S  0  2   6  5  8
W  4  4  10  1  5
Q  3  8   7  1  4

(b) pure python

If pandas is not available and you only have Python and NumPy, you can use the following function.

import numpy as np

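def print_array(a, cols, rows):
    # Labels must match the array: one per column and one per row.
    if (len(cols) != a.shape[1]) or (len(rows) != a.shape[0]):
        print("Shapes do not match")
        return
    # Start from NumPy's own repr and strip the "array(" wrapper and indentation.
    s = repr(a)
    s = s.split("array(")[1]
    s = s.replace("      ", "")
    s = s.replace("[[", " [")
    s = s.replace("]])", "]")
    # Comma positions in the first row mark where each column ends.
    pos = [i for i, ltr in enumerate(s.splitlines()[0]) if ltr == ","]
    # The last comma follows the closing bracket, so step back one position.
    pos[-1] = pos[-1]-1
    empty = " " * len(s.splitlines()[0])
    s = s.replace("],", "]")
    s = s.replace(",", "")
    # Prefix every row with its (one-character) label.
    lines = []
    for i, l in enumerate(s.splitlines()):
        lines.append(rows[i] + l)
    s = "\n".join(lines)
    # With the commas removed and the row label added, the last digit of
    # column i sits at index pos[i] - i, so the header letter goes there.
    empty = list(empty)
    for i, p in enumerate(pos):
        empty[p-i] = cols[i]
    s = "".join(empty) + "\n" + s
    print(s)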



c = [" ", "T", "C", "G", "C", "A"]
r = [" ", "T", "C", "C", "A" ]
a = np.random.randint(-4,15,size=(5,6))    
print_array(a, c, r)

giving you

       T  C  G  C  A      
  [ 2  5 -3  7  1  9]
T [-3 10  3 -4  8  3]
C [ 6 11 -2  2  5  1]
C [ 4  6 14 11 10  0]
A [11 -4 -3 -4 14 14]


Answer 2:

Consider a sample array -

In [334]: arr = np.random.randint(0,25,(5,6))

In [335]: arr
Out[335]: 
array([[24,  8,  6, 10,  5, 11],
       [11,  5, 19,  6, 10,  5],
       [ 6,  2,  0, 12,  6, 17],
       [13, 20, 14, 10, 18,  9],
       [ 9,  4,  4, 24, 24,  8]])

We can use a pandas DataFrame, like so -

import pandas as pd

In [336]: print(pd.DataFrame(arr,columns=list(' TCGCA'),index=list(' TCCA')))
        T   C   G   C   A
   24   8   6  10   5  11
T  11   5  19   6  10   5
C   6   2   0  12   6  17
C  13  20  14  10  18   9
A   9   4   4  24  24   8

Note that a pandas DataFrame expects a label for every row and every column. To leave the first row and the first column unlabeled, we pass a single space as their label: ' TCGCA' for the columns and ' TCCA' for the index.
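
Applied to the matrix from the question, the same idea might look like the sketch below. It assumes a reasonably recent pandas release, where repeated labels such as the two 'C' entries are accepted (the "item names overlap" exception in the question appears to come from an older release). DataFrame.to_string() is used so that the formatted text is available as a plain string.

```python
import numpy as np
import pandas as pd

# The matrix from the question.
mat = np.array([[ 0, -2, -4, -6, -8, -10],
                [-2,  1, -1, -3, -5,  -7],
                [-4, -1,  2,  0, -2,  -4],
                [-6, -3,  0,  1,  1,  -1],
                [-8, -5, -2, -1,  0,   2]])

# A single space labels the first row and the first column.
df = pd.DataFrame(mat, index=list(" TCCA"), columns=list(" TCGCA"))

# to_string() returns the table as text: one header line, then one line per row.
text = df.to_string()
print(text)
```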



Answer 3:

Here's a quick version of adding labels with plain Python and NumPy.

Define a function that writes lines. Here it just prints the lines, but it could be set up to print to a file, or to collect all the lines in a list and return that.

def pp(arr,lbl):
    print('  ','  '.join(lbl))
    # one line per label, so the function is not tied to 4 rows
    for i in range(len(lbl)):
        print('%s %s'%(lbl[i], arr[i]))

In [65]: arr=np.arange(16).reshape(4,4)

The default display for a 2-D array:

In [66]: print(arr)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

In [67]: lbl=list('ABCD')

In [68]: pp(arr,lbl)
   A  B  C  D
A [0 1 2 3]
B [4 5 6 7]
C [ 8  9 10 11]
D [12 13 14 15]

Spacing is off because numpy is formatting each line separately, applying a different element width for each row. But it's a start.
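
One way around the per-row width, sketched here under the assumption of a small 2-D array and single-character labels, is to let NumPy format the whole array once with np.array2string and then label the resulting lines. The name pp_aligned is made up for this illustration.

```python
import numpy as np

def pp_aligned(arr, row_lbl, col_lbl):
    """Label rows and columns using one shared element width (hypothetical helper)."""
    arr = np.asarray(arr)
    # Format the whole array at once so every row gets the same element width.
    # threshold and max_line_width keep NumPy from summarizing or wrapping rows.
    text = np.array2string(arr, threshold=arr.size + 1, max_line_width=10**6)
    # Drop the surrounding brackets; each row is now just padded elements.
    rows = [ln.strip().lstrip("[").rstrip("]") for ln in text.splitlines()]
    n = arr.shape[1]
    # Each row holds n elements of equal width joined by single spaces.
    width = (len(rows[0]) - (n - 1)) // n
    # Three leading spaces match the "X [" prefix of the row lines.
    lines = ["   " + " ".join(c.rjust(width) for c in col_lbl)]
    for lbl, row in zip(row_lbl, rows):
        lines.append("%s [%s]" % (lbl, row))
    return "\n".join(lines)

arr = np.arange(16).reshape(4, 4)
print(pp_aligned(arr, list("ABCD"), list("ABCD")))
#     A  B  C  D
# A [ 0  1  2  3]
# B [ 4  5  6  7]
# C [ 8  9 10 11]
# D [12 13 14 15]
```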

The output of pp looks better when every row needs the same element width, as in this random sample:

In [69]: arr = np.random.randint(0,25,(4,4))
In [70]: arr
Out[70]: 
array([[24, 12, 12,  6],
       [22, 16, 18,  6],
       [21, 16,  0, 23],
       [ 2,  2, 19,  6]])
In [71]: pp(arr,lbl)
   A  B  C  D
A [24 12 12  6]
B [22 16 18  6]
C [21 16  0 23]
D [ 2  2 19  6]
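
As mentioned at the start of this answer, the function could collect the lines instead of printing them. A minimal sketch of that variant follows; pp_lines and pp_to are hypothetical names, and any object with a write method (an open file, io.StringIO) can serve as the stream.

```python
import io
import numpy as np

def pp_lines(arr, lbl):
    """Return the labeled lines as a list instead of printing them."""
    lines = ["   " + "  ".join(lbl)]
    for label, row in zip(lbl, arr):
        # str(row) is NumPy's per-row formatting, as in pp above.
        lines.append("%s %s" % (label, row))
    return lines

def pp_to(stream, arr, lbl):
    """Write the labeled lines to any object with a write() method."""
    stream.write("\n".join(pp_lines(arr, lbl)) + "\n")

arr = np.arange(16).reshape(4, 4)
buf = io.StringIO()          # stands in for an open text file
pp_to(buf, arr, list("ABCD"))
print(buf.getvalue(), end="")
```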