Python numpy: Convert string in to numpy array

2020-03-22 07:02发布

问题:

I have following String that I have put together:

v1fColor = '2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,0,0,0,3,6,0,0,0,0,0,0,0,0,0,7,5,0,0,0,2,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,1,1,0,0,0,0,0,0,0,0,0,6,6,0,0,0,2,3'

I am treating it as a vector: Long story short its a forecolor of an image histogram:

I have the following lambda function to calculate cosine similarity of two images, So I tried to convert this is to numpy.array but I failed:

Here is my lambda function

import numpy as NP
import numpy.linalg as LA
cx = lambda a, b : round(NP.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)

So I tried the following to convert this string as a numpy array:

v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)

But I ended up getting following error:

    v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)
ValueError: invalid literal for float(): 2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,

回答1:

You have to split the string by its commas first:

NP.array(v1fColor.split(","), dtype=NP.uint8)


回答2:

You can do this without using python string methods -- try numpy.fromstring:

>>> numpy.fromstring(v1fColor, dtype='uint8', sep=',')
array([ 2,  4, 14,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 12,  4,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  6,  0,  0,
        0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0, 20,  9,  0,  0,  0,
        2,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0, 13,  6,  0,  0,  0,  1,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  8,  0,  0,  0,  1,  2,
        0,  0,  0,  0,  0,  0,  0,  0,  0, 17, 17,  0,  0,  0,  3,  6,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  7,  5,  0,  0,  0,  2,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  4,  3,  0,  0,  0,  1,  1,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  6,  6,  0,  0,  0,  2,  3], dtype=uint8)


回答3:

You can do this:

lst = v1fColor.split(',')  #create a list of strings, splitting on the commas.
v1fColor = NP.array( lst, dtype=NP.uint8 ) #numpy converts the strings.  Nifty!

or more concisely:

v1fColor = NP.array( v1fColor.split(','), dtype=NP.uint8 )

Note that it is a little more customary to do:

import numpy as np

compared to import numpy as NP

EDIT

Just today I learned about the function numpy.fromstring which could also be used to solve this problem:

NP.fromstring( "1,2,3" , sep="," , dtype=NP.uint8 )


回答4:

I am writing this answer so if for any future references: I am not sure what is the correct solution in this case but I think What @David Robinson initially publish was the correct answer due to one reason: Cosine Similarity values can not be greater than one and when I use NP.array(v1fColor.split(","), dtype=NP.uint8) option I get strage values which are above 1.0 for cosine similarity between two vectors.

So I wrote a simple sample code to try out:

import numpy as np
import numpy.linalg as LA

def testFunction():
    value1 = '2,3,0,80,125,15,5,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    value2 = '2,137,0,4,96,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    cx = lambda a, b : round(np.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)
    #v1fColor = np.array(map(int,value1.split(',')))
    #v2fColor =  np.array(map(int,value2.split(',')))
    v1fColor = np.array( value1.split(','), dtype=np.uint8 )
    v2fColor = np.array( value2.split(','), dtype=np.uint8 )
    print v1fColor
    print v2fColor
    cosineValue = cx(v1fColor, v2fColor)
    print cosineValue

if __name__ == '__main__':
    testFunction()

if you run this code you should get the following output:

Not lets un commented two lines that and run the code with the David's Initial Solution:

v1fColor = np.array(map(int,value1.split(',')))
v2fColor =  np.array(map(int,value2.split(','))) 

Keep in mind as you see above Cosine Similarity Value came up above 1.0 but when we use the map function and use do the int casting we get the following value which is the correct value:

Luckily I was plotting the values that I was initially getting and some of the cosine values came above 1.0 and I took the outputs of these vectors and manually typed it in python console, and send it via my lambda function and got the correct answer so I was very confuse. Then I wrote the test script to see whats going on and glad I caught this issue. I am not a python expert to exactly tell what is going on in two methods to give two different answers. But I leave that to either @David Robinson or @mgilson.