I would like to use RDKit to generate count-based Morgan fingerprints and feed them to a scikit-learn model (in Python). However, I don't know how to get the fingerprint as a NumPy array. When I use
from rdkit import Chem
from rdkit.Chem import AllChem
m = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprint(m, 2, useCounts=True)
I get a UIntSparseIntVect that I would need to convert. The only thing I found was cDataStructs (see: http://rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html), but this does not currently support UIntSparseIntVect.
Maybe a little late to answer, but these methods work for me.
If you want the bits (0 and 1):
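The code for this step is not preserved here, so the following is only a sketch of one way it can be done, using `DataStructs.ConvertToNumpyArray` on a bit-vector Morgan fingerprint. The radius of 2 matches the question; `nBits=1024` and the variable names are my assumptions.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')

# Bit-vector Morgan fingerprint (presence/absence only).
# nBits=1024 is an assumed length, pick whatever your model needs.
bit_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)

# ConvertToNumpyArray resizes the target array to the fingerprint length
# and fills it in place.
bit_array = np.zeros((0,), dtype=np.int8)
DataStructs.ConvertToNumpyArray(bit_fp, bit_array)

print(bit_array.shape, int(bit_array.sum()))
```

For scikit-learn, one such row per molecule can be stacked with `np.vstack` to form the feature matrix.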
And back to a fingerprint:
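A sketch of the reverse direction, assuming the array holds only 0 and 1: join the values into a bit string and hand it to `DataStructs.CreateFromBitString`. The setup lines repeat the assumed 1024-bit fingerprint so that the snippet runs on its own.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')
bit_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)  # assumed length
bit_array = np.zeros((0,), dtype=np.int8)
DataStructs.ConvertToNumpyArray(bit_fp, bit_array)

# Turn the 0/1 array into a string such as "0100...", then rebuild the bit vector.
bitstring = "".join(bit_array.astype(str))
restored_fp = DataStructs.CreateFromBitString(bitstring)

# The restored fingerprint should have the same bits set as the original.
print(list(restored_fp.GetOnBits()) == list(bit_fp.GetOnBits()))
```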
If you want the counts:
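For counts, a sketch under the assumption that a hashed (fixed-length) count fingerprint is acceptable: `GetHashedMorganFingerprint` returns a `UIntSparseIntVect` of the requested length, which `ConvertToNumpyArray` can fill into an array. Again, `nBits=1024` and the `int32` dtype are my choices, not something from the question.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')

# Hashed count fingerprint: each position holds how often a feature occurs.
count_fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=1024)  # assumed length

# Use an integer dtype wide enough for counts greater than 1.
count_array = np.zeros((0,), dtype=np.int32)
DataStructs.ConvertToNumpyArray(count_fp, count_array)

# Indices of the positions with a nonzero count.
print(count_array.nonzero())
```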
Output:
And back to a fingerprint (Not sure this is the best way to do this):
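One possible way to do this, as a sketch: create an empty `UIntSparseIntVect` of the same length and copy the nonzero counts into it one by one. The helper name `numpy_to_count_fp` is made up for illustration, and the setup lines repeat the assumed 1024-length hashed fingerprint.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def numpy_to_count_fp(array):
    """Hypothetical helper: rebuild a sparse count fingerprint from a 1-D count array."""
    fp = DataStructs.UIntSparseIntVect(len(array))
    for idx in np.flatnonzero(array):
        # Cast to plain Python ints before passing the values to RDKit.
        fp[int(idx)] = int(array[idx])
    return fp

mol = Chem.MolFromSmiles('c1cccnc1C')
count_fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=1024)  # assumed length
count_array = np.zeros((0,), dtype=np.int32)
DataStructs.ConvertToNumpyArray(count_fp, count_array)

restored_fp = numpy_to_count_fp(count_array)
print(restored_fp.GetNonzeroElements() == count_fp.GetNonzeroElements())
```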
It seems there is no direct way to get a NumPy array, so I build it from the dictionary.
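A minimal sketch of what building from the dictionary could look like, assuming the dictionary is the one returned by `GetNonzeroElements()` on the unhashed fingerprint from the question. The feature IDs are large integers, so they have to be mapped onto a fixed number of columns; folding them with a modulo into `n_features = 1024` columns is my assumption, and colliding features simply add up.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1cccnc1C')
fp = AllChem.GetMorganFingerprint(mol, 2, useCounts=True)

# Dictionary of {feature_id: count} for the features present in the molecule.
counts = fp.GetNonzeroElements()

n_features = 1024  # assumed vector length, not from the original post
x = np.zeros(n_features, dtype=np.int32)
for feature_id, count in counts.items():
    # Fold the large feature ID into the fixed-length vector (assumption).
    x[feature_id % n_features] += count

print(x.shape, int(x.sum()))
```

Whatever folding scheme is used, it needs to be the same for every molecule so that the columns mean the same thing across rows.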