I am working on a subgraph matching problem (match chemical functional groups within molecules). The original code was written by another student (under Visual C++, no MS specific libraries) and it ran fine on Windows. I then added new functions to the program but did not alter the algorithm for the subgraph matching, and the new program compiled fine under gcc4.2/Mac OS X. However I am running into strange problems during runtime!
The objects & their members that are relevant here:
Atom: contains ID, Element, list of Bonds (vector of pointers to Bond objects), search_mark (bool). Functions to get the variables and to set search_mark as true or false.
Bond: contains an array of 2 pointers to atoms A and B , and a function that returns atom* A when called with parameter atom* B, vice versa.
Molecule: contains vector of pointers to atoms, and function to get an atom* using either the atom ID or the position within the vector.
A subclass of Atom: HammettAtom. The extra member it contains is an atom pointer to a related molecule atom.
This is the algorithm of the recursion function: For every atom in an array A, compare to an atom in array B(the Hammett group, typically about 10-20 atoms in size). If the element is the same, obtain the list of connected atoms of each, and repeat. The tested atoms are marked along the way, so at one point there will be no more unmarked connected atoms.
Here is the code (unaltered, only the cout bits are added by me for testing). When the function is first called, the first vector is a single atom from a test molecule, the second vector is the 2nd atom of a Hammett group molecule. (The 1st atom in Hammett has ID 'X' and can be anything.)
bool HammettCheck::checkSubproblem(vector<Atom*> bonded_atoms, vector<Atom*> my_list)
{
unsigned int truth=0;
vector<Atom*> unmarked_bonded;
vector<Atom*> unmarked_list;
cout << "\n size of Hammett array: " <<my_list.size()<< " size of mol array: "<< bonded_atoms.size() << endl; //for testing
//If number of connected atoms is different, return false.
if( bonded_atoms.size() != my_list.size() ){
return false;
}
//Create new lists.
for(unsigned int i=0; i < bonded_atoms.size() ; i++){
//Create list of unmarked connected atoms in molecule.
if( !bonded_atoms[i]->isMarked() ){
unmarked_bonded.push_back(bonded_atoms[i]);
}
//Create list of unmarked connected atoms in hammett.
if( !my_list[i]->isMarked() ){
unmarked_list.push_back( my_list[i] );
}
}
cout << "size of unmarked Hammett array: " << unmarked_list.size() << " size of unmarked mol array: "<< unmarked_bonded.size() <<endl; //for testing
//If number of unmarked connected atoms is different, return false.
if( unmarked_bonded.size() != unmarked_list.size() ){
return false;
}
//Check each unmarked atom connected in the molecule against possible atoms it could be in the hammett group.
for(unsigned int i=0; i < unmarked_bonded.size(); i++){
cout<< "atom in um_mol array considered ID: " << unmarked_bonded[i]->getID() << " Ele: " << unmarked_bonded[i]->getEle()<< endl;
/*Unmarked hammett assigned in reverse order so that the undefined "X" atom is only
assigned if a connected atom can not possibly be any other atom.*/
for(int j=(unmarked_list.size()-1); j > -1; j--){
cout << "atom in um_h_array considered ID: " << unmarked_list[j]->getID() << endl;
//If hammett atom has already been assigned to a connected atom, it cannot be assigned to another
if(!unmarked_list[j]->isMarked()){
cout << unmarked_list[j]->getID() << "is unmarked" <<endl;
/*If connected atom could only be hammett group's connection
to the rest of the molecule, assign it as such.*/
if( !strcmp(unmarked_list[j]->getEle().c_str(), "X") ){
unmarked_bonded[i]->mark();
unmarked_list[j]->mark(unmarked_bonded[i]);
truth++;
cout<< "mol atom ID "<< unmarked_bonded[i]->getID() <<" marked as X, current truth: "<< truth << endl;
cout << unmarked_list[j]->getID() << "is now marked(1)/unmarked(0) " << unmarked_list[j]->isMarked() << " and break loop "<<endl;
break;
}
/*If connected atom is the same element as a possible hammett atom,
check that atoms connections by running them through the subproblem.*/
if( !strcmp(unmarked_bonded[i]->getEle().c_str(), unmarked_list[j]->getEle().c_str()) ){
unmarked_bonded[i]->mark();
unmarked_list[j]->mark(unmarked_bonded[i]);
cout<<"found same ele between mol_id "<< unmarked_bonded[i]->getID() <<" and ham_id " << unmarked_list[j]->getID() <<endl;
vector<Atom*> new_bonded = getAttachedAtoms( unmarked_bonded[i] );
vector<Atom*> new_list = getAttachedAtoms( unmarked_list[j] );
if( checkSubproblem( new_bonded, new_list ) ){
cout<<"found same atom"<<endl;
truth++;
break;
/*If only the elements are the same do not assign
the hammett atom to this connected atom.*/
}else{
unmarked_bonded[i]->demark();
unmarked_list[j]->demark();
}
}
}
}
}
//Return true if all connected atoms can be assigned atoms of the hammett group.
if( truth == unmarked_bonded.size() ){
return true;
}else{
return false;
}
}
I run the compiled program with a test molecule of 29 atoms, and compared it to two Hammett groups. It should contain group 2 but not group 1. However it has returned true WHENEVER I started with 2 atoms that had the same element. Below is a sample of the output (the molecule does not actually contain the Hammett group at that atom)
currently at molecule atom ID 1
size of Hammett array: 1 size of mol array: 1
size of unmarked H array: 1 size of unmarked mol array: 1
atom in um_mol array considered ID: 1 Ele: N
atom in um_h_array considered ID: N1
N1is unmarked
found same ele between mol_id 1 and ham_id N1
size of Hammett array: 3 size of mol array: 3
size of unmarked H array: 3 size of unmarked mol array: 3
atom in um_mol array considered ID: 2 Ele: H
atom in um_h_array considered ID: O2
O2is unmarked
atom in um_h_array considered ID: O1
O1is unmarked
atom in um_h_array considered ID: X
X is unmarked
mol atom ID 2 marked as X, current truth: 1
X is now marked(1)/unmarked(0) 128 and break loop
atom in um_mol array considered ID: 8 Ele: C
atom in um_h_array considered ID: O2
O2is unmarked
atom in um_h_array considered ID: O1
O1is unmarked
atom in um_h_array considered ID: X
X is unmarked
mol atom ID 8 marked as X, current truth: 2
X is now marked(1)/unmarked(0) 160 and break loop
atom in um_mol array considered ID: 17 Ele: C
atom in um_h_array considered ID: O2
O2is unmarked
atom in um_h_array considered ID: O1
O1is unmarked
atom in um_h_array considered ID: X
X is unmarked
mol atom ID 17 marked as X, current truth: 3
X is now marked(1)/unmarked(0) 128 and break loop
found same atom
Hammet group 2 checkSubproblem true
Hammett added to atom 1
Sorry if that was long. But the thing was that right after I have marked the 'X' atom (1st atom in Hammett molecule), and tried to get the search_mark boolean, it had a value greater than 1. Therefore X was incorrectly 'marked' several times and the 'truth' counter went up until the condition truth == unmarked_bonded.size() was reached.
I am not sure what the actual problem is? The value 128 suggest some mixed up memory/pointer issue, but am unsure how to find that out. I am not even sure if it has anything to do with the recursion function!
I would be very grateful if anyone could suggest something that I can try. Thanks in advance!
P.S. the code for the Atom class functions.
string Atom::getID()
{
return id;
}
string Atom::getEle()
{
return ele;
}
void Atom::mark()
{
search_mark = true;
}
void Atom::demark()
{
search_mark = false;
}
void HammettAtom::mark(Atom* assigned)
{
search_mark = true;
related_mol_atom = assigned;
}
bool Atom::isMarked()
{
return search_mark;
}