The following code tries to generate random strings over K runs.
But we want each newly generated string to be totally different
from its reference string.
To do that, I tried to use "continue" to restart the random
string generation process. However, it doesn't seem to work.
What's wrong with my approach below?
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <time.h>
using namespace std;
// In this code we want to print a new string that is entirely different
// from those in initVec
template <typename T> void prn_vec(std::vector < T >&arg, string sep="")
{ // simple function for printing vector
for (int n = 0; n < arg.size(); n++) {
cout << arg[n] << sep;
}
}
int main ( int arg_count, char *arg_vec[] ) {
// This is reference string
vector <string> initVec;
initVec.push_back("A");
initVec.push_back("A");
initVec.push_back("A");
initVec.push_back("A");
vector <string> DNA;
DNA.push_back("A");
DNA.push_back("C");
DNA.push_back("G");
DNA.push_back("T");
for (unsigned i =0; i< 10000; i++) {
vector <string> newString;
for(unsigned j=0; j<initVec.size(); j++) {
int dnaNo = rand() % 4;
string newBase = DNA[dnaNo];
string oldBase = initVec[j];
int sameCount = 0;
if (newBase == oldBase) {
sameCount++;
}
if (sameCount == initVec.size()) {
continue;
}
newString.push_back(newBase);
}
cout << "Run " << i << " : ";
prn_vec<string>(newString);
cout << endl;
}
return 0;
}
Your code looks fine at first glance, unless I am missing a big part of your requirements. Read this before you use rand(). Except, of course, for the continue part. What you are trying to do is check whether the new string is the same as initVec or not, right? A simple comparison before you push it in or print it to the console would do.
int sameCount = 0;
if (newBase == oldBase) {
sameCount++;
}
// sameCount can be 1 at most, 0 otherwise
// so this check never returns true
if (sameCount == initVec.size()) {
continue;
}
The sameCount variable is re-initialized to 0 each time you create a new entry for newString, and it goes out of scope at the closing } of the for loop. So it never gets incremented past 1 and cannot function as a proper check against duplicate generation. Ideally, you should use a std::set and keep inserting into it. Duplicates are not allowed, and you are saved a lot of trouble.
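As a rough sketch of the std::set suggestion above: the helper names random_strand and unique_strands are hypothetical (they are not in the original code), and the sketch assumes that "different" means "not identical to the reference string".

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <string>

// Hypothetical helper: build one random string of the given length
// from the characters in `alphabet`, using rand().
std::string random_strand(const std::string& alphabet, std::size_t length) {
    assert(!alphabet.empty());
    std::string s;
    for (std::size_t j = 0; j < length; ++j) {
        // Scale rand() into [0, alphabet.size()) instead of using %.
        std::size_t idx = static_cast<std::size_t>(
            std::rand() / (static_cast<double>(RAND_MAX) + 1.0) * alphabet.size());
        s += alphabet[idx];
    }
    return s;
}

// Hypothetical helper: collect up to `runs` random strands, dropping the
// reference itself. std::set::insert silently ignores repeats.
std::set<std::string> unique_strands(const std::string& reference,
                                     const std::string& alphabet,
                                     unsigned runs) {
    std::set<std::string> seen;
    for (unsigned i = 0; i < runs; ++i) {
        std::string s = random_strand(alphabet, reference.size());
        if (s != reference) {
            seen.insert(s);  // no effect if s is already present
        }
    }
    return seen;
}
```

Calling unique_strands("AAAA", "ACGT", 10000) and iterating over the result would print each distinct strand once, in sorted order.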
More on using rand(), srand(), and random number generation:
From the comp.lang.c FAQ:
[...]the low-order bits of many random number generators are distressingly non-random
If you want to keep your random numbers in the range [0, 1, ... N - 1], a better method than the simple rand() % N (as advised in the link) is to use the following:
(int)((double)rand() / ((double)RAND_MAX + 1) * N)
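Wrapped in a small function, the expression above might look like the following sketch. The name random_index is hypothetical, and the assert is only there to make the assumption n > 0 explicit.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: map rand() onto 0 .. n-1 by scaling rather than
// taking the remainder, so the result depends on the high-order bits.
int random_index(int n) {
    assert(n > 0);
    // rand() / (RAND_MAX + 1.0) lies in [0, 1); multiplying by n and
    // truncating gives an integer in [0, n).
    return static_cast<int>(
        std::rand() / (static_cast<double>(RAND_MAX) + 1.0) * n);
}
```

With this in place, picking a base becomes DNA[random_index(4)].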
Now, if you run your program, you will get the same set of 10000-odd random DNA strands every time. It turns out this is because:
It's a characteristic of most pseudo-random number generators (and a defined property of the C library rand) that they always start with the same number and go through the same sequence.
from another FAQ of comp.lang.c.
To get different strands across runs try the following:
#include <iostream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;
int main ( int arg_count, char *arg_vec[] ) {
// most pseudo-random number generators
// always start with the same number and
// go through the same sequence.
// coax it to do something different!
srand((unsigned int)time((time_t *)NULL));
// This is reference string
string initVec("AAAA");
// the family
string DNA("ACGT");
for (unsigned i =0; i< 5; i++) {
string newString;
for(unsigned j=0; j<initVec.size(); j++) {
int dnaNo = (int)((double)rand() / ((double)RAND_MAX + 1) * 4);
char newBase = DNA[dnaNo];
newString += newBase;
}
// ideally push in a std::set
// for now keep displaying everything
if (newString != initVec) {
cout << "Run " << i << " : " << newString << endl;
}
}
return 0;
}
Your algorithm is bogus. Whatever you are trying to do, you aren't doing it, and because there's not a single comment in there, I can't really tell where you went wrong.
Your inner loop:
for each element of initVec (4)
create a random element
set sameCount to 0
if random element == current element of initVec, set sameCount to 1
if sameCount == 4, do something (pointless as this never happens)
add random element to newString
Adding to that, your "newString" isn't a string at all, but a vector of strings.
So, your problem isn't even the use of continue, it's that your algorithm is FUBAR.
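For comparison, here is one way the inner loop could be restructured. This is only a sketch under the assumption that "totally different" means "not identical to the reference". The function name strand_not_equal_to is hypothetical, and sameCount is now counted across the whole strand instead of being reset for every character.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: keep generating until the candidate is not
// identical to `reference`. Assumes a non-empty reference and an
// alphabet with at least two distinct characters; otherwise the retry
// loop might never finish.
std::string strand_not_equal_to(const std::string& reference,
                                const std::string& alphabet) {
    assert(!reference.empty());
    assert(alphabet.size() > 1);
    for (;;) {
        std::string candidate;
        std::size_t sameCount = 0;  // counted across the whole strand
        for (std::size_t j = 0; j < reference.size(); ++j) {
            std::size_t idx = static_cast<std::size_t>(
                std::rand() / (static_cast<double>(RAND_MAX) + 1.0) * alphabet.size());
            char base = alphabet[idx];
            if (base == reference[j]) {
                ++sameCount;
            }
            candidate += base;
        }
        if (sameCount != reference.size()) {
            return candidate;  // at least one position differs
        }
        // every position matched: go round again and regenerate
    }
}
```

If the requirement is instead that every position must differ from the reference, the acceptance test would change to sameCount == 0.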
continue does not skip the incrementing part of the for loop. All it does is jump directly to it, skipping the rest of the loop body.
for(int i = 0; i < 10; i++)
{
if(i == 3)
continue;
printf("%d ", i);
}
Is equivalent to:
int i = 0;
while(i < 10)
{
if(i == 3)
goto increment;
printf("%d ", i);
increment:
i++;
}
No backslash in the printf() since I couldn't figure out how to make the text editor let me type one. :)
Have you realized that sameCount never becomes more than 1? Since initVec.size() is greater than 1, execution never hits continue.
int sameCount = 0;
//sameCount is 0
if (newBase == oldBase) { // if it is true sameCount is 1
sameCount++;
}
// sameCount is 1 or 0
if (sameCount == initVec.size()) { //this expression is always false if initVec longer than 1
continue;
}
As others have already said, it is difficult to work out what your intention was with this code. Could you please tell us what you mean by "totally different", for example?
dirkgently's answer is pretty comprehensive and covers what I was trying to say.
I'd recommend that you don't use continue, though. Most coding standards recommend against it, for good reason: it makes flow control harder to follow.