I have simple recursive program to find connected subgraphs. The code works by traversing, from each unvisited vertex in the graph, all the edges (recursively) and marking those which have been visited with 'ingraph=False' in the process. (the graphs are always undirected unweighted graphs).
The problem I have is that for large graphs (with a subgraph of ~100,000 vertices) python(-2.7) returns a segmentation fault. But this worked just fine in python-2.6 (and still does).
Can someone explain to me what has changed between two (or maybe it's something else entirely)? Is there a way to fix it with python-2.7 (which preferably also doesn't break upon migrating to python-3 at some point)? Or should I rewrite it in a non-recursive way?
Thanks!
here's the source update: see update 3 below for a nonrecursive solution
def floodfill(g):
for v in g.vertices: v.ingraph = True
sys.setrecursionlimit(g.size)
subgraph = 1
for v in g.vertices:
if v.ingraph:
v.ingraph = False
v.subgraph = subgraph
g.floodfill_search(v,subgraph)
subgraph += 1
def floodfill_search(g,v,subgraph):
for n in v.neighbors:
if n.ingraph:
n.ingraph = False
n.subgraph = subgraph
g.floodfill_search(n,subgraph)
------ UPDATE --------
I made a small recursion test which gives recursions limit of ~16,000, ~24,000 and ~28,000 for 3 different computers. Moreover the result is not even constant for one pc. Running the test twice gives slightly different limits. For example for the second I find results between 23800 and 23819.
#!/usr/bin/python
import sys
def callme(i):
print(i)
i+=1
callme(i)
sys.setrecursionlimit(int(1e6))
callme(0)
I don't really get which 'C stack' is referred to, as far I can tell there is no default 'stack' implemented in C. In C++ there are stacks but it doesn't have the same limitations. The following C++ example runs fine (at least up to 1M pushes).
#include <iostream>
#include <stack>
using namespace std;
int main () {
stack<int> s;
for(int i=0;i<1000000;i++) {
cout << "push " << i << endl;
s.push(i);
}
}
The following C code also goes much deeper (about 10x, ~262,000)
#include "stdio.h"
void callme(i) {
printf("calling %d\n",i);
callme(++i);
}
int main() {
int i=0;
callme(i);
}
---- UPDATE 2 -----
Ok, this is python's intention apparently. Forcing programmers to avoid deep recursion. http://neopythonic.blogspot.ch/2009/04/tail-recursion-elimination.html
In any case I now think it is better to rewrite it iteratively. But then I will probably start over in C++, using some graph theory library like the boost graph library. If I am going to have to rewrite it anyway, I might as well do it properly.
Nevertheless, I would still appreciate any comments to understand why this happens at these specific sizes.
---- UPDATE 3 -----
Here's at least a quick and dirty python rewrite. Dirty because it's O(N^2) because of the last line. There should be a nicer O(N) solution by keeping track a list of which vertices have not been visited, but didn't see it so quickly and this works for my applications.
def floodfill_nonrecursive(g):
for v in g.vertices: v.ingraph = True
start = g.vertices
subg = 1
while start:
q = [start[0]]
while q:
v = q.pop()
v.ingraph = False
v.subgraph = subg
for n in v.neighbors:
if n.ingraph:
n.ingraph = False
q.append(n)
subg += 1
start = [v for v in g.vertices if v.ingraph]