I'm writing my own directory buster in Python, and I'm testing it against a web server of mine in a safe, controlled environment. The script tries to retrieve common directories from a given website and, by looking at the HTTP status code of each response, determines whether a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to be looked up, and then the requests are made in the following way (an example of the wordlist itself is shown right after the snippet):
import fileinput
import httplib

# url holds the target host and is set earlier in the script
for dir in fileinput.input('utils/Directories_Common.wordlist'):
    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/" + str(dir))
        toturl = 'http://' + url + '/' + str(dir)[:-1]
        print ' Trying to get: ' + toturl
        r1 = conn.getresponse()
        response = r1.read()
        print ' ', r1.status, r1.reason
        conn.close()
    except Exception:
        # entries that raise an error are simply skipped
        continue
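The wordlist itself is nothing special, just one candidate directory name per line; the entries below are only examples of the kind of names it contains:

admin
backup
cgi-bin
images
uploads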
Then the response is parsed: if a status code of 200 is returned, the page is considered accessible. I've implemented this in the following way:
if r1.status == 200:
    print '\n[!] Got it! The subdirectory ' + str(dir) + ' could be interesting..\n\n\n'
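Just to show what I expected for the problematic pages, this is roughly how I pictured the other status codes being reported (only a sketch, not part of my script yet; it reuses r1 and toturl from the loop above):

# sketch only: report the other outcomes instead of treating everything that is not 200 as uninteresting
if r1.status == 200:
    print '[!] Accessible: ' + toturl
elif r1.status in (301, 302):
    print '[-] Redirected: ' + toturl + ' -> ' + str(r1.getheader('location'))
elif r1.status in (401, 403):
    print '[-] Restricted: ' + toturl
else:
    print '[-] Other (' + str(r1.status) + '): ' + toturl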
All seems fine to me, except that the script marks as accessible some pages that actually aren't. The script only collects pages that return a 200 OK, but when I manually browse to those pages I find that they have actually been moved permanently or have restricted access. Something is going wrong, but I cannot spot exactly where I should fix the code; any help is appreciated.
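In case it helps, the sketch below shows the kind of one-off check I mean when I say I verify a page by hand (the host and path are just placeholders, it is not part of the buster itself):

import httplib

conn = httplib.HTTPConnection('example.com')    # placeholder host
conn.request("GET", "/admin")                   # placeholder path
resp = conn.getresponse()
print resp.status, resp.reason                  # e.g. 301 Moved Permanently
print resp.getheader('location')                # where the redirect points, if any
conn.close()

Checks like this are where I see the moved-permanently or restricted-access responses, even though the script reports 200 for the same directory.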