可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm writing my own directory buster in python, and I'm testing it against a web server of mine in a safe and secure environment. This script basically tries to retrieve common directories from a given website and, looking at the HTTP status code of the response, it is able to determine if a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to be looked up, and then requests are made, in the following way:

for dir in fileinput.input('utils/Directories_Common.wordlist'):

    try:
        conn = httplib.HTTPConnection(url)
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print '    Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print '   ',r1.status, r1.reason
        conn.close()

Then, the response is parsed and if a status code equal to "200" is returned, then the page is accessible. I've implemented all this in the following way:

if(r1.status == 200):
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'

All seems fine to me except that the script marks as accessible pages that actually aren't. In fact, the algorithm collects the only pages that return a "200 OK", but when I manually surf to check those pages I found out they have been moved permanently or they have a restricted access. Something went wrong but I cannot spot where should I fix the code exactly, any help is appreciated..

回答1:

I did not found any problems with your code, except it is almost unreadable. I have rewritten it into this working snippet:

import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)

    url = 'http://{0}/{1}'.format(host, directory)
    print '    Trying: {0}'.format(url)

    response = conn.getresponse()
    print '    Got: ', response.status, response.reason

    conn.close()

    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)

Outputs:

$ python snippet.py
    Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
    Got:  404 Not Found
    Trying: http://www.google.com/reader
    Got:  302 Moved Temporarily
    Trying: http://www.google.com/news
    Got:  200 OK
[!] The subdirectory 'news' could be interesting.

Also, I did use HEAD HTTP request instead of GET, as it is more efficient if you do not need the contents and you are interested only in the status code.