Python HTTP status code

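I am writing my own directory buster in Python, and I am testing it against my own web server in a safe environment. The script tries to retrieve common directories from a given website and, by looking at the HTTP status code of each response, decides whether a page is accessible or not.
As a start, the script reads a file containing all the interesting directories to look for, and then makes the requests in the following way: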

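import fileinput
import httplib

# url (the target host name) is assumed to be defined earlier in the script.
for dir in fileinput.input('utils/Directories_Common.wordlist'):

    try:
        conn = httplib.HTTPConnection(url)
        # Note: dir still ends with the newline read from the wordlist here.
        conn.request("GET", "/"+str(dir))
        toturl = 'http://'+url+'/'+str(dir)[:-1]
        print '    Trying to get: '+toturl
        r1 = conn.getresponse()
        response = r1.read()
        print '   ',r1.status, r1.reason
        conn.close()
    except Exception as e:
        # The except clause is not shown in the original snippet; this is a placeholder.
        print '    Request failed:', e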
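
Lines read with fileinput keep their trailing newline, which is why the loop above trims the displayed URL with [:-1]. The following is a minimal sketch, written for Python 3, of cleaning wordlist entries once before they are used anywhere. The helper name clean_entries and the treatment of lines starting with '#' as comments are assumptions made for illustration, not part of the original script.

```python
def clean_entries(lines):
    """Yield wordlist entries without surrounding whitespace or newlines."""
    for line in lines:
        entry = line.strip()
        # Skip blank lines and comment lines (the '#' convention is an assumption).
        if not entry or entry.startswith('#'):
            continue
        # Drop any leading slash so the caller can always prepend exactly one.
        yield entry.lstrip('/')


if __name__ == '__main__':
    # In-memory sample standing in for the wordlist file.
    sample = ['admin\n', '\n', '/backup\r\n', '# comment\n', 'images']
    for entry in clean_entries(sample):
        print('/' + entry)
```

With entries cleaned this way, the same string can be used both for the request path and for the URL that gets printed.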

The response is then parsed, and if the returned status code equals 200, the page is considered accessible. I implemented this as follows:

if(r1.status == 200):
    print '\n[!] Got it! The subdirectory '+str(dir)+' could be interesting..\n\n\n'

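Everything seems to work, except that the script flags pages that are not actually accessible. The algorithm collects only the pages that return "200 OK", but when I browse to those pages manually to check them, I find that they have been permanently moved or that they have access restrictions. Something is going wrong, but I cannot see exactly where I should fix the code. Any help is appreciated.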

Answer 1:

I did not find any problem with your code, except that it is almost unreadable. I have rewritten it into this working snippet:

import httplib

host = 'www.google.com'
directories = ['aosicdjqwe0cd9qwe0d9q2we', 'reader', 'news']

for directory in directories:
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/' + directory)

    url = 'http://{0}/{1}'.format(host, directory)
    print '    Trying: {0}'.format(url)

    response = conn.getresponse()
    print '    Got: ', response.status, response.reason

    conn.close()

    if response.status == 200:
        print ("[!] The subdirectory '{0}' "
               "could be interesting.").format(directory)

Output:

$ python snippet.py
    Trying: http://www.google.com/aosicdjqwe0cd9qwe0d9q2we
    Got:  404 Not Found
    Trying: http://www.google.com/reader
    Got:  302 Moved Temporarily
    Trying: http://www.google.com/news
    Got:  200 OK
[!] The subdirectory 'news' could be interesting.

Also, I used a HEAD HTTP request instead of GET, since it is more efficient if you do not need the content and are only interested in the status code.
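
For readers on Python 3, where httplib is named http.client, here is a rough sketch of the same HEAD-based check that also separates redirects and access restrictions from a plain 200. The function names classify_status and probe and the label strings are hypothetical, chosen only for this example. Note that http.client does not follow redirects on its own, so the status returned is the one for the exact path requested.

```python
import http.client


def classify_status(status):
    """Map an HTTP status code to a coarse label for directory probing."""
    if 200 <= status < 300:
        return 'accessible'
    if status in (301, 302, 303, 307, 308):
        return 'redirected'
    if status in (401, 403):
        return 'restricted'
    if status == 404:
        return 'missing'
    return 'other'


def probe(host, path, timeout=5):
    """Send one HEAD request; return (status, reason, Location header or None)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request('HEAD', '/' + path.lstrip('/'))
        response = conn.getresponse()
        return response.status, response.reason, response.getheader('Location')
    finally:
        # Close the connection even if the request raised an exception.
        conn.close()


if __name__ == '__main__':
    # Offline demonstration of the classification only; no request is sent.
    for code in (200, 301, 403, 404, 500):
        print(code, classify_status(code))
```

A call such as probe(host, 'news') against a server you are allowed to test returns the status, the reason phrase, and any Location header, which can then be passed through classify_status before deciding whether to report the directory.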