How to check whether a URL is a web page link or a file link

Posted 2020-03-05 01:50

Suppose I have the following links:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/news/

Of these, the first and fourth are web page links, and the second and third are file links.

The .zip and .exe links are only examples; there may be many other file types.

Is there a standard way to distinguish a file URL from a web page link? Thanks in advance.

2 answers
何必那么认真
#2 · 2020-03-05 02:30
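import urllib      # Python 2; in Python 3 urlopen lives in urllib.request
import mimetypes


def guess_type_of(link, strict=True):
    # Guess from the file extension first; this needs no network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # No recognizable extension: ask the server for its Content-Type.
        u = urllib.urlopen(link)
        link_type = u.headers.gettype() # or using: u.info().gettype()
    return link_type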

Demo:

links = ['http://stackoverflow.com/q/21515098/538284', # It's a html page
         'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png', # It's a png file
         'http://commons.wikimedia.org/wiki/File:Typing_example.ogv', # It's a html page
         'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'   # It's an ogv file
]

for link in links:
    print(guess_type_of(link))

Output:

text/html
image/x-png
text/html
application/ogg
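
On Python 3, where `urllib.urlopen` no longer exists, the same idea might be sketched roughly as follows. The HEAD request, the `timeout` parameter, and the `is_web_page` helper are assumptions added for illustration; they are not part of the answer above, and some servers may not answer HEAD requests.

```python
import mimetypes
import urllib.request


def guess_type_of(link, strict=True, timeout=10):
    """Return the MIME type of link (e.g. 'text/html'), or None if unknown."""
    # Guess from the file extension first; this needs no network access.
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        # Assumption: a HEAD request is enough, since only headers are needed.
        request = urllib.request.Request(link, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Bare media type, lowercased, without parameters such as charset.
            link_type = response.headers.get_content_type()
    return link_type


def is_web_page(link_type):
    """Hypothetical helper: treat HTML and XHTML as pages, anything else as a file."""
    return link_type in ("text/html", "application/xhtml+xml")
```

Calling `is_web_page(guess_type_of(link))` then gives a yes/no answer per link. Because the extension-based guess never contacts the server, it can disagree with what the server actually sends.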
啃猪蹄的小仙女
#3 · 2020-03-05 02:45
import urllib
mytest = urllib.urlopen('http://www.sec.gov')
mytest.headers.items()

[('content-length', '20833'), ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('server', 'SEC'), ('connection', 'close'), ('cache-control', 'max-age=0'), ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('content-type', 'text/html')]

mytest.headers.items() returns a list of (name, value) tuples. In my example, the last item in the list gives the content type.

The number and order of headers can vary between servers, so do not rely on position. Iterate through the list to find the tuple whose name is 'content-type', or look it up by name with mytest.headers.get('content-type').
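
As a rough Python 3 sketch of looking the header up by name instead of by position: the `content_type_of` helper is hypothetical, and the hand-built `Message` object stands in for `response.headers`, which for HTTP responses opened with `urllib.request.urlopen` is a subclass of `email.message.Message`.

```python
from email.message import Message


def content_type_of(headers):
    """Return the bare media type from a headers object, or None if absent."""
    # Header lookup is case-insensitive, so 'content-type' would work as well.
    if headers.get("Content-Type") is None:
        return None
    # get_content_type() lowercases the value and drops parameters like charset.
    return headers.get_content_type()


# Stand-in for response.headers so the sketch runs without network access.
sample = Message()
sample["Content-Type"] = "text/html; charset=UTF-8"
print(content_type_of(sample))  # prints: text/html
```

The explicit `None` check matters because `get_content_type()` falls back to `text/plain` when the header is missing, which would otherwise hide the absence of a Content-Type.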
