How can I unshorten a URL?

I want to be able to take a shortened or non-shortened URL and return its un-shortened form. How can I make a python program to do this?

Additional Clarification:

Case 1: shortened --> unshortened
Case 2: unshortened --> unshortened

e.g. bit.ly/silly in the input array should be google.com in the output array
e.g. google.com in the input array should be google.com in the output array

标签： python url url-shortener

7条回答

唯我独甜

2楼-- · 2019-01-08 21:28

http://github.com/stef/urlclean

sudo pip install urlclean
urlclean.unshorten(url)

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

3楼-- · 2019-01-08 21:39

Send an HTTP HEAD request to the URL and look at the response code. If the code is 30x, look at the Location header to get the unshortened URL. Otherwise, if the code is 20x, then the URL is not redirected; you probably also want to handle error codes (4xx and 5xx) in some fashion. For example:

# This is for Py2k.  For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

0人赞添加讨论(0) 举报

老娘就宠你

4楼-- · 2019-01-08 21:40

Unshorten.me has an api that lets you send a JSON or XML request and get the full URL returned.

0人赞添加讨论(0) 举报

家丑人穷心不美

5楼-- · 2019-01-08 21:43

Here a src code that takes into account almost of the useful corner cases:

set a custom Timeout.
set a custom User Agent.
check whether we have to use an http or https connection.
resolve recursively the input url and prevent ending within a loop.

The src code is on github @ https://github.com/amirkrifa/UnShortenUrl

comments are welcome ...

import logging
logging.basicConfig(level=logging.DEBUG)

TIMEOUT = 10
class UnShortenUrl:
    def process(self, url, previous_url=None):
        logging.info('Init url: %s'%url)
        import urlparse
        import httplib
        try:
            parsed = urlparse.urlparse(url)
            if parsed.scheme == 'https':
                h = httplib.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
            else:
                h = httplib.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
            resource = parsed.path
            if parsed.query != "": 
                resource += "?" + parsed.query
            try:
                h.request('HEAD', 
                          resource, 
                          headers={'User-Agent': 'curl/7.38.0'}
                                   }
                          )
                response = h.getresponse()
            except:
                import traceback
                traceback.print_exec()
                return url

            logging.info('Response status: %d'%response.status)
            if response.status/100 == 3 and response.getheader('Location'):
                red_url = response.getheader('Location')
                logging.info('Red, previous: %s, %s'%(red_url, previous_url))
                if red_url == previous_url:
                    return red_url
                return self.process(red_url, previous_url=url) 
            else:
                return url 
        except:
            import traceback
            traceback.print_exc()
            return None

0人赞添加讨论(0) 举报

How can I unshorten a URL?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间