How do I create a URL shortener?

2018-12-31 08:34发布

I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".

Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.

My approach:

I have a database table with three columns:

  1. id, integer, auto-increment
  2. long, string, the long URL the user entered
  3. short, string, the shortened URL (or just the six characters)

I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.

My idea:

For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:

short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.

That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?

Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)

标签: algorithm url
27条回答
君临天下
2楼-- · 2018-12-31 08:36

If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/

查看更多
伤终究还是伤i
3楼-- · 2018-12-31 08:36

My Python 3 version

base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
base = len(base_list)

def encode(num: int):
    result = []
    if num == 0:
        result.append(base_list[0])

    while num > 0:
        result.append(base_list[num % base])
        num //= base

    print("".join(reversed(result)))

def decode(code: str):
    num = 0
    code_list = list(code)
    for index, code in enumerate(reversed(code_list)):
        num += base_list.index(code) * base ** index
    print(num)

if __name__ == '__main__':
    encode(341413134141)
    decode("60FoItT")
查看更多
弹指情弦暗扣
4楼-- · 2018-12-31 08:37

My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.

查看更多
千与千寻千般痛.
5楼-- · 2018-12-31 08:37

I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.

static hashids = Hashids(salt = "my app rocks", minSize = 6)

I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.

Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.

Here is the full concept:

function addDomain(domain) {
    table("domains").insert("domain", domain, "seq", 0)
}

function addURL(domain, longURL) {
    seq = table("domains").where("domain = ?", domain).increment("seq")
    shortURL = domain + "/" + hashids.encode(seq)
    table("links").insert("short", shortURL, "long", longURL)
    return shortURL
}

// GET /:hashcode
function handleRequest(req, res) {
    shortURL = req.host + "/" + req.param("hashcode")
    longURL = table("links").where("short = ?", shortURL).get("long")
    res.redirect(301, longURL)
}
查看更多
流年柔荑漫光年
6楼-- · 2018-12-31 08:39
/**
 * <p>
 *     Integer to character and vice-versa
 * </p>
 *  
 */
public class TinyUrl {

    private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private final int charBase = characterMap.length();

    public String covertToCharacter(int num){
        StringBuilder sb = new StringBuilder();

        while (num > 0){
            sb.append(characterMap.charAt(num % charBase));
            num /= charBase;
        }

        return sb.reverse().toString();
    }

    public int covertToInteger(String str){
        int num = 0;
        for(int i = 0 ; i< str.length(); i++)
            num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase , (str.length() - (i + 1)));

        return num;
    }
}

class TinyUrlTest{

    public static void main(String[] args) {
        TinyUrl tinyUrl = new TinyUrl();
        int num = 122312215;
        String url = tinyUrl.covertToCharacter(num);
        System.out.println("Tiny url:  " + url);
        System.out.println("Id: " + tinyUrl.covertToInteger(url));
    }
}
查看更多
皆成旧梦
7楼-- · 2018-12-31 08:40

You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:

https://gist.github.com/778542

查看更多
登录 后发表回答