[LeetCode] Encode and Decode Tiny URL | Short URL Algorithm

Date: 2019-11-29

https://leetcode.com/problems/encode-and-decode-tinyurlhtml

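One approach is to assign each requested longURL an integer, starting from 0 and increasing by one per request. That integer is the encoding of the longURL and also serves as its index. To decode a short URL, parse the integer out of it and look up the original long URL.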

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Encodes a URL to a shortened URL.
    string encode(string longUrl) {
        // The index of the new entry doubles as its short code.
        long_urls.push_back(longUrl);
        return "http://t.com/" + std::to_string(long_urls.size() - 1);
    }

    // Decodes a shortened URL to its original URL.
    string decode(string shortUrl) {
        // Everything after the last '/' is the integer id.
        auto pos = shortUrl.find_last_of('/');
        auto id = std::stoul(shortUrl.substr(pos + 1));
        return long_urls[id];
    }

private:
    vector<string> long_urls;  // id -> long URL
};

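The advantage of the incrementing approach is that every code is unique. The drawbacks are just as clear: encoding the same longURL twice yields a different result each time, which wastes both ids and storage. Switching to a hash table fixes the wasted space, but the incrementing approach still exposes the short-URL counter to users, which may be a security risk.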
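The hash-table fix mentioned above could look roughly like the following minimal sketch. The class name `CounterCodec` and the member `ids` are hypothetical names introduced here for illustration. It keeps the incrementing id but remembers which id each long URL already has, so repeated calls return the same short URL. It assumes the short URL is well-formed (`std::stoull` throws otherwise), and the ids remain sequential and therefore guessable.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: incrementing ids plus a lookup table for deduplication.
class CounterCodec {
public:
    // Returns the same short URL for a long URL that was already encoded.
    std::string encode(const std::string& longUrl) {
        auto it = ids.find(longUrl);
        if (it == ids.end()) {
            // First time we see this URL: its id is the next free index.
            it = ids.emplace(longUrl, long_urls.size()).first;
            long_urls.push_back(longUrl);
        }
        return "http://t.com/" + std::to_string(it->second);
    }

    // Parses the id after the last '/' and looks up the stored long URL.
    // Unknown ids fall back to returning the input unchanged.
    std::string decode(const std::string& shortUrl) const {
        auto pos = shortUrl.find_last_of('/');
        std::size_t id = std::stoull(shortUrl.substr(pos + 1));
        return id < long_urls.size() ? long_urls[id] : shortUrl;
    }

private:
    std::vector<std::string> long_urls;                // id -> long URL
    std::unordered_map<std::string, std::size_t> ids;  // long URL -> id
};
```

Encoding the same long URL twice with this sketch returns the same short URL and consumes only one id.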

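An improvement is to build the short URL from a string of characters. Using only digits and letters gives 10 + 2*26 = 62 possible characters. Variable-length codes are certainly feasible, but the encoding rules can get complicated, and a fixed-length code is enough. As for the length, Sina Weibo reportedly uses 7 characters, and $62^7 \approx 3.5 \times 10^{12}$, which is already far more than the total number of URLs on the Internet today. So one workable approach is: for each new long URL, pick 7 characters at random from the 62 to form its key and store it in a hash table (if the key is already in use, keep generating new ones until an unused one is found; the probability of a collision is very low). To decode a short URL, look up its key in the hash table.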
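As a quick check of the arithmetic, here is a small sketch that computes the size of the key space and the chance that one random draw hits a key already in use. The function names `keyspace` and `collisionChance` are hypothetical helpers introduced for this example, and the collision estimate assumes keys are drawn uniformly and independently.

```cpp
#include <cstdint>

// Number of distinct fixed-length keys over an alphabet of the given size.
constexpr std::uint64_t keyspace(std::uint64_t alphabet, int length) {
    return length == 0 ? 1 : alphabet * keyspace(alphabet, length - 1);
}

// 62 characters, 7 positions: 62^7 = 3,521,614,606,208, about 3.5e12.
static_assert(keyspace(62, 7) == 3521614606208ULL, "62^7 mismatch");

// Probability that a single uniformly random 7-character key collides
// with one of `used` keys that are already taken.
inline double collisionChance(std::uint64_t used) {
    return static_cast<double>(used) / static_cast<double>(keyspace(62, 7));
}
```

Under these assumptions, even with a billion keys already stored, a fresh random key collides fewer than 3 times in 10,000 draws, so the retry loop rarely runs more than once.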

In addition, to avoid wasting keys, a second hash table can record the short URL already assigned to each long URL.

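#include <random>
#include <string>
#include <unordered_map>
using namespace std;

class Solution {
public:
    // Encodes a URL to a shortened URL.
    string encode(string longUrl) {
        // Reuse the key if this long URL has been encoded before.
        auto it = long2short.find(longUrl);
        if (it != long2short.end()) {
            return "http://t.com/" + it->second;
        }
        // Draw random keys until one is not yet in use.
        string tiny = randomKey();
        while (short2long.count(tiny)) {
            tiny = randomKey();
        }
        long2short[longUrl] = tiny;
        short2long[tiny] = longUrl;
        return "http://t.com/" + tiny;
    }

    // Decodes a shortened URL to its original URL.
    string decode(string shortUrl) {
        auto pos = shortUrl.find_last_of('/');
        auto tiny = shortUrl.substr(pos + 1);
        // Unknown keys fall back to returning the input unchanged.
        auto it = short2long.find(tiny);
        return it != short2long.end() ? it->second : shortUrl;
    }

private:
    // Picks len_tiny characters independently and uniformly from dict.
    // (std::random_shuffle was removed in C++17 and never repeats a character.)
    string randomKey() {
        std::uniform_int_distribution<size_t> pick(0, dict.size() - 1);
        string key;
        for (int i = 0; i < len_tiny; ++i) {
            key += dict[pick(rng)];
        }
        return key;
    }

    unordered_map<string, string> short2long, long2short;
    string dict = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int len_tiny = 7;
    std::mt19937 rng{std::random_device{}()};
};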

References:

http://www.mamicode.com/info-detail-1724865.html
How to design a short URL service (TinyURL), https://soulmachine.gitbooks.io/system-design/content/cn/distributed-id-generator.html
How to design a short URL system (TinyURL), http://cn.soulmachine.me/2017-04-10-how-to-design-tinyurl/