https://leetcode.com/problems/encode-and-decode-tinyurl/
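One approach is to assign each requested longURL an integer, counting up from 0. The integer serves both as the encoding of the longURL and as its index. To decode a short URL, parse the integer out of it and look up the original long URL.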
    #include <string>
    #include <vector>

    using std::string;
    using std::vector;

    class Solution {
    public:
        // Encodes a URL to a shortened URL.
        string encode(string longUrl) {
            // The index of the newly stored URL doubles as its id.
            long_urls.push_back(longUrl);
            return "http://t.com/" + std::to_string(long_urls.size() - 1);
        }

        // Decodes a shortened URL to its original URL.
        string decode(string shortUrl) {
            // Everything after the last '/' is the integer id.
            auto pos = shortUrl.find_last_of('/');
            auto id = std::stoi(shortUrl.substr(pos + 1));
            return long_urls[id];
        }

    private:
        vector<string> long_urls;  // id -> long URL
    };
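The advantage of the incrementing approach is that every encoded result is unique, but the drawbacks are just as obvious: encoding the same longURL twice gives a different result each time, which wastes both ids and storage. Switching to a hash table fixes the wasted space, but the incrementing approach still exposes the short-URL counter to users, which may be a security concern.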
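The hash-table variant mentioned above could look roughly like the following. This is only a sketch under my own assumptions: the class name `DedupCounterCodec` and the members `ids` and `prefix` are made up for illustration and are not part of the LeetCode template. It reuses the id of a URL that has been seen before, but the ids are still sequential, so the counter remains visible to users.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical variant of the counter scheme: a hash table remembers the id
// already assigned to each long URL, so repeated encodes reuse that id.
class DedupCounterCodec {
public:
    std::string encode(const std::string& longUrl) {
        // emplace only inserts when longUrl is not yet a key; either way it
        // returns an iterator to the entry holding the id for this URL.
        auto result = ids.emplace(longUrl, long_urls.size());
        if (result.second) {
            long_urls.push_back(longUrl);  // first time this URL is seen
        }
        return prefix + std::to_string(result.first->second);
    }

    std::string decode(const std::string& shortUrl) const {
        // Everything after the last '/' is the integer id.
        auto pos = shortUrl.find_last_of('/');
        std::size_t id = std::stoul(shortUrl.substr(pos + 1));
        return long_urls.at(id);  // throws std::out_of_range for unknown ids
    }

private:
    const std::string prefix = "http://t.com/";
    std::unordered_map<std::string, std::size_t> ids;  // long URL -> id
    std::vector<std::string> long_urls;                // id -> long URL
};
```

Encoding the same long URL twice with this class returns the same short URL, and `decode` maps it back to the original.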
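An improved approach is to build the short URL from a string of characters. Counting only digits and letters gives $10 + 2 \times 26 = 62$ symbols. Variable-length encoding is certainly possible, but the encoding rules can get complicated, and fixed-length encoding is enough. As for the length, Sina Weibo reportedly uses 7 characters, and $62^7 \approx 3.5 \times 10^{12}$, which is far more keys than a service like this should ever need. So a workable approach is: for each new long URL, pick 7 characters at random from the 62 to build its key, and store it in a hash table (if the key is already in use, keep generating new ones until it is unique, though the probability of a repeat is very low). To decode a short URL, look up its key in the hash table.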
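To put a rough number on "very low", here is a back-of-the-envelope estimate. It assumes each key is drawn uniformly and independently from all $62^7$ possibilities, and the symbols $n$ (number of keys already stored) and $p$ (chance that one fresh draw collides) are my own notation:

$$
p = \frac{n}{62^7}, \qquad \mathbb{E}[\text{draws until an unused key}] = \frac{1}{1 - p}.
$$

For example, with $n = 10^9$ stored URLs, $p \approx 10^9 / (3.5 \times 10^{12}) \approx 2.8 \times 10^{-4}$, so the retry loop almost always finishes on the first draw.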
In addition, to avoid wasting keys, a second hash table can record the short URL already assigned to each long URL.
    #include <random>
    #include <string>
    #include <unordered_map>

    using std::string;
    using std::unordered_map;

    class Solution {
    public:
        // Encodes a URL to a shortened URL.
        string encode(string longUrl) {
            // Reuse the existing key if this URL has been encoded before.
            auto it = long2short.find(longUrl);
            if (it != long2short.end()) {
                return "http://t.com/" + it->second;
            }
            // Draw random keys until one is unused.
            string tiny;
            do {
                tiny = randomKey();
            } while (short2long.count(tiny));
            long2short[longUrl] = tiny;
            short2long[tiny] = longUrl;
            return "http://t.com/" + tiny;
        }

        // Decodes a shortened URL to its original URL.
        // An unknown short URL is returned unchanged.
        string decode(string shortUrl) {
            auto pos = shortUrl.find_last_of('/');
            auto it = short2long.find(shortUrl.substr(pos + 1));
            return it != short2long.end() ? it->second : shortUrl;
        }

    private:
        // Picks len_tiny characters from dict, each independently and uniformly.
        string randomKey() {
            std::uniform_int_distribution<int> pick(0, static_cast<int>(dict.size()) - 1);
            string key;
            for (int i = 0; i < len_tiny; ++i) {
                key += dict[pick(rng)];
            }
            return key;
        }

        unordered_map<string, string> short2long, long2short;
        const string dict = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        const int len_tiny = 7;
        std::mt19937 rng{std::random_device{}()};
    };