How to reduce a bigger string to a smaller string in C++

Posted 2019-02-20 05:27

Question:

I want to compress a bigger string into a smaller string in C++. What are the different ways to do this in C++? The requirement is that output should also be a string.

Answer 1:

Well, if you don't need to uncompress it later:

string s = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
s = "";

Edit: Sounds like you want a hash function - there are a zillion out there, depending on your requirements. Google is your friend.
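
If a one-way result is acceptable, here is a minimal sketch using the standard library's std::hash. The helper name short_digest is made up for illustration, and the exact value std::hash returns depends on the implementation, so don't rely on it being the same across compilers or platforms.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a string of any length to a short decimal string.
// This is one-way; the original text cannot be recovered from the result.
std::string short_digest(const std::string& s)
{
    std::size_t h = std::hash<std::string>{}(s);  // implementation-defined value
    return std::to_string(h);                     // the output is again a string
}
```

Calling short_digest(s) on the 35-character string above gives back a string of at most 20 digits where std::size_t is 64 bits wide. Two different inputs can still map to the same output.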



Answer 2:

As unaperson said, Google is your friend: Data Compression Algorithms.

Here are a few off the top of my head:
RLE -- run-length encoding
Huffman
Lempel-Ziv
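
To give an idea of the simplest of these, here is a rough sketch of run-length encoding where both input and output are strings. The function names rle_encode and rle_decode are just for illustration, and the sketch assumes the input contains no digit characters (otherwise decoding would be ambiguous).

```cpp
#include <cstddef>
#include <string>

// Run-length encode: each run of a repeated character becomes "<count><char>".
// Assumption: the input contains no digits ('0'..'9').
std::string rle_encode(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        std::size_t j = i;
        while (j < in.size() && in[j] == in[i]) ++j;  // find the end of the run
        out += std::to_string(j - i);                 // run length
        out += in[i];                                 // the repeated character
        i = j;
    }
    return out;
}

// Inverse of rle_encode under the same no-digits assumption.
std::string rle_decode(const std::string& in)
{
    std::string out;
    std::size_t count = 0;
    for (char c : in) {
        if (c >= '0' && c <= '9') {
            count = count * 10 + static_cast<std::size_t>(c - '0');
        } else {
            out.append(count, c);  // emit the character 'count' times
            count = 0;
        }
    }
    return out;
}
```

With this, rle_encode("aaaabbbcc") gives "4a3b2c". Note that RLE only helps when the input has long runs: "abc" becomes "1a1b1c", which is longer than the original.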



Answer 3:

As I understand from the comments on the question, you don't need to decompress the result; you only want different strings to give different results. The hashing algorithm I'm going to explain is very easy to understand and has worked well for me (I've used it many times in practice). It is the simple rolling hash function used in the Rabin-Karp string search algorithm.

OK, let's treat each string as a number in base 257 (chosen because 257 is a prime number). Examples:

  1. "10" = code('1') * 257 + code('0')
  2. "p:;" = code('p') * 257^2 + code(':') * 257 + code(';')

Here code(a) is the ASCII code of character a, plus 1 (the +1 makes a string of n '\0' characters give a different result from a string of m '\0' characters). Of course, if the string is long, the corresponding number can't be stored in an int or even in an unsigned long long. That's not a problem: you can just reduce it modulo the range of the data type you store it in, which unsigned arithmetic in C++ does automatically on overflow. The price is that two different strings can occasionally produce the same value. So the final code of your hash function is the following.

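#include <cstddef>
#include <string>

// Polynomial hash: treats s as a number in base 257.
// Overflow of unsigned long long wraps around, which acts as the MOD step.
unsigned long long hash(const std::string & s)
{
    unsigned long long ret = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        ret *= 257;
        // Cast first so that chars above 127 are not treated as negative.
        ret += static_cast<unsigned char>(s[i]) + 1;
    }
    return ret;
}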
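
Since the question asks for a string as output, here is a small sketch that checks the first example by hand and then renders the number as text. The names hash257 and to_hex are made up for this illustration. hash257 follows the same scheme as the function above, reading each character as an unsigned char, and to_hex assumes unsigned long long is 64 bits wide.

```cpp
#include <cstddef>
#include <string>

// Same base-257 scheme as above, under a hypothetical name.
unsigned long long hash257(const std::string& s)
{
    unsigned long long ret = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        ret *= 257;                                   // shift by one base-257 digit
        ret += static_cast<unsigned char>(s[i]) + 1;  // code(c) = character code + 1
    }
    return ret;
}

// Hypothetical helper: renders the low 64 bits as 16 lowercase hex digits.
std::string to_hex(unsigned long long v)
{
    const char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {
        out[i] = digits[v & 0xF];  // lowest 4 bits go to the rightmost free slot
        v >>= 4;
    }
    return out;
}
```

For "10", code('1') = 49 + 1 = 50 and code('0') = 48 + 1 = 49, so the value is 50 * 257 + 49 = 12899, and to_hex(hash257("10")) is "0000000000003263". Whatever the input length, the result is always a 16-character string.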

EDIT: Added source of this algorithm.