可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a large and "unique" integer (actually a SHA1 hash).

Note: While I'm talking here about SHA1 hashes, this is not a cryptography / security question! I'm not trying to break SHA1. Imagine a random 160-bit integer instead of SHA1 if that will help.

I want (for no other reason than to have fun) to find an algorithm to map that SHA1 hash to a computer-generated (pseudo-)English phrase. The mapping should be bidirectional (i.e., knowing the algorithm, one must be able to calculate the original SHA1 hash from that phrase.)

The phrase need not make sense. I would even settle for a whole paragraph of nonsense. (Though quality — englishness — of a paragraph should probably be better than for a mere phrase.)

A better algorithm would produce shorter, more natural-looking, more unique phrases.

A variation: it is OK if I will be able to work only with a part of hash. Say, first six hex digits is fine.

The possible usage of the generated phrase: the human readable version of Git commit ID, to use as a motto for a given program version, which is built from that commit. (As I said, this is "for fun". I don't claim that this is very practical — or be much more readable than the SHA1 itself.)

Possible approach: In the past I've attempted to build a probability table (of words), and generate phrases as Markov chains, seeding the generator (picking branches from probability tree), according to the bits I read from the SHA. This was not very successful, the resulting phrases were too long and ugly. I'm not sure if this was a bug, or the general flaw in the algorithm, since I had to abandon it early enough.

Now I'm thinking about attempting to solve the problem once again. Any advice on how to approach this? Do you think Markov chain approach can work here? Something else?

回答1:

A very simple approach would be: Take list of say 1024 nouns, 1024 verbs and 1024 adjectives each. Your phrase could then be sentence of the form

noun[bits_01-10] verb[bits11-20] adjective[bits21-30] verb[bits31-40],
noun[bits_41-50] verb[bits51-60] adjective[bits61-70] verb[bits71-80],
noun[bits_81-90] verb[bits91-100] adjective[bits101-110] verb[bits111-120] and 
noun[bits_121-130] verb[bits131-140] adjective[bits141-150] verb[bits151-160].

With a bit more linguistic thought you can probably construct slightly more complicated ad thus not so repetitive looking sentences (say, a bit for singular / plural, a bit of two for different tenses,...). Longer word lists use up a few more bits but my guess is that you reach rather exotic words quite fast.

回答2:

We'll, lets see... The english language has about 1,000,000 words. That's about 20 bits per word. SHA1 is 160 bits, so you'll need 8 words. Theoretically, All you'll have to do is to take the n'th word of the oxford english dictionary, where n is a group of 20 bits at a time.

Now, to make it more natural, you can try to add "in/at/on/and/the..." between words, according to their type (nouns,verbs...) using some simple algorithm. (You should remove all these words from your base dictionary, of course).

The algorithm is reversible: Just remove all the words you've added, and convert each word to it's 20-bit index.

Also, try google "insult generator". Some of those generators are pretty nice. I'm not sure about the number of combinations, though.

You can buy the Oxford English Dictionary on CD-ROM with more than 500,000 words (19-bit). I'm not sure if it would be easy to extract the words and their types, however. I'm not sure if it is legal, but I think you can't claim a patent on dictionary entries...

回答3:

This is an old question but entropoetry is a JavaScript (Node/frontend) library that also solves this problem. It combines Markov poetry with Huffman coding, so given the same dictionary (i.e., the same version of the library), converting poetry↔︎numbers will be bidirectional.

Example, from the Node command line:

> var Poet = require('entropoetry'); var p = new Poet();
> p.stringify(Buffer.from('deadbeef', 'hex'))
'old trick of loving you\nif you but'
> console.log(p.parse(`old trick of loving you
... if you but`))
<Buffer de ad be ef>

And as technology marches on, what seemed like a “fun only” idea in 2011 has some real uses in 2017: memorizing cryptocurrency private keys (brain wallet), Dat/IPFS links, etc.

回答4:

Hash function means it is not possible (within reasonable limits) to get a data from hash, unless it is broken (insecure).

Question should be about breaking SHA-1 hash algorithm - look at Google, it's not that broken. So no, you cannot create English phrase from SHA-1 hash code, if you can, please make a huge paper about that, lot of them are useless, this would be breakthrough :-)

Edit: if only part of hash is enough, I suggest just brute force (+ simple map of hash<->phrase, possibly in a file or db), breaking hash algorithm is very "strong soup" (difficult problem).

Edit2: guys be more specific when asking question, not my fault... I will not delete this so that it scares off any other crypto guys around :-)