
How would I reverse engineer a cryptographic algor

Posted 2019-03-08 11:58

Question:

I wrote an application that encrypts text in this way:

  1. Get the input text

  2. Reverse the text

  3. Convert to hexadecimal

  4. XOR with a key

  5. Base64 encode

Now, I haven't done a lot of encryption/encoding myself, so my question might sound stupid, but say I get a file whose content was produced by the above algorithm, and I don't know about this algorithm. How would one start "breaking" the text? Are there any guidelines, principles, or rules to follow?

My question is not tied to those 5 steps, that was a pure example.
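For concreteness, the five steps could be sketched roughly like this. It is only an illustration under assumptions: "convert to hexadecimal" is taken to mean hex-encoding the UTF-8 bytes of the reversed text, the key is assumed to repeat over the data, and the names `encrypt`, `decrypt` and `xor_bytes` are made up for this sketch.

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (assumed behaviour of step 4)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt(text: str, key: bytes) -> str:
    reversed_text = text[::-1]                         # step 2: reverse
    hex_text = reversed_text.encode("utf-8").hex()     # step 3: hexadecimal
    xored = xor_bytes(hex_text.encode("ascii"), key)   # step 4: XOR with key
    return base64.b64encode(xored).decode("ascii")     # step 5: Base64


def decrypt(token: str, key: bytes) -> str:
    # Undo the steps in the opposite order.
    xored = base64.b64decode(token)
    hex_text = xor_bytes(xored, key).decode("ascii")
    return bytes.fromhex(hex_text).decode("utf-8")[::-1]


token = encrypt("hello world", b"k3y")
print(token)
print(decrypt(token, b"k3y"))
```

Every step except the XOR is keyless, so anyone who recognises those layers can peel them off without knowing anything secret.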

As a different example, take the text: A751CD9E1F99. How would I start investigating what this might mean?

Answer 1:

In order to break a cipher, cryptanalysts use all the information they can gather. Attacks fall into a number of categories, depending on what is known. Some of the main attacks, from hardest to easiest, are

  • ciphertext-only: this is the hardest attack. The analyst tries to collect as many encrypted messages as possible and analyzes them to look for patterns or biases in the frequency of symbols. However, good, modern ciphers produce no such patterns, so this attack is infeasible against a good cipher, properly used.
  • known-plaintext: having the plaintext corresponding to some ciphertext is a big step toward recovering unknown plaintext from new ciphertexts. This is where "reverse-engineering" really begins, because the analyst can test hypotheses about the algorithm against known input and output. In World War II, cryptanalysts worked hard to build extensive lists of "cribs" (words that were likely to appear in the enemy's messages) to exploit known-plaintext attacks. For example, weather conditions on a particular day or the place names of battles were likely to be reported to headquarters in encrypted messages.
  • chosen-plaintext: even better is when the cryptanalyst can trick the enemy into encrypting a message created by the cryptanalyst. In wartime, fake information would sometimes be leaked to the enemy in the hope that it would be encrypted and help the cryptanalyst break the code.
  • adaptive chosen-plaintext: this is an iterative form of the chosen-plaintext attack. The cryptanalyst can repeatedly have chosen plaintexts encrypted by the system and uses each result to adjust the next attempt.

Nowadays, the likely ways to break a code are through flaws in the system. For example, poor key management might allow the key to be stolen or guessed. In other cases, a "side-channel" attack might be used. For example, by carefully measuring the time it takes for certain cryptographic operations, an attacker might be able to guess that certain bits or bytes of a key are zero, causing a fast path through some algorithm.
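A closely related timing leak is easy to illustrate with a comparison that exits early. The sketch below is a simplified stand-in, not a real attack: it counts loop iterations instead of measuring time, and `SECRET` and `leaky_equals` are hypothetical names.

```python
import hmac


def leaky_equals(secret: bytes, guess: bytes) -> tuple[bool, int]:
    """Early-exit comparison; also reports how many bytes were compared."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # stops at the first mismatch
    return True, steps


SECRET = b"hunter2!"  # hypothetical secret value

# The amount of work depends on how long the correct prefix is.
_, steps_wrong_first = leaky_equals(SECRET, b"xxxxxxxx")
_, steps_right_prefix = leaky_equals(SECRET, b"huntxxxx")
print(steps_wrong_first, steps_right_prefix)  # 1 5

# The standard library offers a comparison designed to avoid this leak.
print(hmac.compare_digest(SECRET, b"huntxxxx"))  # False
```

An attacker who can measure that difference can confirm a guess one byte at a time instead of guessing the whole value at once.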

Up near the "tinfoil hat" end of the spectrum are methods to intercept radio emissions from computing equipment. This allows a remote agent to "see" what is displayed on a monitor. There are even specially designed fonts to try and disrupt this sort of eavesdropping.



Answer 2:

Basically, this kind of encryption would be reasonably easy to decipher. The Base64 encoding would be reasonably easy to recognize. (You'd be using only 64 characters, which is typical for Base64.) The next step would be to find the original XOR key. That's a bit harder, but there are several algorithms that can detect these keys if there's enough encrypted data available. Your simple text wouldn't be enough, but if they know it should become a hexadecimal string, things become a lot easier. Then they have to reverse your other steps. All of them are way too easy.
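Recognizing the outer encoding can be automated with a simple alphabet check. The following is a heuristic sketch only; `looks_like_base64` and `looks_like_hex` are names invented for this example.

```python
import base64
import binascii
import string

B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")
HEX_ALPHABET = set(string.hexdigits)


def looks_like_base64(text: str) -> bool:
    """True if text uses only the Base64 alphabet and decodes cleanly."""
    stripped = text.strip()
    if not stripped or len(stripped) % 4 != 0:
        return False
    if not set(stripped) <= B64_ALPHABET:
        return False
    try:
        base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return False
    return True


def looks_like_hex(text: str) -> bool:
    """True if text is an even-length string of hex digits."""
    stripped = text.strip()
    return bool(stripped) and len(stripped) % 2 == 0 and set(stripped) <= HEX_ALPHABET


sample = base64.b64encode(b"some bytes").decode("ascii")
print(looks_like_base64(sample), looks_like_hex(sample))                   # True False
print(looks_like_base64("A751CD9E1F99"), looks_like_hex("A751CD9E1F99"))   # True True
```

Note that a hex string can also pass the Base64 check, so the narrower alphabet is the better hint: when only 0-9 and A-F appear, hexadecimal is the more likely reading.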

It should be possible to break if the attacker knows the original value before encryption. In such cases, a string as short as the one provided could be enough to at least discover your complete encryption routine, although the key you used to XOR the string might not be completely known.

Okay, let's try to decrypt A751CD9E1F99... 12 characters. You only seem to use a few characters, so it appears to be just some hexadecimal string. The original must have been 6 characters. The byte values range from 0x1F to 0xCD, which falls outside the printable ASCII range that Base64 output uses. Also, most values are above 0x7F, which suggests that you've applied some encoding to it. A dictionary attack could already provide some insight into the XOR key used: you'd XOR the 6 bytes with lots of 6-character words just to see which ones return another word in your dictionary. The ones that seem to return valid words could be the keys you've used to XOR the original. With a second encrypted string, those discovered keys could be used again, filtering the set of possible keys down to an even smaller set. On a modern system, such a dictionary attack could return a result within a day.
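The first look at the string, and the dictionary idea, might be sketched like this. The word list and the demo ciphertext are made up for illustration; a real attempt would need a full dictionary and still might find nothing.

```python
raw = bytes.fromhex("A751CD9E1F99")
print(len(raw))                        # 6
print(hex(min(raw)), hex(max(raw)))    # 0x1f 0xcd
print(sum(b > 0x7F for b in raw))      # 4 bytes have the high bit set


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def dictionary_attack(ciphertext: bytes, words: list[bytes]) -> list[tuple[bytes, bytes]]:
    """Return (candidate_key, candidate_plaintext) pairs that are both words."""
    candidates = [w for w in words if len(w) == len(ciphertext)]
    word_set = set(candidates)
    hits = []
    for guess in candidates:
        key = xor_bytes(ciphertext, guess)  # the key, if guess were the plaintext
        if key in word_set:
            hits.append((key, guess))
    return hits


# Hypothetical demo: a 6-letter word XORed with a 6-letter word as the key.
WORDS = [b"secret", b"attack", b"winter", b"bridge"]
demo_ciphertext = xor_bytes(b"attack", b"secret")
hits = dictionary_attack(demo_ciphertext, WORDS)
print(hits)
```

Each match shows up twice, once in each role, because XOR alone cannot tell which word was the key and which was the plaintext. A second ciphertext under the same key is what narrows that down.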

About 50 years ago, this encryption scheme would be very powerful. Nowadays, expect it to be cracked within a day, by anyone who is interested in trying to decipher it.

I'm not an expert at cracking encryption, but I know enough to know which encryption methods are just too weak to use. About 10 years ago, I worked on a project that stored passwords in an encrypted file using a complex XOR mechanism like yours. A customer then decided to check the security and had a specialist investigate just the password files. He only knew one username and password, and that user account had no administrative rights. But it was enough information for him to crack that security within an hour, read the information about the administrator accounts and then use that information to do whatever he liked. My company then gave him free beer for a week... :-) Thus, 10 years ago, a specialist needed just an hour. Nowadays, they're cracking even more complex algorithms with relative ease, simply because computers are way more powerful. If you have to use this kind of encryption, then you might just as well use no encryption. It wouldn't really matter to a hacker.



Answer 3:

You'd be able to try to guess the algorithm if you know how to decrypt it. I can create many algorithms that would result in "A751CD9E1F99" for some input.

Now, if you have many inputs and outputs available, you could try changing your input only slightly to see what happens to the output, for instance. Good encryption algorithms usually result in major output changes for minor input changes.
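That experiment is easy to run. In the sketch below a repeating-key XOR stands for the weak scheme, and SHA-256 is used purely as a stand-in for a primitive with good diffusion (it is a hash, not a cipher, but the standard library has no block cipher to compare against). All names here are illustrative.

```python
import hashlib
from itertools import cycle


def bit_diff(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


msg1 = b"attack at dawn"
msg2 = b"attack at dawo"  # 'n' and 'o' differ in exactly one bit
key = b"k3y"              # hypothetical key

weak_diff = bit_diff(xor_cipher(msg1, key), xor_cipher(msg2, key))
strong_diff = bit_diff(hashlib.sha256(msg1).digest(), hashlib.sha256(msg2).digest())

print("XOR output bits changed:    ", weak_diff)
print("SHA-256 output bits changed:", strong_diff, "of 256")
```

With plain XOR a one-bit input change flips exactly one output bit, in the same position, which tells the observer a great deal about the structure. With good diffusion roughly half of the output bits change.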



Answer 4:

I think you should start by reading The Code Book. What you are asking is how to crack encryption methods and that will give you a start as to how they work.



Answer 5:

You would need a larger text base than that, and some understanding that the ciphertext comes from a particular language/domain. Then, based on the frequency of words in that language/domain, one could potentially decipher certain attributes from the text.

Of course, good ciphers work around this. Only poorly implemented ciphers can be broken easily with this method.
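As a small illustration of the frequency idea, here is a sketch that breaks a single-byte XOR by scoring every possible key against common English letters. It uses letter frequency rather than word frequency, the scoring is deliberately crude, and the sample message and key are invented.

```python
COMMON = b"etaoin shrdlu"  # the most frequent English letters, plus space


def score(candidate: bytes) -> int:
    """Count how many bytes are common English characters."""
    return sum(byte in COMMON for byte in candidate.lower())


def crack_single_byte_xor(ciphertext: bytes) -> tuple[int, bytes]:
    """Try all 256 keys and keep the one whose output looks most like English."""
    best_key = max(
        range(256),
        key=lambda k: score(bytes(b ^ k for b in ciphertext)),
    )
    return best_key, bytes(b ^ best_key for b in ciphertext)


plaintext = b"the weather report for the northern sector is rain and more rain"
secret_key = 0x5A  # hypothetical key
ciphertext = bytes(b ^ secret_key for b in plaintext)

found_key, recovered = crack_single_byte_xor(ciphertext)
print(hex(found_key), recovered)
```

This only works because the message is long enough for the statistics to show. On six bytes like the example in the question, many keys would score about equally well.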



Answer 6:

Rubber-hose cryptanalysis can be quite effective.



Answer 7:

Ciphertext indistinguishability is a good page to start with to understand what crypto algorithms are designed to prevent/defend against. The type of attacks (e.g. IND-CPA) mentioned can also give you a clue about what you can start with.



Answer 8:

If you have access to a black box which does the encryption, you can get a lot of information by feeding it particular input values.

As a simple example, if the black box does "one time pad" style encryption, if you feed it all zeroes you get the one time pad. (In fact, feeding it any input value will get you the one time pad with an additional xor.)
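A minimal sketch of that probe, assuming a hypothetical `BlackBox` that (wrongly) reuses one fixed pad for every message:

```python
import os


class BlackBox:
    """Hypothetical oracle that XORs every input with the same secret pad."""

    def __init__(self, pad_length: int = 16) -> None:
        self._pad = os.urandom(pad_length)

    def encrypt(self, plaintext: bytes) -> bytes:
        if len(plaintext) > len(self._pad):
            raise ValueError("message longer than pad")
        return bytes(p ^ k for p, k in zip(plaintext, self._pad))


box = BlackBox()

# Feeding all zeroes returns the pad itself, since 0 XOR k == k.
pad_from_zeros = box.encrypt(bytes(16))

# Any other input works too, with one extra XOR to remove the input again.
probe = b"any input works!"
pad_from_probe = bytes(c ^ p for c, p in zip(box.encrypt(probe), probe))

print(pad_from_zeros == pad_from_probe)  # True
```

A real one-time pad never reuses pad material, which is exactly what makes this probe useless against it.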

Note that good cryptosystems are resistant to such attacks, even if the cryptosystem is already known (but the key is not).



Answer 9:

An attacker would probably (in general) do the following:

Identify and defeat any encoding that is visible to the naked eye, or any trivial crypto, like reversing of text, Base64 encoding, ROT13, etc.

Once they reach high-entropy data, they try to acquire more pieces of encoded data and XOR them together. This yields the XOR of the two original plaintexts, with the key cancelled out, provided the encoding is indeed XOR-based (like RC4) and the key was constant. If the attacker can get hold of any plaintext and encoded-data pair, all other encoded data is decodable.
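The cancellation is simple to demonstrate. This sketch uses a repeating-key XOR with an invented key and invented messages; the same identity holds for any scheme that XORs the data with a keystream that is reused across messages.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_pair(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncated to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"constant-key"            # hypothetical key, reused for both messages
p1 = b"meet me at the bridge"
p2 = b"send more supplies now"

c1 = xor_with_key(p1, key)
c2 = xor_with_key(p2, key)

# The key drops out: c1 ^ c2 == p1 ^ p2.
print(xor_pair(c1, c2) == xor_pair(p1, p2))  # True

# Knowing p1 is then enough to read p2, up to the length of p1.
recovered = xor_pair(xor_pair(c1, c2), p1)
print(recovered)
```

The attacker never learns the key in the first step, yet is left with two plaintexts XORed together, which is a much easier puzzle than the original ciphertext.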

As a last resort, they may test against the most common practices, such as RC4 or other simple algos used with a dynamic key, where the key is stored at the end or beginning of the file/data.

If they only have access to encoded text, this is pretty much the end of the road. If they have access to something like an API where they can produce the encoded version of a supplied plaintext, then they will trivially identify whether it is bit-based (like XOR), a block cipher, or a feed-forward block cipher encoding, but getting the key and the actual algo is still a problem.

If they have access to the decoding program for symmetric key encoding (like your XOR), or the encoding program of asymmetric key encoding, the encoding is most likely instantly defeated by reversing it.



Answer 10:

That is kind of impossible, you'd fail at the XOR decryption if you don't have any knowledge about what key was used.

In a general case, it is even more impossible (if that is possible :)) to gauge what an encrypted string might mean.



Answer 11:

Err... I'd say:

  1. Base64 decode the output.
  2. XOR the output with the input to get the key.

I'm assuming that since this is a simple encryption algorithm, it should be easy to reverse it this way if you know the input and the output, but not the key.
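Those two steps might look like the sketch below. It assumes the scheme from the question, with "convert to hexadecimal" read as hex-encoding the reversed UTF-8 text and a repeating key; `encrypt`, `recover_keystream` and `shortest_period` are names made up here.

```python
import base64
from itertools import cycle


def encrypt(text: str, key: bytes) -> str:
    """Assumed version of the scheme: reverse, hex, XOR, Base64."""
    hex_text = text[::-1].encode("utf-8").hex().encode("ascii")
    xored = bytes(b ^ k for b, k in zip(hex_text, cycle(key)))
    return base64.b64encode(xored).decode("ascii")


def recover_keystream(known_input: str, output: str) -> bytes:
    decoded = base64.b64decode(output)                                  # step 1
    pre_xor = known_input[::-1].encode("utf-8").hex().encode("ascii")
    return bytes(d ^ p for d, p in zip(decoded, pre_xor))               # step 2


def shortest_period(stream: bytes) -> bytes:
    """Smallest prefix that, repeated, reproduces the whole stream."""
    for n in range(1, len(stream) + 1):
        if all(stream[i] == stream[i % n] for i in range(len(stream))):
            return stream[:n]
    return stream


secret_key = b"k3y"  # hypothetical key, unknown to the attacker
stream = recover_keystream("hello", encrypt("hello", secret_key))
print(stream)                    # b'k3yk3yk3yk'
print(shortest_period(stream))   # b'k3y'
```

The known input has to be at least as long as the key once hex-encoded, otherwise only part of the key comes out.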