Deterministic RSA encryption in Java

2019-02-02 00:56发布

问题:

This is my first question on this site, and I only have a basic mathematical understanding of RSA, so please bear with me! :)

I'm writing a Java web application for my final year project at university. It's a web-based implementation of "Pret-a-voter", a secure voting system, for those who have heard of it.

Essentially my problem is that I want to be able to give someone performing the role of an auditor:

  • a source byte array (the plaintext to be encrypted)
  • an RSA public key file
  • a "destination" byte array, which is the result of my own computation of the cipherdata given the plaintext and the public key

I then want the auditor to be able to perform encryption using the first two items, and be satisfied that the third is the result. I therefore need the encryption to be deterministic, i.e. generate the same cipherdata each time encryption with the same plaintext and public key are repeated.

(Note - I'm working with very small blocks of data in this project - there is no symmetric encryption involved at all... I'm aware this is an "interesting" use of RSA!)

Anyway I found that in Java, using

cipher = Cipher.getInstance("RSA");

uses the default random padding scheme, at a cost of 11 bytes (so with a 2048-bit key pair, it's possible to encrypt 2048/8-11 = 245 bytes). Repeated encryptions of the same plaintext generate different ciphertexts, which is obviously not the ECB mode that I want.

My question is - should I use the following?

cipher = Cipher.getInstance("RSA/ECB/NoPadding");

I've read in lots of places that RSA is insecure without padding. Is that simply because an attacker can build a dictionary of plaintexts/ciphertexts? This is a side-effect of the deterministic encryption I require in order to allow auditors to verify my encryption, and in my scheme auditors are trusted, so that would be OK.

Part two of my question is more java-related. If I do use RSA/ECB/NoPadding as above, I believe I'm able to provide a source byte array of (say) length 128 (for a 1024-bit RSA key pair) and encrypt that to get another byte array of length 128. If I then try to encrypt that again, with a different 1024-length public key, I get

javax.crypto.BadPaddingException: Message is larger than modulus

Does anyone know why?

EDIT - encryption with NoPadding doesn't always generate this exception - it's temperamental. However, even when encryption does not generate this exception, decryption generates this:

javax.crypto.BadPaddingException: Data must start with zero

Many thanks for reading through this! Any help would be greatly appreciated.

EDIT - sorry, my original question wasn't very clear about what it is I want to do, so here's an [attempt at an] explanation:

  • The plaintext is a voter's vote in an election.
  • Pret-a-voter aims to be end-to-end verifiable without sacrificing voter confidentiality (etc). After voting, the voter is provided with a receipt that they can use to verify that their vote has been recorded correctly, and which will later show them that their vote has not been tampered with. The voter performs a comparison of the information on their receipt with an identical copy posted on the web.
  • However, it should not be possible for any voter to prove how he/she voted (as that could lead to coercion) so the information is not the plaintext, but an encrypted copy of the vote.
  • In fact, the plaintext is encrypted four times, with four different asymmetric keys - held by two different tellers, each holding two of the keys. So, a vote (plaintext) is provided to one teller, who encrypts it using public key #1, and then encrypts THAT ciphertext with his second public key, gives THAT ciphertext to the second teller who encrypts it with his two keys in the same way. The resulting ciphertext (result of four sequential encryptions) is what is posted to the web (made public). The tellers are trusted.
  • Each encrypted vote can be visualised as an "onion" where the centre is the vote and there are several layers of encryption. In order to get to the vote, each layer must be removed in turn, meaning the corresponding private keys (held by the tellers) must be applied in the reverse sequence. This is key to the security - all tellers must work cooperatively in order to decrypt the votes.
  • The web bulletin board can be visualised as a table with 5 columns - the first (on the left) holds the fully-encrypted votes (also shown on each voter's receipt), and is the only visible column during the vote-casting stage. The second column contains the same set of votes, but with the outer layer removed - teller 2 populates this column and column 3 by decrypting the votes using its private keys during the tallying stage. At the end of the tallying stage, column 5 contains the fully-decrypted votes that can then be tallied.
  • Each voter gets a receipt that links them to an encrypted vote in column 1. This doesn't show how they voted, but allows them to verify that their vote has not been tampered with as throughout the election process they can verify that their encrypted vote is still there in column 1, untouched. This is only half of the "end-to-end verification", of course, since voters are unable to verify that the decryptions have been done correctly, i.e. that there's an entry in column 2 which is their vote minus the outer layer of encryption. Each voter is responsible only for the verification UP TO the point of column 1.
  • Thereafter, it is the auditors' responsibility to check that the entries in column 1 decrypt to column 2, and so on. The way they do this is by relying on deterministic encryption and the public keys used for the encryption being public.
  • Since public keys are public, you don't want people to simply draw lines from column 5 to column 1, joining up someone's vote as it becomes repeatedly encrypted - that way, a receipt that ties you to an encrypted vote actually ties you to an unencrypted, readable vote --> coercion! So, only columns 1, 3 and 5 are public (this is why each teller performs TWO encryptions), and for each entry in column 3, only ONE of the corresponding entries in {2,4} are revealed to auditors. This prevents anyone (even auditors) from linking an encrypted vote to an unencrypted vote.
  • Auditors therefore need to take an entry in column 3, be given the corresponding entry in column 2 and the public key, and perform the same encryption to verify that they do indeed get the entry in column 2.
  • Put together, this offers end-to-end verifiability.

Sorry that was so lengthy - I hope it describes my need for deterministic encryptions. I've missed out a lot of fundamental details (I've modified this scheme heavily) but hopefully the core principles are all there. Thank you so much for reading - I really appreciate it.

回答1:

Removing the padding makes the system insecure. If the public keys are indeed public, as you say, then an attacker can simply go to column 5, take the plaintexts, and encrypt them with the 4 public keys in the proper sequence. They can then match up the resulting ciphertexts with that from the reciepts, compromising the "no coercion" property.

Random padding stops this, because the attacker doesn't know what padding to add.

You will need to use normal padding, but reveal a subset of the private keys to a subset of the auditors (usually called "scrutineers" in electoral systems). This means that one scrutineer can confirm that column 1 matches column 2, another can confirm that column 2 matches column 3, and so on. An individual scrutineer can't match a voter to a ballot, only co-operating ones.


The reason that you're getting the "Message is larger than modulus" error is because each modulus is different, so the ciphertext from one encryption may be outside the allowable range for the next encryption.



回答2:

https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Padding

The padding is there precisely to avoid a given plain text being encrypted to a single cyphertext. So if you want a deterministic (single) result for any given plain text your only option is to turn it off.



回答3:

So it seems to me that you have 2 main requirements that you are attempting to use deterministic RSA to solve:

  1. Allowing voters to ensure the integrity of their vote
  2. Allowing auditors to ensure the integrity of all votes

Digital Signatures should solve this problem. You can take your ciphertext from column 1, hash it, and encrypt the hash with a private key. That encrypted hash can then be placed in column 2. To verify the integrity of column 1, simply use the corresponding public key to decrypt the hash in column 2, hash column 1, and compare those 2 values. If they are equal, the data has not been tampered with. Only parties that have the private key could possibly tamper with the data in these columns, because only they can make a matching pair. This is similar to an HMAC, but has the advantage of using public/private keys rather than a secret shared key. Thus anybody can verify, but only trusted parties can modify.

One thing to note about deterministic schema is that it will leak information in many ways. Let's assume that I know I voted for Blue as my favorite color. I can see that the resulting ciphertext of my vote is 0x12345678. If the schema is completely deterministic, I know that anybody else that has a corresponding ciphertext of 0x12345678 also voted for Blue. Also, since you will typically have a finite set of vote choices, a chosen plaintext attack is incredibly easy. Thus you really want to let RSA do its job and use the intended padding scheme.

The next thing you may want to consider is protecting the system from a form of Replay Attack by numbering the votes or something like that. As I understand your schema, it looks like if I somehow got access to where you store your votes (or got in the middle of any communication), I could essentially spoof or spam fake votes just by replaying or copying data that I've already seen (another problem with being deterministic).