Caesar Cipher Java Program can't shift more th

Posted 2019-08-07 09:48

Question:

I'm creating a program to do a Caesar Cipher, which shifts the letters in a word one time when I hit enter, and prompts the user to shift again or quit.

It works until I get to 23 shifts, then it starts using non-letter symbols for some reason, and I'm not sure why this is happening.

Any suggestions? Here is the code:

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class Cipher {

    public static void main(String[] args) {

        // encrypted text
        String ciphertext;

        // input from keyboard
        Scanner keyboard = new Scanner(System.in);

        if (args.length > 0) {
            ciphertext = "";
            try {
                Scanner inputFile = new Scanner(new File(args[0]));
                while (inputFile.hasNext())
                    ciphertext += inputFile.nextLine();
            } catch (IOException ioe) {
                System.out.println("File not found: " + args[0]);
                System.exit(-1);
            }
        } else {
            System.out.print("Please enter text--> ");
            ciphertext = keyboard.nextLine();
        }

        // -----------------------------------------------------------------

        int distance = 0;  // how far the ciphertext should be shifted
        String next = "";  // user input after viewing
        while (!next.equals("quit")) {
            String plaintext = "";
            distance += 1;
            for (int i = 0; i < ciphertext.length(); i++) {
                char shift = ciphertext.charAt(i);
                if (Character.isLetter(shift)) {
                    shift = (char) (ciphertext.charAt(i) - distance);
                    if (Character.isUpperCase(ciphertext.charAt(i))) {
                        if (shift > '0' && shift < 'A') {
                            shift = (char) (shift + 26);
                            plaintext += shift;
                        } else {
                            plaintext += shift;
                        }
                    }
                    if (Character.isLowerCase(ciphertext.charAt(i))) {
                        if (shift > '0' && shift < 'a' && ciphertext.charAt(i) < 't') {
                            shift = (char) (shift + 26);
                            plaintext += shift;
                        } else {
                            plaintext += shift;
                        }
                    }
                } else {
                    plaintext += shift;
                }
            }

            System.out.println(ciphertext);

            // At this point, plaintext is the shifted ciphertext.
            System.out.println("distance " + distance);
            System.out.println(plaintext);
            System.out.println("Press enter to see the next option,"
                    + "type 'quit' to quit.");
            next = keyboard.nextLine().trim();
        }
        System.out.println("Final shift distance was " + distance + " places");
    }
}

Answer 1:

How does the shifting in your method work? Well, it exploits the fact that a char can, in Java, also be viewed as an int, a simple number.

Because of that you can do stuff like this:

char c = 'A';                                 // Would print: A
int cAsValue = (int) c;                       // Would print: 65
int nextValue = cAsValue + 1;                 // Would print: 66
char nextValueAsCharacter = (char) nextValue; // Would print: B

or even this:

int first = (int) 'A';                // Would print: 65
int second = (int) 'D';               // Would print: 68
int third = first + second;           // Would print: 133
char thirdAsCharacter = (char) third; // Would not print anything meaningful

Okay, now that we know how we can interpret char as int, let us analyze why 65 represents the character A and why 133 is nothing meaningful.

The keyword here is UTF-16. Characters in Java are encoded in UTF-16, and there are tables that list every character of that encoding together with its decimal code. For the basic Latin letters these codes are the same as in ASCII.

In such a table, 65 is listed as the letter A, while 133 belongs to a block of control characters that have no visible glyph.

This answers why 65 represents A and why 133 is nothing meaningful.
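
The mapping can also be checked directly in Java. This is only a minimal sketch: the class name CodeUnitDemo and the helper describe are made up for illustration and are not part of the question's code.

```java
public class CodeUnitDemo {

    // Describes what the UTF-16 code unit with the given value stands for.
    // (Hypothetical helper, named for this example only.)
    static String describe(int code) {
        char c = (char) code;
        if (Character.isISOControl(c)) {
            return code + " is a control character";
        }
        return code + " is '" + c + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe(65));  // 65 is 'A'
        System.out.println(describe(68));  // 68 is 'D'
        System.out.println(describe(133)); // 133 is a control character
    }
}
```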


The reason why you experience strange results after some shifts is that the alphabet only has a size of 26 symbols.

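I think you expect it to start all over again, so that a shifted by 26 is a again. Unfortunately, your code does not wrap around reliably. Without a working wrap-around, the shift is simply applied to the character's numeric value (your loop subtracts the distance and the example here adds it, but the effect is the same), like this: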

char current = 'a';
int shift = 26;

int currentAsInt = (int) current;        // Would print: 97
int shifted = currentAsInt + shift;      // Would print: 123
char currentAfterShift = (char) shifted; // Would print: {

Compare that to the relevant part of the table: the entry right after z (122) is { (123).

So after z does not come a again but rather {.
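
To see where the 23 in the question could come from, here is a sketch that isolates the lowercase branch of the posted loop for a single character. The class and method names are made up, and the input letter 'w' is an assumption, since the question does not show the text that was entered.

```java
public class QuestionBranchDemo {

    // Mirrors the lowercase branch of the loop in the question for one character.
    // (Hypothetical helper; the condition is copied from the question.)
    static char shiftLikeQuestion(char original, int distance) {
        char shift = (char) (original - distance);
        if (shift > '0' && shift < 'a' && original < 't') {
            shift = (char) (shift + 26);
        }
        return shift;
    }

    public static void main(String[] args) {
        System.out.println(shiftLikeQuestion('w', 22)); // a
        System.out.println(shiftLikeQuestion('w', 23)); // ` (the character just before 'a')
        System.out.println(shiftLikeQuestion('c', 23)); // f
    }
}
```

Under that assumption, the `original < 't'` condition is what keeps 'w' from getting the `+ 26` correction, so at distance 23 it drops out of the alphabet, while 'c' still wraps correctly.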


Now that the mystery is solved, let's talk about how to fix it and make your code smarter.

You can simply check the bounds, along the lines of "if the value is greater than the value for 'z' or smaller than the value for 'a', bring it back into the correct range". The modulo operator % makes this easy: it divides one number by another and returns the remainder of the division.

Here is how we can use it:

char current = 'w';
int shift = 100;
int alphabetSize = 26; // Or alternatively ('z' - 'a' + 1)

int currentAsInt = (int) current;          // Would print: 119
int shiftInRange = shift % alphabetSize;   // Would print: 22
int shifted = currentAsInt + shiftInRange; // Would print: 141 (nothing meaningful)

// If exceeding the range then begin at 'a' again
int shiftCorrected = shifted;
if (shifted > 'z') {
    shiftCorrected -= alphabetSize; // Would print: 115
}

char currentAfterShift = (char) shiftCorrected; // Would print: s 

So instead of shifting by 100 we only shift by the relevant part, 22. Imagine the character going three full rounds through the alphabet, because 100 / 26 ≈ 3.85. After those three rounds we go the remaining 0.85 rounds, which are 22 steps: the remainder after dividing 100 by 26 (100 - 3 * 26 = 22). That is exactly what the % operator computes for us.
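
One detail matters if the shift can be negative, as it effectively is in the question's loop, which subtracts the distance: in Java, % keeps the sign of the left operand. The sketch below contrasts it with Math.floorMod from the standard library. The class and method names are made up for illustration.

```java
public class RemainderDemo {

    // Plain remainder: the result keeps the sign of the left operand in Java.
    static int remainder(int shift) {
        return shift % 26;
    }

    // Floor modulus: the result is always in 0..25 for the positive divisor 26.
    static int floorModulus(int shift) {
        return Math.floorMod(shift, 26);
    }

    public static void main(String[] args) {
        System.out.println(remainder(100));     // 22
        System.out.println(remainder(-100));    // -22
        System.out.println(floorModulus(-100)); // 4
    }
}
```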

After going those 22 steps we could still exceed the bound, but by at most one round. We correct that by subtracting the alphabet size. So instead of going 22 steps forward, we go 22 - 26 = -4 steps, that is, 4 steps back. The result is the same as going 3 steps to 'z', one more step to wrap around to 'a', and finally 18 steps to 's'.
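
Putting the pieces together, a wrap-around shift could look roughly like the sketch below. It is one possible way to do it, not the only one: instead of % followed by a manual correction, it works with the letter's offset from 'a' or 'A' and uses Math.floorMod, so negative distances (shifting backwards, as the question's loop does) wrap as well. The class name CaesarShift and the methods shiftChar and shiftText are made up, and unlike Character.isLetter in the question, this version only treats A-Z and a-z as letters.

```java
public class CaesarShift {

    private static final int ALPHABET_SIZE = 26;

    // Shifts one character by the given distance, wrapping within its own case.
    // Anything outside A-Z and a-z is returned unchanged.
    static char shiftChar(char c, int distance) {
        char base;
        if (c >= 'a' && c <= 'z') {
            base = 'a';
        } else if (c >= 'A' && c <= 'Z') {
            base = 'A';
        } else {
            return c;
        }
        // Offset of the letter within the alphabet (0..25), moved and wrapped.
        int offset = Math.floorMod(c - base + distance, ALPHABET_SIZE);
        return (char) (base + offset);
    }

    // Applies shiftChar to every character of the text.
    static String shiftText(String text, int distance) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            result.append(shiftChar(text.charAt(i), distance));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(shiftChar('w', 100));            // s
        System.out.println(shiftText("Hello, World!", 3));  // Khoor, Zruog!
        System.out.println(shiftText("Khoor, Zruog!", -3)); // Hello, World!
    }
}
```

In the question's loop, the body of the for loop could then shrink to a single call such as `plaintext += shiftChar(ciphertext.charAt(i), -distance);`, again assuming these hypothetical helper names.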