I'm creating a program to do a Caesar Cipher, which shifts the letters in a word one time when I hit enter, and prompts the user to shift again or quit.
It works until I get to 23 shifts, then it starts using non-letter symbols for some reason, and I'm not sure why this is happening.
Any suggestions? Here is the code:
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class Cipher {
    public static void main(String[] args) {
        // encrypted text
        String ciphertext;
        // input from keyboard
        Scanner keyboard = new Scanner(System.in);
        if (args.length > 0) {
            ciphertext = "";
            try {
                Scanner inputFile = new Scanner(new File(args[0]));
                while (inputFile.hasNext())
                    ciphertext += inputFile.nextLine();
            } catch (IOException ioe) {
                System.out.println("File not found: " + args[0]);
                System.exit(-1);
            }
        } else {
            System.out.print("Please enter text--> ");
            ciphertext = keyboard.nextLine();
        }
        // -----------------------------------------------------------------
        int distance = 0; // how far the ciphertext should be shifted
        String next = ""; // user input after viewing
        while (!next.equals("quit")) {
            String plaintext = "";
            distance += 1;
            for (int i = 0; i < ciphertext.length(); i++) {
                char shift = ciphertext.charAt(i);
                if (Character.isLetter(shift)) {
                    shift = (char) (ciphertext.charAt(i) - distance);
                    if (Character.isUpperCase(ciphertext.charAt(i))) {
                        if (shift > '0' && shift < 'A') {
                            shift = (char) (shift + 26);
                            plaintext += shift;
                        } else {
                            plaintext += shift;
                        }
                    }
                    if (Character.isLowerCase(ciphertext.charAt(i))) {
                        if (shift > '0' && shift < 'a' && ciphertext.charAt(i) < 't') {
                            shift = (char) (shift + 26);
                            plaintext += shift;
                        } else {
                            plaintext += shift;
                        }
                    }
                } else {
                    plaintext += shift;
                }
            }
            System.out.println(ciphertext);
            // At this point, plaintext is the shifted ciphertext.
            System.out.println("distance " + distance);
            System.out.println(plaintext);
            System.out.println("Press enter to see the next option,"
                    + "type 'quit' to quit.");
            next = keyboard.nextLine().trim();
        }
        System.out.println("Final shift distance was " + distance + " places");
    }
}
How does the shifting in your method work? Well, it exploits the fact that a char can, in Java, also be viewed as an int, a simple number. Because of that you can do stuff like this:
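A minimal sketch of that idea, reading a char as a number. The class name CharAsInt and the helper codeOf are illustrative assumptions, not part of the question's program:

```java
class CharAsInt {
    // Widening a char to an int yields its numeric (UTF-16 code unit) value.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char letter = 'A';
        int code = letter;            // implicit widening, no cast needed
        System.out.println(code);     // prints 65
        System.out.println('A' + 1);  // char + int gives an int: prints 66
    }
}
```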
or even that:
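The reverse direction, turning a number back into a char, might look like the following sketch. Again, IntAsChar and charOf are hypothetical names chosen for illustration:

```java
class IntAsChar {
    // Narrowing an int to a char needs an explicit cast.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        char fromNumber = (char) 65;     // the number 65 read as a character
        char moved = (char) ('A' + 2);   // arithmetic first, then cast back
        System.out.println(fromNumber);  // prints A
        System.out.println(moved);       // prints C

        char c = 'x';
        c += 1;                          // compound assignment casts implicitly
        System.out.println(c);           // prints y
    }
}
```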
Okay, now that we know how we can interpret a char as an int, let us analyze why 65 represents the character 'A' and why 133 is nothing meaningful (it maps to a non-printable control character).

The keyword here is UTF-16. Characters in Java are encoded in UTF-16, and there are tables that list all characters of that encoding with their specific decimal number. Here is a relevant excerpt:
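The values of interest can also be printed directly instead of looked up. This is only a sketch for the handful of characters around the alphabet boundaries; CodeTable and row are made-up names:

```java
class CodeTable {
    // Formats one table row as "<decimal value> <character>".
    static String row(char c) {
        return (int) c + " " + c;
    }

    public static void main(String[] args) {
        // Boundary characters of both alphabets, plus the one right after 'z'.
        char[] interesting = {'A', 'Z', 'a', 'z', '{'};
        for (char c : interesting) {
            System.out.println(row(c));
        }
        // prints:
        // 65 A
        // 90 Z
        // 97 a
        // 122 z
        // 123 {
    }
}
```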
This answers why 65 represents 'A' and why 133 is nothing meaningful.

The reason why you experience strange results after some shifts is that the alphabet only has a size of 26 symbols.
I think you would expect that it starts all over again, so that 'a' shifted by 26 is 'a' again. But unfortunately your code is not smart enough: it simply takes the current character and moves it by the shift amount, without reliably wrapping around at the ends of the alphabet. Compare that to the relevant part of the character table:
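A small sketch makes the overflow visible. The method naiveShift is hypothetical and adds the distance. The code in the question subtracts it, so it runs off the other end of the alphabet instead ('a' - 1 is 96, the backtick), but the cause is the same:

```java
class NaiveShift {
    // Shifts without any wraparound, so results can leave the a..z range.
    static char naiveShift(char c, int distance) {
        return (char) (c + distance);
    }

    public static void main(String[] args) {
        System.out.println(naiveShift('a', 1));   // prints b
        System.out.println(naiveShift('z', 1));   // prints { (122 + 1 = 123)
        System.out.println(naiveShift('a', 26));  // prints { (97 + 26 = 123), not a
        System.out.println(naiveShift('a', -1));  // prints ` (97 - 1 = 96)
    }
}
```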
So 'z' is not followed by 'a' again but by '{'.

Now that the mystery is solved, let's talk about how to fix it and make your code smarter.
You can simply check the bounds, like "if it is greater than the value for 'z' or smaller than 'a', then get it back again into the correct range". We can do so easily by using the modulo operator, %. It divides a number by another and returns the remainder of the division. Here is how we can use it:
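As a rough illustration of the operator on its own, assuming a shift distance of 100 and an alphabet of 26 letters (ModuloDemo and reduce are illustrative names):

```java
class ModuloDemo {
    private static final int ALPHABET_SIZE = 26;

    // Drops all complete rounds through the alphabet from the distance.
    static int reduce(int distance) {
        return distance % ALPHABET_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(reduce(100)); // prints 22, since 100 = 3 * 26 + 22
        System.out.println(reduce(26));  // prints 0, one full round changes nothing
        System.out.println(reduce(27));  // prints 1
    }
}
```

Note that in Java the result of % takes the sign of the left operand, so a negative distance gives a negative (or zero) remainder.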
So instead of shifting by 100 we only shift by the relevant part, 22. Imagine the character going three rounds through the whole alphabet because 100 / 26 ~ 3.85. After those three rounds we go the remaining 0.85 rounds, which are 22 steps, the remainder after dividing 100 by 26. That is exactly what the % operator did for us.

After going those 22 steps we could still exceed the bound, but by at most one round. We correct that by subtracting the alphabet size. So instead of going 22 steps, we go 22 - 26 = -4 steps, which emulates "going 4 steps to the end of the alphabet, then starting at 'a' again and finally going 18 steps to 's'".
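Putting both steps together, a wraparound-safe shift could be sketched as follows. This is one possible implementation under stated assumptions, not a drop-in replacement for the question's loop: the class CaesarShift and its shift methods are hypothetical, only the ASCII letters a-z and A-Z are shifted, and a negative distance shifts backwards, as the question's code does. The starting letter 'w' is an assumption that happens to match the numbers in the text above.

```java
class CaesarShift {
    private static final int ALPHABET_SIZE = 26;

    // Shifts one ASCII letter by distance, wrapping inside its own alphabet.
    // Any other character is returned unchanged.
    static char shift(char c, int distance) {
        char first;
        char last;
        if (c >= 'a' && c <= 'z') {
            first = 'a';
            last = 'z';
        } else if (c >= 'A' && c <= 'Z') {
            first = 'A';
            last = 'Z';
        } else {
            return c;
        }
        // Step 1: drop complete rounds; the result lies in -25..25.
        int reduced = distance % ALPHABET_SIZE;
        // Step 2: move, then correct an overshoot of at most one round.
        int moved = c + reduced;
        if (moved > last) {
            moved -= ALPHABET_SIZE;
        } else if (moved < first) {
            moved += ALPHABET_SIZE;
        }
        return (char) moved;
    }

    // Applies the single-character shift to every character of the text.
    static String shift(String text, int distance) {
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            result.append(shift(text.charAt(i), distance));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(shift('w', 100));            // prints s
        System.out.println(shift('a', -1));             // prints z
        System.out.println(shift("Hello, World!", -3)); // prints Ebiil, Tloia!
    }
}
```

In the question's loop, the equivalent call would be shift(ciphertext, -distance), since that code moves letters backwards by distance.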