I have a string which contains 3 elements:
- a 3 digit code (example: SIN, ABD, SMS, etc)
- a 1 digit code type (example: 1, 2, 3, etc)
- a 3 digit number (example: 500, 123, 345)
Example string: SIN1500, ABD2123, SMS3345, etc..
I wanted to generate a UNIQUE 10 digit alphanumeric and case sensitive string (only 0-9/a-z/A-Z is allowed), with more than 2^30 (about 1 billion) unique combination per string supplied. The generated code must have a particular algorithm so that I can reverse the process. For example:
public static void main(String[] args) {
String test = "ABD2123";
String result = generateData(test);
System.out.println(generateOutput(test)); //for example, the output of this is: 1jS8g4GDn0
System.out.println(generateOutput(result)); //the output of this will be ABD2123 (the original string supplied)
}
What I wanted to ask is is there any ideas/examples/libraries in java that can do this? Or at least any hint on what keyword should I put on Google?
I tried googling using the keyword java checksum, rng, security, random number, etc and also tried looking at some random number solution (java SecureRandom, xorshift RNG, java.util.zip's checksum, etc) but I can't seem to find one?
Thanks!
EDIT:
My use case for this program is to generate some kind of unique voucher number to be used by specific customers.
The string supplied will contains 3 digit code for company ID, 1 digit code for voucher type, and a 3 digit number for the voucher nominal. I also tried adding 3 random alphanumeric (so the final digit is 7 + 3 digit = 10 digit). This is what I've done so far, but the result is not very good (only about 100 thousand combination):
public static String in ="somerandomstrings";
public static String out="someotherrandomstrings";
public static String encrypt(String kata)
throws Exception {
String result="";
String ina=in;
String outa=out;
Random ran = new Random();
Integer modulus=in.length();
Integer offset= ((Integer.parseInt(Utils.convertDateToString(new Date(), "SS")))+ran.nextInt(60))/2%modulus;
result=ina.substring(offset, offset+1);
ina=ina+ina;
ina=ina.substring(offset, offset+modulus);
result=result+translate(kata, ina, outa);
return result;
}
EDIT:
I'm sorry I forgot to put the "translate" function :
public static String translate(String kata,String seq1, String seq2){
String result="";
if(kata!=null&seq1!=null&seq2!=null){
String[] a=kata.split("");
for (int j = 1; j < a.length; j++) {
String b=a[j];
String[]seq1split=seq1.split("");
String[]seq2split=seq2.split("");
int hint=seq1.indexOf(b)+1;
String sq="";
if(seq1split.length>hint)
sq=seq1split[hint];
String sq1="";
if(seq2split.length>hint)
sq1=seq2split[hint];
b=b.replace(sq, sq1);
result=result+b;
}
}
return result;
}
EDIT:
Okay, after a few days I'm currently struggling to convert the Ruby code provided by @sarnold, this is the code I've come up with :
public class Test {
static char START_A = "A".charAt(0);
static char START_a = "a".charAt(0);
static char START_0 = "0".charAt(0);
static int CODEMASK = ((1 << 28) - 1); //turn on lower 28 bits
static int RANDOMMASK = ((1 << 60) - 1) & ~ CODEMASK; //turn on upper 32 bits
public static void main(String[] args) {
String[] test = new String[]{
//"AAA0000", "SIN1500", "ABD2123", "SMS3345", "ZZZ9999",
//"ABD2123", "ABD2123", "ABD2123", "ABD2123", "ABD2123"
"ABD2123"
};
for(String t : test){
long c = compress(t);
long a = add_random(c);
String output = to_output(a);
long input = from_output(output);
String[] new_c_r = remove_random(input);
String u = uncompress(Long.valueOf(new_c_r[0]));
System.out.println("Original input: " + t);
System.out.println(" compressed: " + c);
System.out.println(" after adding random amount: " + a);
System.out.println(" *output: " + output);
System.out.println(" *input: " + input);
System.out.println(" random amount added: " + new_c_r[1]);
System.out.println(" after removing random amount: " + new_c_r[0]);
System.out.println(" uncompressed: " + u);
System.out.println("-----------------------------------------------------------------");
}
}
public static long compress(String line){ //7 character
char a = line.charAt(0);
char b = line.charAt(1);
char c = line.charAt(2);
char h = line.charAt(3);
char i = line.charAt(4);
char j = line.charAt(5);
char k = line.charAt(6);
long small_a = (long) a - START_A;
long small_b = (long) b - START_A;
long small_c = (long) c - START_A;
long letters = (small_a * 26 * 26) + (small_b * 26) + small_c;
long numbers = letters * 10000 + h * 1000 + i*100 + j*10 + k;
return numbers;
}
public static String uncompress(long number){
long k = number % 10;
number /= 10;
long j = number % 10;
number /= 10;
long i = number % 10;
number /= 10;
long h = number % 10;
number /= 10;
long small_c = number % 26;
number /= 26;
long small_b = number % 26;
number /= 26;
long small_a = number % 26;
number /= 26;
if (number != 0) throw new RuntimeException("input wasn't generated with compress()");
long a = small_a + START_A;
long b = small_b + START_A;
long c = small_c + START_A;
StringBuffer result = new StringBuffer();
result.append((char) a).append((char) b).append((char) c).append(h).append(i).append(j).append(k);
return result.toString();
}
public static long add_random(long number){
return (((long) (Math.random()* Math.pow(2, 31))) << 28) + number;
}
public static String[] remove_random(long number){
return new String[]{String.valueOf(number & CODEMASK), String.valueOf(number & RANDOMMASK)};
}
public static String to_output(long number){
List<Character> a = new ArrayList<Character>();
do{
a.add(transform_out(number % 62));
number /= 62;
}while(number > 0);
Collections.reverse(a);
StringBuffer result = new StringBuffer();
for(int i=0; i<a.size(); i++){
Character s = (Character) a.get(i);
result.append(s);
}
return result.toString();
}
public static long from_output(String string){
long num = 0;
for(char c : string.toCharArray()){
num *= 62;
num += transform_in(c);
}
return num;
}
public static char transform_out(long small){
long out;
if (small < 0 || small > 61){
throw new RuntimeException("small should be between 0 and 61, inclusive");
}
if(small < 26){
out = START_A + small;
}else if(small < 52){
out = START_a + (small-26);
}else{
out = START_0 + (small-52);
}
return (char) out;
}
public static long transform_in(char c){
if(!String.valueOf(c).matches("[a-zA-Z0-9]")){
throw new RuntimeException("char should be A-Z, a-z, or 0-9, inclusive");
}
long num = (long) c;
long out;
if(num >= START_A && num <= START_A+26) out = num-START_A;
else if(num >= START_a && num <= START_a+26) out = (num-START_a) + 26;
else if(num >= START_0 && num <= START_0+10) out = (num-START_0) + 52;
else throw new RuntimeException("Salah, bego!");
return out;
}}
but I can't seem to make this code right, the result is like this :
Original input: ABD2123
compressed: 345451
after adding random amount: 62781252268541291
*output: EnhZJdRFaj
*input: 62781252268541291
random amount added: 0
after removing random amount: 345451
uncompressed: ABI5451
as you've probably noticed the "compressed" and the "uncompressed" field didn't show the same amount. The "random amount added" field is also always returning 0 value. Is there anyone who can help see where I'm wrong in this code?
I'm sorry if it's a mistake to put this code in this thread I will create another thread.
Thank You, Yusuf
I've written a Ruby tool that does what you need. (Sorry it isn't Java, but my Java is over a decade old by now. But the general idea should port to Java without hassle.)
In short, the given input string (7 bytes) is compressed into a number between
0
and175_760_000
(28 bits). A 32 bit random number is prepended, making a 60 bit integer, which is encoded into a 10-character output string.My earlier math was wrong; since your input is only 28 bits long and your desired output is 60 bits long, that leaves 32 bits for adding roughly two billion random permutations. I mistyped when performing my calculations.
Running the program with the five inputs given at the bottom results in this output:
The lines you're really looking for are
original input
,*output
, anduncompressed
. Your client has theoriginal input
lines, after using theto_output(add_random(compress(input)))
you will get the ten-characterA-Za-z0-9
output in the*output
line. You hand that to users, and it's a magic token of some sort. Then when it is time to validate them, you useremove_random(from_output(user_string))
to discover both the random value added to the string and an integer that you can hand touncompress()
.One very important note: The input
AAA0000
is stored in plaintext in the lower 28 bits. The random number is stored in plaintext in the upper 32 bits. This is simply an obfuscation of the original inputs, it wouldn't be hard for someone to discover the pattern if they have two inputs and outputs. Heck, they might even correctly guess the algorithm given only a single input and output.If you need this to be cryptographically strong, then you've still got some work ahead of you :) but that might be as easy as XORing the intermediate 60 bit number with rc4 output of the username or something like that.
Short Explanation
Your input strings can be interpreted as integers, but with a twist: the first three 'digits' are in base 26, the next four digits are in base 10. The number will be less than
175_760_000
. Uniquely storing numbers between0
and175_760_000
requires28
bits. Your output strings are also a number, ten digits long, with each 'digit' in base 62. (Think base64 but without-
,/
, or=
(for padding).) 62 possible values and ten positions gives you a maximum value of839_299_365_868_340_224
, which can be represented in60
bits.The input string only takes
28
bits to represent, your output string has60
bits available, and that leaves32
bits available for storing a randomly-generated number. If we multiply a32
-bit number by2^28
(the same as a left-shift by 28:1 << 28
) then the lower28
bits will be free for the originally input number.Once we've calculated the
60
bit number, we output it in our base 62 notation for human consumption.To reverse the process, you decode the base 62 number to get our
60
bit number; split the60
bit number into the lower28
bit input number and upper32
bit random number, and then output the28
bit number in the original format: three base 26 digits followed by four base 10 digits.FINAL UPDATE
Yusuf, excellent work converting my Ruby to Java. I'm very impressed, especially given how good your Java version looks: your version is more legible. I'm jealous and impressed. :)
I found the two small bugs that remained in your program:
RANDOMMASK
was accidentally initialized to0
, because Java doesn't promote all operands into the final data type when executing arithmetic shift statements. Changing1
to1L
forces the result of1L << 60
to be along
; without theL
, the result of1 << 60
is anint
, and isn't large enough to hold the full number.Further, the digits weren't being compressed correctly; my Ruby code parsed the characters as an integer, and your Java code interpreted the characters as an integer. (Yours used the character's value; mine converted the character to an integer based on the ascii meaning of the character. Mine wasn't really parsing, since it is just doing a subtraction, but if it were a string,
String.toInteger(character)
would do the same thing, so it is a lot like parsing.)But your uncompress logic was correct, and because of the mismatch, the output was incorrect. So I changed your code to parse the digits into integers (and changed from
char
toint
to silence a pointless warning).Here's a diff of what I had to change in your program to make it work:
And now the full source, just in case that's easier :)
Your requirements are very unclear. So I'm going to restrict my answer to generalities:
There are functions called cryptographic hash functions that will map from an arbitrary sequence of bytes to a "hash", with the property that the probability of any two related inputs giving the same output is vanishingly small. However, cryptographic hash functions are (by design) not reversible ... by design.
A mapping from one "space" of Strings to another that is reversible, can be implemented in two ways:
You can generate arbitrary Strings as the mapped Strings, and use data structures such as a hash tables to store the forward and reverse mappings.
You can use a cryptographic hash function to generate the mapped Strings, and use data structures such as a hash tables to store the reverse mappings.
You can use a reversible function to transform between the original and mapped Strings. This has the problem that the mapping is likely to be be easy to reverse engineer.
An example of the latter might be to turn the original String into bytes and then base64 encode it. Or even more trivially, you could insert a random character between each character in the input string. (Obviously, transformations like these can be reverse engineered in a few minutes ... given enough examples. So one has to doubt the wisdom of this approach.)
Without better requirements, it is unclear which (if any) of these approaches is what you need.