In response to this question asking about hex to (raw) binary conversion, a comment suggested that it could be solved in "5-10 lines of C, or any other language."
I'm sure that for (some) scripting languages that could be achieved, and would like to see how. Can we prove that comment true, for C, too?
NB: this doesn't mean hex to ASCII binary - specifically the output should be a raw octet stream corresponding to the input ASCII hex. Also, the input parser should skip/ignore white space.
Edit (by Brian Campbell): may I propose the following rules, for consistency? Feel free to edit or delete these if you don't think they are helpful, but since there has been some discussion of how certain cases should work, I think some clarification would help.
- The program must read from stdin and write to stdout (we could also allow reading from and writing to files passed in on the command line, but I can't imagine that would be shorter in any language than stdin and stdout)
- The program must use only packages included with your base, standard language distribution. In the case of C/C++, this means their respective standard libraries, and not POSIX.
- The program must compile or run without any special options passed to the compiler or interpreter (so, 'gcc myprog.c' or 'python myprog.py' or 'ruby myprog.rb' are OK, while 'ruby -rscanf myprog.rb' is not allowed; requiring/importing modules counts against your character count).
- The program should read integer bytes represented by pairs of adjacent hexadecimal digits (upper, lower, or mixed case), optionally separated by whitespace, and write the corresponding bytes to output. Each pair of hexadecimal digits is written with most significant nibble first.
- The behavior of the program on invalid input (characters besides [0-9a-fA-F \t\r\n], spaces separating the two characters of an individual byte, or an odd number of hex digits in the input) is undefined. Any behavior on bad input (other than actively damaging the user's computer or something) is acceptable: throwing an error, stopping output, ignoring bad characters, and treating a single character as the value of one byte are all OK.
- The program may write no additional bytes to output.
- Code is scored by fewest total bytes in the source file. (Or, if we wanted to be more true to the original challenge, the score would be based on lowest number of lines of code; I would impose an 80 character limit per line in that case, since otherwise you'd get a bunch of ties for 1 line).
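For reference, here is an ungolfed C sketch of a program meeting these rules, not taken from any answer below. It uses only the C standard library: it reads with getc, skips whitespace with isspace, and writes one byte per pair of adjacent hex digits, high nybble first. The names hexval, next_nonspace and hex_stream are hypothetical, and on invalid input it simply stops and reports failure, which the rules allow.

```c
#include <ctype.h>
#include <stdio.h>

/* Value of one hex digit, or -1 if c is not in [0-9a-fA-F]. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Next character from in that is not whitespace, or EOF at end of input. */
static int next_nonspace(FILE *in)
{
    int c;
    do {
        c = getc(in);
    } while (c != EOF && isspace(c));
    return c;
}

/* Decode pairs of hex digits from in and write the raw bytes to out.
   Returns 0 at a clean end of input, -1 on a non-hex character or an
   unpaired trailing digit (output simply stops there). */
int hex_stream(FILE *in, FILE *out)
{
    for (;;) {
        int hi = next_nonspace(in);
        if (hi == EOF)
            return 0;                 /* clean end of input */
        int lo = getc(in);            /* second digit must follow directly */
        if (hexval(hi) < 0 || lo == EOF || hexval(lo) < 0)
            return -1;                /* invalid input: stop */
        putc(hexval(hi) << 4 | hexval(lo), out);
    }
}

/* Example driver (hypothetical):
   int main(void) { return hex_stream(stdin, stdout) == 0 ? 0 : 1; } */
```

One caveat: on platforms that distinguish text and binary streams (notably Windows), stdout starts in text mode, so a 0x0A byte may come out as CR LF; this sketch does nothing about that.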
EDIT: This code was written a long time before the question edit which fleshed out the requirements.
Given that a single line of C can contain a huge number of statements, it's almost certainly true without being useful.
In C# I'd almost certainly write it in more than 10 lines, even though it would be feasible in 10. I'd separate out the "parse nybble" part from the "convert a string to a byte array" part.
Of course, if you don't care about spotting incorrect lengths etc, it becomes a bit easier. Your original text also contained spaces - should those be skipped, validated, etc? Are they part of the required input format?
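A C sketch of that separation (C rather than C#, and not the code from this answer): one function parses a single nybble, another converts a whole string into a byte array and rejects bad characters and odd digit counts. Both names, parse_nybble and hex_to_bytes, are hypothetical, and skipping whitespace between pairs is an assumption about the input format.

```c
#include <ctype.h>
#include <stddef.h>

/* "Parse nybble": value of a single hex digit, or -1 if c is not one. */
int parse_nybble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* "Convert a string to a byte array": decode hex pairs from s into out
   (capacity cap), skipping whitespace between pairs. Returns the number
   of bytes written, or -1 on a bad character, an odd number of digits,
   or too little space in out. */
long hex_to_bytes(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    while (*s) {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        int hi = parse_nybble(s[0]);
        int lo = parse_nybble(s[1]);  /* s[1] is '\0' if the digit count is odd */
        if (hi < 0 || lo < 0 || n == cap)
            return -1;
        out[n++] = (unsigned char)(hi << 4 | lo);
        s += 2;
    }
    return (long)n;
}
```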
I rather suspect that the comment was made without consideration as to what a pleasant, readable solution would look like.
Having said that, here's a hideous version in C#. For bonus points, it uses LINQ completely inappropriately in an effort to save a line or two of code. The lines could be longer, of course...
(This is avoiding "cheating" by using any built-in hex parsing code, such as Convert.ToByte(string, 16). Aside from anything else, that would mean losing the use of the word nybble, which is always a bonus.)

PHP, 28 symbols:
Gah.
You aren't allowed to call me on my off-the-cuff estimates! ;-P
Here's a 9-line C version with no odd formatting (well, I'll grant you that the tohex array would be better split into 16 lines so you can see which character codes map to which values...), and only 2 shortcuts that I wouldn't deploy in anything other than a one-off script:
There are no combined lines (each statement gets its own line) and it's perfectly readable. An obfuscated version could undoubtedly be shorter; one could cheat and put the closing braces on the same line as the preceding statement, etc., etc., etc.
The two things I don't like about it are that I don't have a close(fd) in there, and that main shouldn't be void but should return an int. Arguably neither is needed: the OS will release every resource the program used and the file will close without any problems, although a void main leaves the exit status up to the implementation. Given that it's a one-time-use script, it's acceptable, but don't deploy this.
It becomes eleven lines with both, so it's not a huge increase anyway, and a ten-line version would include one or the other, depending on which one feels like the lesser of two evils.
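A sketch of a tidier ending, assuming standard C streams rather than the POSIX file descriptor this answer uses: main returns int, and the output stream is closed explicitly so that a failed write (a full disk, say) shows up in the exit status. The helper name finish_output is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Flush and close an output stream, turning any write error into an
   exit status suitable for returning from main. */
int finish_output(FILE *out)
{
    int failed = ferror(out) != 0;   /* an earlier putc/fwrite failed */
    if (fclose(out) == EOF)          /* fclose flushes buffered data first */
        failed = 1;
    return failed ? EXIT_FAILURE : EXIT_SUCCESS;
}

/* Example (hypothetical):
   int main(void) { ... convert stdin to stdout ... return finish_output(stdout); } */
```

Since C99, an int main that reaches its closing brace also returns 0, so switching from void to int costs very little.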
It doesn't do any error checking, and it doesn't allow whitespace. Assuming, again, that it's a one-time program, it's faster to search and replace to strip spaces and other whitespace before running it, though eating whitespace in the program itself shouldn't take more than a few extra lines.
There are, of course, ways to make it shorter but they would likely decrease readability significantly...
Hmph. Just read the comment about line length, so here's a newer version with an uglier hextonum macro, rather than the array:
It isn't horribly unreadable. I know many people have issues with the ternary operator, but the macro's name and a little analysis should make it clear to the average C programmer how it works. Because of side effects in the macro I had to switch to a for loop, so that I didn't need another line for i+=2 (hextonum(i++) will increment i by 5 each time it's called; macro side effects are not for the faint of heart!).

"Also, the input parser should skip/ignore white space."
grumble, grumble, grumble.
I had to add a few lines to take care of this requirement, now up to 14 lines for a reasonably formatted version. It will ignore everything that's not a hexadecimal character:
I didn't bother with the 80-character line length because the input isn't even less than 80 characters, but a 3-level ternary macro could replace the first 256-entry array. If one didn't mind a bit of "alternative formatting", then the following 10-line version isn't completely unreadable:
And, again, further obfuscation and bit twiddling could result in an even shorter example.
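The 256-entry lookup-table idea can be sketched in C as follows; this is not the answer's code. Every byte value maps either to its nybble value or to -1, so characters that are not hex digits are skipped with a single table test. The names hex_table, init_hex_table and hex_filter are hypothetical, the letter entries assume an ASCII-compatible character set, and a trailing unpaired digit is silently dropped, which the undefined-input rule permits.

```c
#include <limits.h>
#include <stdio.h>

/* 256-entry table: nybble value for '0'-'9', 'a'-'f', 'A'-'F'; -1 otherwise. */
static signed char hex_table[UCHAR_MAX + 1];

static void init_hex_table(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        hex_table[c] = -1;
    for (int d = 0; d < 10; d++)
        hex_table['0' + d] = (signed char)d;
    for (int d = 0; d < 6; d++) {     /* assumes a-f and A-F are contiguous */
        hex_table['a' + d] = (signed char)(10 + d);
        hex_table['A' + d] = (signed char)(10 + d);
    }
}

/* Copy in to out, decoding hex pairs and ignoring every non-hex character.
   Returns the number of bytes written. */
long hex_filter(FILE *in, FILE *out)
{
    long written = 0;
    int hi = -1;                      /* pending high nybble, -1 if none */
    int c;
    init_hex_table();
    while ((c = getc(in)) != EOF) {
        int v = hex_table[c];         /* getc returns an unsigned char value */
        if (v < 0)
            continue;                 /* not a hex digit: ignore it */
        if (hi < 0) {
            hi = v;                   /* first digit of a pair */
        } else {
            putc(hi << 4 | v, out);   /* second digit: emit the byte */
            written++;
            hi = -1;
        }
    }
    return written;
}
```

Because non-hex characters are ignored entirely, digits separated by junk (for example "4 8") still pair up into one byte; that is one of the behaviors the rules leave open.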
45 byte executable (base64 encoded):
(paste into a file with a .com extension)
EDIT: OK, here's the code. Open a Windows console, create a 45-byte file called 'hex.com', type "debug hex.com", then 'a' and Enter. Copy and paste these lines:
Press enter, 'w' and then enter again, 'q' and enter. You can now run 'hex.com'
EDIT2: Made it two bytes smaller!
That was tricky. I can't believe I spent time doing that.
Late to the game, but here's a Python 2/3 one-liner (100 chars, needs import sys, re):

I can't code this off the top of my head, but for every two characters, output (byte)((AsciiValueChar1-(AsciiValueChar1>64?55:48))*16+(AsciiValueChar2-(AsciiValueChar2>64?55:48))) to turn a hex string into raw binary. This would break horribly if your input string has anything other than 0 to 9 or A to F, so I can't say how useful it would be to you.
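Written out in C, the per-pair formula looks like the sketch below. Like the description, it assumes the input contains only 0-9 and uppercase A-F: 'A' is 65, so subtracting 55 maps it to 10, while digits subtract 48 ('0'). The function names digit_value and hex_pairs_to_bytes are hypothetical.

```c
#include <stddef.h>

/* Value of one ASCII hex digit, assuming it is 0-9 or uppercase A-F:
   letters (> 64) subtract 55 so 'A' becomes 10; digits subtract 48 ('0'). */
static int digit_value(unsigned char ch)
{
    return ch - (ch > 64 ? 55 : 48);
}

/* Decode n pairs of hex digits from s into out, high nybble first.
   No validation: anything outside 0-9/A-F gives garbage. */
void hex_pairs_to_bytes(const char *s, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s[2 * i];
        unsigned char c2 = (unsigned char)s[2 * i + 1];
        out[i] = (unsigned char)(digit_value(c1) * 16 + digit_value(c2));
    }
}
```

Lowercase input would need handling too, for example by passing each character through toupper from <ctype.h> first.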