Convert a string into Morse code [closed]

Posted 2019-01-30 07:46

The challenge

Write the shortest code, by character count, that takes as input a string containing only alphabetical characters (upper and lower case), digits, commas, periods and question marks, and returns a representation of the string in Morse code. The Morse code output should consist of a dash (-, ASCII 0x2D) for a long beep (AKA 'dah') and a dot (., ASCII 0x2E) for a short beep (AKA 'dit').

Each letter should be separated by a space (' ', ASCII 0x20), and each word should be separated by a forward slash (/, ASCII 0x2F).

Morse code table:

[Image: Morse code table, http://liranuna.com/junk/morse.gif]

Test cases:

Input:
    Hello world

Output:
    .... . .-.. .-.. --- / .-- --- .-. .-.. -..

Input:
    Hello, Stackoverflow.

Output:
    .... . .-.. .-.. --- --..-- / ... - .- -.-. -.- --- ...- . .-. ..-. .-.. --- .-- .-.-.-

Code count includes input/output (that is, the full program).
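
For reference, here is an ungolfed sketch of the required behaviour in Python; it is not a competitive entry. The table is the standard International Morse code for the characters the challenge allows. The names `MORSE` and `to_morse` are illustrative, and treating any run of whitespace as a single word break is an assumption the challenge does not spell out.

```python
# Ungolfed reference sketch; all names here are illustrative, not from the challenge.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,?"
CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. "
    "... - ..- ...- .-- -..- -.-- --.. "
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----. "
    ".-.-.- --..-- ..--.."
).split()
MORSE = dict(zip(SYMBOLS, CODES))


def to_morse(text):
    """Letters are separated by ' ', words by ' / ' (assumes only allowed characters)."""
    words = text.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)
```

Calling `to_morse("Hello world")` reproduces the first test case above.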

30 answers
Animai°情兽
#2 · 2019-01-30 08:07

Perl, 206 characters, using dmckee's idea

This is longer than the first one I submitted, but I still think it's interesting. And/or awful. I'm not sure yet. This makes use of dmckee's coding idea, plus a couple other good ideas that I saw around. Initially I thought that the "length/offset in a fixed string" thing couldn't come out to less data than the scheme in my other solution, which uses a fixed two bytes per char (and all printable bytes, at that). I did in fact manage to get the data down to considerably less (one byte per char, plus four bytes to store the 26-bit pattern we're indexing into) but the code to get it out again is longer, despite my best efforts to golf it. (Less complex, IMO, but longer anyway).

Anyway, 206 characters; newlines are removable except the first.

#!perl -lp
($a,@b)=unpack"b32C*",
"\264\202\317\0\31SF1\2I.T\33N/G\27\308XE0=\x002V7HMRfermlkjihgx\207\205";
$a=~y/01/-./;@m{A..Z,0..9,qw(. , ?)}=map{substr$a,$_%23,1+$_/23}@b;
$_=join' ',map$m{uc$_}||"/",/./g

Explanation:

  • There are two parts to the data. The first four bytes ("\264\202\317\0") represent 32 bits of morse code ("--.-..-.-.-----.....--..--------") although only the first 26 bits are used. This is the "reference string".
  • The remainder of the data string stores the starting position and length of substrings of the reference string that represent each character -- one byte per character, in the order (A, B, ... Z, 0, 1, ... 9, ".", ",", "?"). The values are coded as 23 * (length - 1) + pos, and the decoder reverses that. The last starting pos is of course 22.
  • So the unpack does half the work of extracting the data, and the third line (as viewed here) does the rest. Now we have a hash with $m{'A'} = '.-' et cetera, so all that is left is to match characters of the input, look them up in the hash, and format the output, which the last line does, with some help from the shebang: its -l and -p switches tell perl to remove the newline on input, put each line of input in $_, and, when the code completes, write $_ back to output with the newline added again.
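
The decoding described above can be sketched in Python. This is an illustrative port, not the author's code: the byte values are transcribed from the Perl string literal (octal escapes rewritten as hex), and the names `REF_BYTES`, `TABLE_BYTES`, `reference`, `codes` and `encode` are made up for the example.

```python
# Illustrative port of the Perl decoder; names are hypothetical.
REF_BYTES = b"\xb4\x82\xcf\x00"  # "\264\202\317\0" in the Perl source
TABLE_BYTES = b"\x19SF1\x02I.T\x1bN/G\x17\x188XE0=\x002V7HMRfermlkjihgx\x87\x85"
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,?"

# unpack "b32": read each byte's bits lowest bit first; y/01/-./ maps 0 to '-', 1 to '.'.
reference = "".join(
    "." if (byte >> bit) & 1 else "-"
    for byte in REF_BYTES
    for bit in range(8)
)

# Each table byte holds 23 * (length - 1) + pos, so pos = b % 23, length = 1 + b // 23.
codes = {
    sym: reference[b % 23 : b % 23 + 1 + b // 23]
    for sym, b in zip(SYMBOLS, TABLE_BYTES)
}


def encode(text):
    """Any character not in the table (such as a space) becomes '/', as in the Perl."""
    return " ".join(codes.get(ch, "/") for ch in text.upper())
```

For example, the byte for A is 25, which gives position 2 and length 2, i.e. `.-` in the reference string.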
狗以群分
#3 · 2019-01-30 08:07

Bash, a script I wrote a while ago (time-stamp says last year) weighing in at a hefty 1661 characters. Just for fun really :)

#!/bin/sh
txt=''
res=''
if [ "$1" == '' ]; then
    read -se txt
else
    txt="$1"
fi;
len=$(echo "$txt" | wc -c)
k=1
while [ "$k" -lt "$len" ]; do
    case "$(expr substr "$txt" $k 1 | tr '[:upper:]' '[:lower:]')" in
        'e')    res="$res"'.' ;;
        't')    res="$res"'-' ;;
        'i')    res="$res"'..' ;;
        'a')    res="$res"'.-' ;;
        'n')    res="$res"'-.' ;;
        'm')    res="$res"'--' ;;
        's')    res="$res"'...' ;;
        'u')    res="$res"'..-' ;;
        'r')    res="$res"'.-.' ;;
        'w')    res="$res"'.--' ;;
        'd')    res="$res"'-..' ;;
        'k')    res="$res"'-.-' ;;
        'g')    res="$res"'--.' ;;
        'o')    res="$res"'---' ;;
        'h')    res="$res"'....' ;;
        'v')    res="$res"'...-' ;;
        'f')    res="$res"'..-.' ;;
        'l')    res="$res"'.-..' ;;
        'p')    res="$res"'.--.' ;;
        'j')    res="$res"'.---' ;;
        'b')    res="$res"'-...' ;;
        'x')    res="$res"'-..-' ;;
        'c')    res="$res"'-.-.' ;;
        'y')    res="$res"'-.--' ;;
        'z')    res="$res"'--..' ;;
        'q')    res="$res"'--.-' ;;
        '5')    res="$res"'.....' ;;
        '4')    res="$res"'....-' ;;
        '3')    res="$res"'...--' ;;
        '2')    res="$res"'..---' ;;
        '1')    res="$res"'.----' ;;
        '6')    res="$res"'-....' ;;
        '7')    res="$res"'--...' ;;
        '8')    res="$res"'---..' ;;
        '9')    res="$res"'----.' ;;
        '0')    res="$res"'-----' ;;
    esac;
    [ ! "$(expr substr "$txt" $k 1)" == " " ] && [ ! "$(expr substr "$txt" $(($k+1)) 1)" == ' ' ] && res="$res"' '
    k=$(($k+1))
done;
echo "$res"
forever°为你锁心
#4 · 2019-01-30 08:07

C89 (388 characters)

This is incomplete, as it doesn't yet handle the comma, full stop, or question mark.

#define P putchar
char q[10],Q,tree[]=
"EISH54V 3UF    2ARL   + WP  J 1TNDB6=X/ KC  Y  MGZ7 Q  O 8  90";s2;e(x){q[Q++]
=x;}p(){for(;Q--;putchar(q[Q]));Q=0;}T(int x,char*t,int s){s2=s/2;return s?*t-x
?t[s2]-x?T(x,++t+s2,--s/2)?e(45):T(x,t,--s/2)?e(46):0:e(45):e(46):0;}main(c){
while((c=getchar())>=0){c-=c<123&&c>96?32:0;if(c==10)P(10);if(c==32)P(47);else
T(c,tree,sizeof(tree)),p();P(' ');}}

Wrapped for readability. Only two of the linebreaks are required (one for the #define, one after else, which could be a space). I've added a few non-standard characters but didn't add non-7-bit ones.
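
The `tree` string appears to be the Morse code tree stored in preorder without its root: after each node comes its entire "dot" subtree, then its "dash" subtree. A rough Python sketch of that layout follows; it is a reconstruction under that assumption, not a translation of the C above. It covers letters and digits only and keeps an explicit root at index 0, and `build_tree` and `code_for` are hypothetical names.

```python
# Sketch of an implicit preorder Morse tree; names are illustrative.
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. "
    "... - ..- ...- .-- -..- -.-- --.. "
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----."
).split()

SIZE = 63  # root plus five full levels: 2**6 - 1 nodes


def build_tree():
    tree = [" "] * SIZE
    for ch, code in zip(SYMBOLS, CODES):
        i, n = 0, SIZE
        for sym in code:
            n = (n - 1) // 2                 # size of each child subtree
            i += 1 if sym == "." else 1 + n  # dot: next slot; dash: skip the dot subtree
        tree[i] = ch
    return "".join(tree)


TREE = build_tree()


def code_for(ch):
    """Walk from the root towards the slot holding ch, recording each turn."""
    target = TREE.index(ch)
    i, n, out = 0, SIZE, []
    while i != target:
        n = (n - 1) // 2
        if target <= i + n:
            i += 1
            out.append(".")
        else:
            i += 1 + n
            out.append("-")
    return "".join(out)
```

With this layout `TREE[1:12]` comes out as `EISH54V 3UF`, matching the start of the C string.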

爷、活的狠高调
#5 · 2019-01-30 08:07

I was dorking around with a compact coding for the symbols, but I don't see it getting any better than the implicit trees already in use, so I present the coding here in case someone else can use it.

Consider the string:

 --..--..-.-.-..--...----.....-----.--/

which contains all the needed sequences as substrings. We could code the symbols by offset and length like this:

       ET  RRRIIGGGJJJJ    
--..--..-.-.-..--...----.....-----.--/
          CCCC  DD WWW       00000
,,,,,,   AALLLL BBBB        11111
--..--..-.-.-..--...----.....-----.--/
  ??????  KKK  MMSSS       22222   
        FFFF  PPPP        33333
--..--..-.-.-..--...----.....-----.--/
        UUU XXXX         44444       
          NN  PPPP  OOO 55555
--..--..-.-.-..--...----.....-----.--/
               ZZZZ    66666
                      77777      YYYY
--..--..-.-.-..--...----.....-----.--/
       ......        88888 HHHH
                    99999 VVVV  QQQQ
--..--..-.-.-..--...----.....-----.--/

with the space (i.e. word boundary) starting and ending on the final character (the '/'). Feel free to use it, if you see a good way.

Most of the shorter symbols have several possible codings, of course.
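
A small Python sketch of the offset-and-length idea: it looks up the first occurrence of each code in the string above and encodes by slicing. The table is the standard Morse table; picking the first occurrence is an arbitrary choice, and `locate`, `TABLE` and `encode` are illustrative names.

```python
# Offset/length coding against the reference string; names are illustrative.
REFERENCE = "--..--..-.-.-..--...----.....-----.--/"
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,?"
CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. "
    "... - ..- ...- .-- -..- -.-- --.. "
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----. "
    ".-.-.- --..-- ..--.."
).split()


def locate(code):
    """Return (offset, length) of the first occurrence of code in REFERENCE."""
    offset = REFERENCE.find(code)
    if offset < 0:
        raise ValueError("code not present in reference string: " + code)
    return offset, len(code)


TABLE = {sym: locate(code) for sym, code in zip(SYMBOLS, CODES)}
TABLE[" "] = (len(REFERENCE) - 1, 1)  # the word boundary is the final '/'


def encode(text):
    return " ".join(
        REFERENCE[off:off + n] for off, n in (TABLE[ch] for ch in text.upper())
    )
```

Every one of the 39 symbols is found, so `locate` never raises for this table.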


P Daddy found a shorter version of this trick (and I can now see at least some of the redundancy here) and did a nice C implementation. Alec did a Python implementation with the first (buggy and incomplete) version. Hobbs did a pretty compact Perl version that I don't understand at all.

Root(大扎)
#6 · 2019-01-30 08:10

Here's another approach, based on dmckee's work, demonstrating just how readable Python is:

Python

244 characters

def h(l):p=2*ord(l.upper())-88;a,n=map(ord,"AF__GF__]E\\E[EZEYEXEWEVEUETE__________CF__IBPDJDPBGAHDPC[DNBSDJCKDOBJBTCND`DKCQCHAHCZDSCLD??OD"[p:p+2]);return "--..--..-.-.-..--...----.....-----.-"[a-64:a+n-128]
def e(s):return ' '.join(map(h,s))

Limitations:

  • dmckee's string missed the 'Y' character, and I was too lazy to add it. I think you'd just have to change the "??" part, and add a "-" at the end of the second string literal.
  • it doesn't put '/' between words; again, lazy

Since the rules called for fewest characters, not fewest bytes, you could make at least one of my lookup tables smaller (by half) if you were willing to go outside the printable ASCII characters.

EDIT: If I use naïvely-chosen Unicode chars but just keep them in escaped ASCII in the source file, it still gets a tad shorter because the decoder is simpler:

Python

240 characters

def h(l):a,n=divmod(ord(u'\x06_7_\xd0\xc9\xc2\xbb\xb4\xad\xa6\x9f\x98\x91_____\x14_AtJr2<s\xc1d\x89IQdH\x8ff\xe4Pz9;\xba\x88X_f'[ord(l.upper())-44]),7);return "--..--..-.-.-..--...----.....-----.-"[a:a+n]
def e(s):return ' '.join(map(h,s))

I think it also makes the intent of the program much clearer.

If you saved this as UTF-8, I believe the program would be down to 185 characters, making it the shortest complete Python solution, and second only to Perl. :-)
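
To spell out the packing used in the 240-character version: each table character holds offset * 7 + length, and `divmod(..., 7)` splits it again (7 works because no code is longer than 6 symbols). A minimal Python 3 sketch, with `pack` and `unpack` as made-up helper names and only a handful of entries from the table checked:

```python
# Packing (offset, length) into one code point; helper names are illustrative.
REFERENCE = "--..--..-.-.-..--...----.....-----.-"


def pack(offset, length):
    """Combine offset and length (1..6) into a single character."""
    return chr(offset * 7 + length)


def unpack(ch):
    """Reverse of pack: slice the code out of the reference string."""
    offset, length = divmod(ord(ch), 7)
    return REFERENCE[offset:offset + length]


# A few entries from the table above: ',' -> \x06, 'A' -> 'A', '.' -> '7'.
for symbol, packed in ((",", "\x06"), ("A", "A"), (".", "7")):
    print(symbol, unpack(packed))
```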

混吃等死
#7 · 2019-01-30 08:11

Perl, 170 characters (with a little help from accomplished golfer mauke). Wrapped for clarity; all newlines are removable.

$_=uc<>;y,. ,|/,;s/./$& /g;@m{A..Z,0..9,qw(| , ?)}=
".-NINNN..]IN-NII..AMN-AI---.M-ANMAA.I.-].AIAA-NANMMIOMAOUMSMSAH.B.MSOIONARZMIZ"
=~/../g;1while s![]\w|,?]!$m{$&}!;print

Explanation:

  1. Extract the morse dictionary. Each symbol is defined in terms of two chars, which can be either literal dots or dashes, or a reference to the value of another defined char. E and T contain dummy chars to avoid desyncing the decoder; we'll remove them later.
  2. Read and format the input. "Hello world" becomes "H E L L O / W O R L D"
  3. The next step depends on the input and output dictionaries being distinct, so turn dots in the input to an unused char (vertical bar, |)
  4. Replace any char in the input that occurs in the morse dictionary with its value in the dictionary, until no replacements occur.
  5. Remove the dummy char mentioned in step 1.
  6. Print the output.

In the final version, the dictionary is optimized for runtime efficiency:

  • All one-symbol characters (E and T) and two-symbol characters (A, I, M, and N) are defined directly and decode in one pass.
  • All three-symbol characters are defined in terms of a two-symbol character and a literal symbol, decoding in two passes.
  • All four-symbol characters are defined in terms of two two-symbol characters, decoding in two passes with three replacements.
  • The five- and six-symbol characters (numbers and punctuation) decode in three passes, with four or five replacements respectively.

Since the golfed code only replaces one character per loop (to save one character of code!) the number of loops is limited to five times the length of the input (three times the length of the input if only alphabetics are used). But by adding a g to the s/// operation, the number of loops is limited to three (two if only alphabetics are used).

Example transformation:

Hello 123
H E L L O / 1 2 3
II .] AI AI M- / AO UM SM
.... . .-.. .-.. --- / .-M- .A-- I.--
.... . .-.. .-.. --- / .---- ..--- ...--