I agree with what Clifford said, that you shouldn't worry about optimizing it if you don't have to, and that you can push the log cleanup to your analysis platform, rather than worrying about formatting in an embedded application.
That being said, here's an article that might be useful to you. It uses a loop, shifts, additions, and branches, with complexity linear in the number of input bits (so effectively constant for a fixed-width value): http://www.johnloomis.org/ece314/notes/devices/binary_to_BCD/bin_to_bcd.html
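As a rough illustration of that loop/shift/add/branch style, here is a minimal sketch of the shift-and-add-3 ("double dabble") technique for a 10-bit input. The function name bin10_to_ascii and the 32-bit scratch layout are my own assumptions, not taken from the article.

```c
#include <stdint.h>

/* Shift-and-add-3 ("double dabble") sketch for a 10-bit value.
 * Assumes 0 <= value < 1024 and that s has room for 5 chars.
 * bin10_to_ascii is a hypothetical name used for illustration. */
void bin10_to_ascii(unsigned value, char s[5])
{
    /* Binary input sits in bits 0..9; the BCD digits grow in bits 10..25 */
    uint32_t scratch = value & 0x3ffu;
    int i, d;

    for (i = 0; i < 10; i++)
    {
        /* Before each shift, add 3 to every BCD digit that is 5 or more */
        for (d = 0; d < 4; d++)
        {
            int shift = 10 + 4 * d;
            if (((scratch >> shift) & 0xfu) >= 5u)
                scratch += (uint32_t)3 << shift;
        }
        scratch <<= 1;
    }
    s[0] = (char)('0' + ((scratch >> 22) & 0xfu)); /* thousands */
    s[1] = (char)('0' + ((scratch >> 18) & 0xfu)); /* hundreds */
    s[2] = (char)('0' + ((scratch >> 14) & 0xfu)); /* tens */
    s[3] = (char)('0' + ((scratch >> 10) & 0xfu)); /* ones */
    s[4] = '\0';
}
```

The loop runs once per input bit, so the cost is fixed for a fixed width, at the price of a data-dependent branch per BCD digit.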
Also, I thought it would be fun to write some code that doesn't perform any divides, multiplies, or branches, but still gives the correct answer for every input in [0, 1024). No promises that this is any faster than other options. This sort of code is just an option to explore.
I'd love to see if anyone can provide some tricks to make the code smaller, require less memory, or require fewer operations, while keeping the rest of the counts equal, or shrinking them :)
Stats:
- 224 bytes in constants (no idea on the code size)
- 5 bit-shift-rights
- 3 subtracts
- 5 bitwise-ands
- 4 bitwise-ors
- 1 greater-than comparison
Perf:
Using the perf comparisons and itoa routines in Jonathan Leffler's answer, here are the stats I got:
- Division 2.15
- Subtraction 4.87
- My solution 1.56
- Brute force lookup 0.36
I increased the iteration count to 200000 to ensure I didn't have any problems with timing resolution, and had to add volatile to the function signatures so that the compiler didn't optimize out the loop. I used VS2010 Express with vanilla "release" settings, on a 3 GHz dual-core 64-bit Windows 7 machine (though it compiled to 32-bit).
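For context on the "brute force lookup" row, a lookup of that kind might look something like the sketch below. This is my guess at the general shape, not the routine that was actually timed; digit_table, init_digit_table, and itoa_lookup are hypothetical names, and the table costs 4096 bytes.

```c
#include <string.h>

/* One 4-character entry per 10-bit value: 1024 * 4 = 4096 bytes.
 * On a microcontroller this would more likely be a const table
 * generated offline so that it can live in flash. */
static char digit_table[1024][4];

/* Fill the table once at startup (uses division, but only here) */
void init_digit_table(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        digit_table[i][0] = (char)('0' + i / 1000);
        digit_table[i][1] = (char)('0' + i / 100 % 10);
        digit_table[i][2] = (char)('0' + i / 10 % 10);
        digit_table[i][3] = (char)('0' + i % 10);
    }
}

/* Conversion is then a 4-byte copy; n is masked to stay inside the table */
void itoa_lookup(int n, char s[5])
{
    memcpy(s, digit_table[n & 0x3ff], 4);
    s[4] = '\0';
}
```

The trade is the usual one: almost no work per conversion, in exchange for far more constant data than the 224 bytes used by my version.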
The code:
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
void itoa_ten_bits(int n, char s[])
{
    static const short thousands_digit_subtract_map[2] =
    {
        0, 1000,
    };
    static const char hundreds_digit_map[128] =
    {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
        5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
        6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
        7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
        8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
        9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
        0, 0, 0,
    };
    static const short hundreds_digit_subtract_map[10] =
    {
        0, 100, 200, 300, 400, 500, 600, 700, 800, 900,
    };
    static const char tens_digit_map[12] =
    {
        0, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9,
    };
    static const char ones_digit_map[44] =
    {
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        0, 1, 2, 3
    };
    /* Compiler should fold the aN constants, % operations, and + operations */
    /* If not, use this:
    static const char ones_digit_append_map[16] =
    {
        0, 6, 2, 8, 4, 10, 6, 12, 8, 14, 10, 16, 12, 18, 14, 20,
    };
    */
    /* enum constants are integer constant expressions, so they are valid in a
       static initializer in C (const variables are not) */
    enum { a1 = 0x10 % 10, a2 = 0x20 % 10, a3 = 0x40 % 10, a4 = 0x80 % 10 };
    static const char ones_digit_append_map[16] =
    {
        0, a1, a2, a1 + a2,
        a3, a1 + a3, a2 + a3, a1 + a2 + a3,
        a4, a1 + a4, a2 + a4, a1 + a2 + a4,
        a3 + a4, a1 + a3 + a4, a2 + a3 + a4, a1 + a2 + a3 + a4,
    };
    char thousands_digit, hundreds_digit, tens_digit, ones_digit;
    assert(n >= 0 && n < 1024 && "n must be between [0, 1024)");
    /* n &= 0x3ff; can use this instead of the assert */
    /* n >> 3 exceeds 0x7c (124) exactly when n >= 1000 */
    thousands_digit = (n >> 3 & 0x7f) > 0x7c;
    n -= thousands_digit_subtract_map[thousands_digit];
    /* Index is the low nibble plus each higher set bit's value mod 10
       (at most 15 + 20 + 8 = 43) */
    ones_digit = ones_digit_map[
        (n & 0xf)
        + ones_digit_append_map[n >> 4 & 0xf]
        + ones_digit_append_map[n >> 8 & 0x3]
    ];
    n -= ones_digit; /* n is now a multiple of 10 */
    hundreds_digit = hundreds_digit_map[n >> 3 & 0x7f];
    n -= hundreds_digit_subtract_map[hundreds_digit];
    tens_digit = tens_digit_map[n >> 3]; /* n is now one of 0, 10, ..., 90 */
    s[0] = '0' | thousands_digit;
    s[1] = '0' | hundreds_digit;
    s[2] = '0' | tens_digit;
    s[3] = '0' | ones_digit;
    s[4] = '\0';
}
int main(void)
{
    int i;
    for(i = 0; i < 1024; ++i)
    {
        char blah[5];
        itoa_ten_bits(i, blah);
        if(atoi(blah) != i)
            printf("failed %d %s\n", i, blah);
    }
    return 0;
}
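The ones-digit lookup in the code above works because each bit above the low nibble contributes a fixed residue mod 10 (16 gives 6, 32 gives 2, 64 gives 4, 128 gives 8, 256 gives 6, 512 gives 2), so adding those residues to the low nibble yields an index with the same last digit as n. Here is a small checker for that claim, written with a loop and branches purely for readability; ones_index and count_ones_mismatches are hypothetical names, not part of the routine itself.

```c
/* Residue mod 10 of bits 4..9, whose values are 16, 32, 64, 128, 256, 512 */
static const int bit_residue[6] =
{
    16 % 10, 32 % 10, 64 % 10, 128 % 10, 256 % 10, 512 % 10,
};

/* Index that the ones-digit table would be read at for a given n */
int ones_index(int n)
{
    int sum = n & 0xf; /* low nibble contributes its own value, 0..15 */
    int k;
    for (k = 0; k < 6; k++)
        if (n & (0x10 << k))
            sum += bit_residue[k];
    return sum;
}

/* Counts inputs whose index falls outside a 44-entry table or whose
 * index does not share its last digit with n */
int count_ones_mismatches(void)
{
    int n, bad = 0;
    for (n = 0; n < 1024; n++)
    {
        int idx = ones_index(n);
        if (idx > 43 || idx % 10 != n % 10)
            bad++;
    }
    return bad;
}
```

The worst case is all ten bits set, giving 15 + 28 = 43, which is why a 44-entry table is enough.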
If the values are correctly in range (0..1023), then your last conversion is unnecessarily wasteful on the divisions; the last line could be replaced with:
temp[3] = 1023 / 1000;
or even:
temp[3] = 1023 >= 1000;
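A quick exhaustive check that the three forms agree over the whole 10-bit range, as a sketch. It assumes the reading is held in an int called value (the name used in the test code further down) and that an ASCII digit is wanted, hence the + '0'; count_thousands_mismatches is a hypothetical helper name.

```c
/* Returns how many values in [0, 1024) make the three forms disagree */
int count_thousands_mismatches(void)
{
    int value, bad = 0;
    for (value = 0; value < 1024; value++)
    {
        char by_mod = (char)((value % 10000) / 1000 + '0'); /* modulo, then divide */
        char by_div = (char)(value / 1000 + '0');           /* one division */
        char by_cmp = (char)((value >= 1000) + '0');        /* no division at all */
        if (by_mod != by_div || by_div != by_cmp)
            bad++;
    }
    return bad;
}
```

The comparison form only works because the input never reaches 2000; for a wider range the single division would be the safe simplification.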
Since division is repeated subtraction, but you have a very special case (not a general case) division to deal with, I'd be tempted to compare the timings for the following code with the division version. I note that you put the digits into the string in 'reverse order': the least significant digit goes in temp[0] and the most significant in temp[3]. Also, there is no chance of null-terminating the string given the storage. This code uses a table of 8 bytes of static data, considerably less than many of the other solutions.
void convert_to_ascii(int value, char *temp)
{
    static const short subtractors[] = { 1000, 100, 10, 1 };
    int i;
    for (i = 0; i < 4; i++)
    {
        int n = 0;
        while (value >= subtractors[i])
        {
            n++;
            value -= subtractors[i];
        }
        temp[3-i] = n + '0';
    }
}
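If the storage can be widened by one byte, the same repeated-subtraction idea can write the most significant digit first and null-terminate the result. This is a sketch under that assumption; convert_to_string and the 5-byte buffer are hypothetical and differ from the 4-byte layout discussed above.

```c
/* Repeated subtraction, most significant digit first, null-terminated.
 * Assumes 0 <= value < 10000 and that out has room for 5 chars. */
void convert_to_string(int value, char out[5])
{
    static const short subtractors[] = { 1000, 100, 10, 1 };
    int i;

    for (i = 0; i < 4; i++)
    {
        int n = 0;
        while (value >= subtractors[i])
        {
            n++;
            value -= subtractors[i];
        }
        out[i] = (char)(n + '0'); /* index i, not 3-i: natural reading order */
    }
    out[4] = '\0';
}
```

With this layout the buffer can be handed straight to string functions such as puts or strcmp.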
Performance testing - Intel x86_64 Core 2 Duo 3.06 GHz (MacOS X 10.6.4)
This platform is probably not representative of your microcontroller, but the test shows that on this platform, the subtraction is considerably slower than the division.
void convert_by_division(int value, char *temp)
{
    temp[0] = (value % 10) + '0';
    temp[1] = (value % 100) / 10 + '0';
    temp[2] = (value % 1000) / 100 + '0';
    temp[3] = (value % 10000) / 1000 + '0';
}
void convert_by_subtraction(int value, char *temp)
{
    static const short subtractors[] = { 1000, 100, 10, 1 };
    int i;
    for (i = 0; i < 4; i++)
    {
        int n = 0;
        while (value >= subtractors[i])
        {
            n++;
            value -= subtractors[i];
        }
        temp[3-i] = n + '0';
    }
}
#include <stdio.h>
#include <timer.h> /* not a standard header; supplies Clock and the clk_* functions used below */
#include <string.h>
static void time_convertor(const char *tag, void (*function)(void))
{
    int r;
    Clock ck;
    char buffer[32];
    clk_init(&ck);
    clk_start(&ck);
    for (r = 0; r < 10000; r++)
        (*function)();
    clk_stop(&ck);
    printf("%s: %12s\n", tag, clk_elapsed_us(&ck, buffer, sizeof(buffer)));
}
static void using_subtraction(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        convert_by_subtraction(i, temp1);
    }
}
static void using_division(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        convert_by_division(i, temp1);
    }
}
int main(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        char temp1[4];
        char temp2[4];
        convert_by_subtraction(i, temp1);
        convert_by_division(i, temp2);
        if (memcmp(temp1, temp2, 4) != 0)
            printf("!!DIFFERENCE!! ");
        printf("%4d: %.4s %.4s\n", i, temp1, temp2);
    }
    time_convertor("Using division ", using_division);
    time_convertor("Using subtraction", using_subtraction);
    time_convertor("Using division ", using_division);
    time_convertor("Using subtraction", using_subtraction);
    time_convertor("Using division ", using_division);
    time_convertor("Using subtraction", using_subtraction);
    time_convertor("Using division ", using_division);
    time_convertor("Using subtraction", using_subtraction);
    return 0;
}
Compiling with GCC 4.5.1, and working in 32-bit, the average timings were (optimization '-O'):
- 0.13 seconds using division
- 0.65 seconds using subtraction
Compiling and working in 64-bit, the average timings were:
- 0.13 seconds using division
- 0.48 seconds using subtraction
Clearly, on this machine, using subtraction is not a winning proposition. You would have to measure on your machine to make a decision. And removing the modulo 10000 operation will only skew results in favour of the division (it knocks about 0.02 seconds off the time with division when replaced with the comparison; that's a 15% saving and worth having).
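If you want to repeat the measurement without my timer package, a rough substitute can be built on clock() from the standard <time.h>. This is only a sketch: time_calls and sample_workload are hypothetical names, clock() reports processor time with platform-dependent resolution, and on a microcontroller a hardware timer or cycle counter would probably be the better tool.

```c
#include <time.h>

/* volatile so the compiler cannot discard the stores and with them the work */
static volatile char sink;

/* Hypothetical workload: convert every 10-bit value, using the
 * comparison for the thousands digit */
void sample_workload(void)
{
    int i;
    for (i = 0; i < 1024; i++)
    {
        sink = (char)(i % 10 + '0');
        sink = (char)(i / 10 % 10 + '0');
        sink = (char)(i / 100 % 10 + '0');
        sink = (char)((i >= 1000) + '0');
    }
}

/* Returns the processor time, in seconds, taken by `repetitions` calls */
double time_calls(void (*function)(void), int repetitions)
{
    clock_t start, stop;
    int r;

    start = clock();
    for (r = 0; r < repetitions; r++)
        (*function)();
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A call such as time_calls(sample_workload, 10000) mirrors the 10000 repetitions used in my harness; run each candidate several times and compare averages rather than single readings.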