Is this type of expression in C valid (on all compilers)?
If it is, is this good C?
char cloneString (char clone[], char string[], int Length)
{
if(!Length)
Length=64
;
int i = 0
;
while((clone[i++] = string[i]) != '\0', --Length)
;
clone[i] = '\0';
printf("cloneString = %s\n", clone);
};
Would this be better, worse, indifferent?
char *cloneString (char clone[], char string[], int Length)
{
if(!Length)
Length=STRING_LENGTH
;
char *r = clone
;
while
( //(clone[i++] = string[i]) != '\0'
*clone++ = *string++
, --Length
);
*clone = '\0';
return clone = r
;
printf("cloneString = %s\n", clone);
};
Stackoverflow wants me to add more text to this question!
Okay! I'm concerned about
a.) expressions such as c==(a=b)
b.) performance between indexing vs pointer
Any comments?
Thanks so much.
Yes, it's syntactically valid on all compilers (though semantically valid on none), and no, it isn't considered good C. Most developers will agree that the comma operator is a bad thing, and most will agree that a single line of code should do only one specific thing. The while loop does four things at once and has undefined behavior:
- it increments i;
- it assigns string[i] to clone[i++] (undefined behavior: you should use i only once in a statement that increments/decrements it);
- it checks that string[i] isn't 0 (but discards the result of the comparison);
- it decrements Length, and terminates the loop if Length == 0 after being decremented.
Not to mention that assuming that Length is 64 if it wasn't provided is a terrible idea and leaves plenty of room for more undefined behavior that can easily be exploited to crash or hack the program.
I see that you wrote it yourself and that you're concerned about performance, and this is apparently the reason you're sticking everything together. Don't. Squeezing statements together makes the code shorter, not faster: it still does the same number of things. In your case, you're introducing bugs by squeezing things together.
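As a rough illustration of "one thing per statement", here is a minimal sketch of the same copy loop. It assumes the caller passes the real size of the destination buffer instead of relying on a default; the names clone_string and dst_size are made up for this example and are not from the question.

```c
#include <stddef.h>

/* Copies src into dst, writing at most dst_size bytes including the
 * terminator. Returns the number of characters copied (terminator excluded).
 * clone_string and dst_size are illustrative names, not from the question. */
size_t clone_string(char *dst, const char *src, size_t dst_size)
{
    size_t i = 0;

    if (dst_size == 0)
        return 0;               /* no room even for the terminator */

    while (i + 1 < dst_size && src[i] != '\0') {
        dst[i] = src[i];        /* i is only read in this statement ... */
        i++;                    /* ... and modified in its own statement */
    }
    dst[i] = '\0';
    return i;
}
```

Each line now does a single job, and the loop stops both at the end of the source string and at the end of the destination buffer.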
The code has Undefined Behavior:
The expression (clone[i++] = string[i]) both modifies and accesses the object i from two different subexpressions in an unsequenced way, which is not allowed. A compiler might use the old value of i in string[i], or might use the new value of i, or might do something entirely different and unexpected.
OK, so I decided to turn my comments into an actual answer. Although this doesn’t address the specific piece of code in your question, it addresses the underlying issue, and I think you will find it illuminating: you can use it as a guide (let’s call it that) for your programming in general.
What I advocate, especially if you are just learning to program, is to focus on readability instead of small gimmicks that you think, or were told, improve speed or performance.
Let’s take a simple example. The idiomatic way to iterate through a vector in C (not in C++) is using indexing:
int i;
for (i = 0; i < size; ++i) {
v[i] = /* code */;
}
I was told when I started programming that v[i] is actually computed as *(v + i), so in the generated assembly this is broken down as follows (please note that this discussion is simplified):
- multiply i by sizeof(int)
- add that result to the address of v
- access the element at this computed address
So basically you have 3 operations.
Let’s compare this with accessing via pointers:
int *p;
for (p = v; p != v + size; ++p) {
*p = /*..*/;
}
This has the advantage that *p actually expands to just one instruction:
- access the element at the address p.
Two extra instructions don’t seem like much, but if your program spends most of its time in this loop (either because size is extremely large or because the function containing the loop is called many times), it is tempting to conclude that the second version makes your program almost 3 times faster. That is a lot. So if you are like me when I started, you will choose the second variant. Don’t!
So the first version has readability (you explicitly say that you access the i-th element of vector v), while the second one uses a gimmick at the expense of readability (you say that you access a memory location). Now this might not be the best example of unreadable code, but the principle is valid.
So why do I tell you to use the first version? Until you have a firm grasp of concepts like caching, branching, induction variables (and a lot more) and how they affect the performance of real-world compilers and programs, you should steer clear of these gimmicks and rely on the compiler to do the optimizations. Compilers are very smart and will typically generate the same code for both variants (with optimization enabled, of course). So the second variant differs only in readability and performs identically to the first.
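To try this yourself, here is a small sketch of the two variants as standalone functions. The names fill_indexed and fill_pointer are invented for this example. One way to compare them is to compile with optimization and look at the generated assembly (for instance with gcc -O2 -S); an optimizing compiler will typically emit equivalent loops for both, though that depends on the compiler and target.

```c
#include <stddef.h>

/* Fills v[0..size-1] with value using indexing. */
void fill_indexed(int *v, size_t size, int value)
{
    for (size_t i = 0; i < size; ++i)
        v[i] = value;
}

/* Fills the same range by walking a pointer through it. */
void fill_pointer(int *v, size_t size, int value)
{
    for (int *p = v; p != v + size; ++p)
        *p = value;
}
```

Both functions leave the array in the same state; the only difference a reader sees is how the access is spelled.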
Another example:
const char * str = "Some string";
int i;
// variant 1:
for (i = 0; i < strlen(str); ++i) {
// code
}
// variant 2:
int l = strlen(str);
for (i = 0; i < l; ++i) {
// code
}
The natural way would be to write the first variant. You might think that the second improves performance because the first calls the function strlen on each iteration of the loop. And you know that getting the length of a string means iterating through the whole string until you reach the end. So basically a call to strlen means adding an inner loop. Ouch, that has to slow the program down. Not necessarily: the compiler can optimize the call out of the loop when it can tell that the string is not modified in the loop body, because the call then always produces the same result. Actually, you can do harm, as you introduce a new variable which has to be assigned a different register from a very limited register pool (a slightly extreme example, but it makes the point).
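Moving the strlen call out of the loop is only safe when the loop body leaves the string alone. The following sketch shows a case where the two variants behave differently because the body shortens the string; the function names are made up for this illustration.

```c
#include <string.h>

/* Variant 1: strlen is re-evaluated before every iteration, so the loop
 * notices when the body shortens the string. */
int iterations_rechecked(char *s)
{
    int iterations = 0;
    for (size_t i = 0; i < strlen(s); ++i) {
        if (s[i] == ' ')
            s[i] = '\0';        /* truncates the string at the first space */
        ++iterations;
    }
    return iterations;
}

/* Variant 2: the length is computed once, so the loop keeps running over
 * the original length even after the string has been truncated. */
int iterations_cached(char *s)
{
    int iterations = 0;
    size_t length = strlen(s);
    for (size_t i = 0; i < length; ++i) {
        if (s[i] == ' ')
            s[i] = '\0';
        ++iterations;
    }
    return iterations;
}
```

For a writable buffer holding "ab cd", the first function runs 3 iterations and the second runs 5, which is why a compiler has to establish that the string is unchanged before it may treat the two forms as the same.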
Don’t spend your energy on things like this until much later.
Let me show you something else that further illustrates that any assumptions you make about performance will most likely be false and misleading (I am not trying to tell you that you are a bad programmer, far from it, just that while you learn you should invest your energy in something other than performance):
Let’s multiply two matrices:
for (k = 0; k < n; ++k) {
for (i = 0; i < n; ++i) {
for (j = 0; j < n; ++j) {
r[i][j] += a[i][k] * b[k][j];
}
}
}
versus
for (k = 0; k < n; ++k) {
for (j = 0; j < n; ++j) {
for (i = 0; i < n; ++i) {
r[i][j] += a[i][k] * b[k][j];
}
}
}
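The only difference between the two is the order in which the operations get executed. They are the exact same operations (number, kind and operands), just in a different order. The result is identical (each r[i][j] accumulates the same products, in the same order of k), so on paper they should take exactly the same amount of time to execute. In practice, even with optimizations enabled (though some very smart compilers can reorder the loops), the second example can be up to 2-3 times slower than the first. The difference comes from memory access order: C stores arrays row by row, so the inner loop of the first variant walks r and b contiguously, while the inner loop of the second jumps a whole row on every iteration, which is hard on the cache. And even the first variant is still a long, long way from being optimal (with regard to speed).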
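If you want to measure this yourself, here is a minimal sketch of both loop orders as functions over fixed-size matrices. The size N, the function names and the clear_matrix helper are assumptions made for this example, not part of the original snippets.

```c
#define N 64    /* illustrative size; use a few hundred or more when timing */

/* Sets every element of r to zero before accumulating into it. */
static void clear_matrix(double r[N][N])
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            r[i][j] = 0.0;
}

/* First variant: the inner loop runs over j. */
void multiply_kij(double r[N][N], double a[N][N], double b[N][N])
{
    clear_matrix(r);
    for (int k = 0; k < N; ++k)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                r[i][j] += a[i][k] * b[k][j];
}

/* Second variant: the inner loop runs over i. */
void multiply_kji(double r[N][N], double a[N][N], double b[N][N])
{
    clear_matrix(r);
    for (int k = 0; k < N; ++k)
        for (int j = 0; j < N; ++j)
            for (int i = 0; i < N; ++i)
                r[i][j] += a[i][k] * b[k][j];
}
```

Both functions produce the same matrix. To compare their speed, one option is to wrap each call between two clock() readings from <time.h> and run it with a larger N; the actual ratio depends on the machine, the compiler and the optimization level.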
So the basic point: worry about UB, as the other answers show you, and don’t worry about performance at this stage.
The second block of code is better.
The line
printf("cloneString = %s\n", clone);
there will never get executed, since there is a return statement before it.
To make your code a bit more readable, change
while
(
*clone++ = *string++
, --Length
);
to
while ( Length > 0 )
{
*clone++ = *string++;
--Length;
}
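Putting these points together, the whole function might look roughly like the sketch below. It goes one step beyond the loop above: it also stops at the end of the source string and reserves one byte for the terminator, which is an addition of mine. The name clone_string_ptr is invented for the example, and printing is left to the caller.

```c
#include <stddef.h>

/* Pointer-walking copy that returns the start of the destination.
 * length is the capacity of clone in bytes and must be at least 1.
 * clone_string_ptr is an illustrative name. */
char *clone_string_ptr(char *clone, const char *string, size_t length)
{
    char *start = clone;            /* remember where the copy begins */

    while (length > 1 && *string != '\0') {
        *clone++ = *string++;
        --length;                   /* one slot stays reserved for '\0' */
    }
    *clone = '\0';
    return start;
}
```

Because the function returns the saved start pointer, a caller can pass the result straight to printf("%s\n", ...) if it wants the debugging output.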
This is probably a better approach to your problem:
#include <stdio.h>
#include <string.h>
void cloneString(char *clone, const char *string)
{
    size_t length = strlen(string);
    for (size_t i = 0; i != length; i++)
        clone[i] = string[i];
    clone[length] = '\0'; /* terminate the copy before printing it */
    printf("Clone string: %s\n", clone);
}
That being said, there's already a standard function to do that:
char *strncpy(char *dest, const char *source, size_t n)
dest is the destination buffer, and source is the string to be copied. This function copies at most n characters; if source is n characters or longer, dest is not null-terminated.
So, your code will be:
#include <stdio.h>
#include <string.h>
void cloneString(char *clone, const char *string)
{
    /* + 1 so that the terminating '\0' is copied as well */
    strncpy(clone, string, strlen(string) + 1);
    printf("Clone string: %s\n", clone);
}
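Because strncpy does not write a terminator when the source fills all n bytes, a common pattern is to bound the copy by the size of the destination and terminate it by hand. Here is a minimal sketch, assuming the caller knows the destination size; copy_terminated and dest_size are names made up for this example.

```c
#include <string.h>

/* Copies source into dest, which has room for dest_size bytes, and always
 * terminates the result. copy_terminated is an illustrative name. */
void copy_terminated(char *dest, const char *source, size_t dest_size)
{
    if (dest_size == 0)
        return;                             /* nothing can be written */
    strncpy(dest, source, dest_size - 1);   /* may leave dest unterminated */
    dest[dest_size - 1] = '\0';             /* so terminate explicitly */
}
```

With a 4-byte destination, copying "abcdef" this way leaves "abc" in the buffer instead of running past its end.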