I have to process some huge files with gawk. My main problem is that I have to print some floats using thousand separators. E.g.:
10000 should appear as 10.000
and 10000,01 should appear as 10.000,01
in the output.
I (and Google) came up with this function:
    function commas(n) {
        gsub(/,/,"",n)
        point = index(n,".") - 1
        if (point < 0) point = length(n)
        while (point > 3) {
            point -= 3
            n = substr(n,1,point)"."substr(n,point + 1)
        }
        sub(/-\./,"-",n)
        return d n
    }
But it fails with floats.
Now I'm thinking of splitting the input into an integer part and a fractional part, formatting the integer part, and then gluing the two back together. But isn't there a better way to do it?
Disclaimer:
- I'm not a programmer
- I know that the thousand separator can be set via some shell environment variables, but this has to work in different environments with different language and/or locale settings.
- English is my 2nd language, sorry if I'm using it incorrectly
To go with Pax's answer:
Read the "Conversion" section of the GNU awk manual, which talks explicitly about the effect of your locale environment variables (for example LC_ALL or LC_NUMERIC) on the string representation of numeric types.

It fails with floats because you're passing in European-style numbers (1.000.000,25 for a million and a quarter). The function you've given should work if you just swap the roles of commas and periods. I'd test the current version first with 1000000.25 to see if it works with non-European numbers.
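As a minimal sketch of that swap (not taken from the original answer), the function below treats the comma as the decimal mark and inserts periods as thousand separators. The names `fmt_euro` and `commas_euro` are made up for illustration, and the input is assumed to be one number passed as a plain string.

```shell
# Hypothetical wrapper: format one European-style number given as an argument.
fmt_euro() {
    printf '%s\n' "$1" | awk '
    function commas_euro(n,    point) {
        gsub(/\./, "", n)                  # drop any existing thousand separators
        point = index(n, ",") - 1          # number of characters before the decimal comma
        if (point < 0) point = length(n)   # no decimal comma: the whole string is the integer part
        while (point > 3) {                # insert a period every three digits, right to left
            point -= 3
            n = substr(n, 1, point) "." substr(n, point + 1)
        }
        sub(/^-\./, "-", n)                # undo a separator placed right after a minus sign
        return n
    }
    { print commas_euro($0) }'
}

fmt_euro 10000        # prints 10.000
fmt_euro 10000,01     # prints 10.000,01
```

Because the number is handled purely as a string, the result does not depend on the locale the script runs under.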
The following awk script can be called with
"echo 1 | awk -f xx.gawk"
and it will show you both the "normal" and European versions in action. It outputs:

Obviously, you're only interested in the functions; real-world code would use the input stream to pass values to the functions, not a fixed string.
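To illustrate feeding values from the input stream, here is a hedged sketch rather than the script referred to above. It assumes one unformatted number per input line, and the names `group_stream` and `group` are hypothetical. The separator and decimal characters are passed in as parameters, so the same function covers both conventions.

```shell
# Hypothetical usage: group_stream SEPARATOR DECIMAL < numbers
group_stream() {
    awk -v sep="$1" -v dec="$2" '
    function group(n, sep, dec,    point) {
        point = index(n, dec) - 1          # characters before the decimal mark
        if (point < 0) point = length(n)   # no decimal mark present
        while (point > 3) {                # insert sep every three digits, right to left
            point -= 3
            n = substr(n, 1, point) sep substr(n, point + 1)
        }
        # undo a separator placed right after a minus sign
        if (substr(n, 1, 2) == ("-" sep)) n = "-" substr(n, 3)
        return n
    }
    { print group($1, sep, dec) }'
}

printf '1000000.25\n' | group_stream ',' '.'    # prints 1,000,000.25
printf '1000000,25\n' | group_stream '.' ','    # prints 1.000.000,25
```

Unlike the original function, this sketch does not strip separators that are already present in the input.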
The functions are identical except in their handling of commas and periods. We'll call them separators and decimals in the following description: