I have a string '1234567890' that I want to split into groups of three digits, working from right to left, so that the leftmost group has between one and three digits (depending on how many are left over).
Essentially, it's the same procedure as adding commas to a long number, except that I also want to extract the last three digits.
I tried using look-arounds but couldn't figure out a way to get the last three digits.
import re
string = '1234567890'
pattern = re.compile(r'\d{1,3}(?=(?:\d{3})+$)')
re.findall(pattern, string)
['1', '234', '567']
Expected output is (I don't need commas):
['1', '234', '567', '890']
Adding commas from right to left amounts to a regex replacement: each complete group of three digits is replaced by those three digits followed by a comma. In the code snippet below, I reverse the digit string, insert the commas, reverse again, and then split on the commas to arrive at the output we want.
import re
string = '1234567890'
string = re.sub(r'(?=\d{4})(\d{3})', r'\1,', string[::-1])[::-1]
print(string.split(','))
string = '123456789'
string = re.sub(r'(?=\d{4})(\d{3})', r'\1,', string[::-1])[::-1]
print(string.split(','))
Output:
['1', '234', '567', '890']
['123', '456', '789']
One part of the regex used for the replacement might warrant further explanation. I added a positive lookahead, (?=\d{4}), to the start of the pattern. It ensures that we don't add a comma after the final group of three digits, which would otherwise happen whenever the number of digits is a multiple of three.
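To see what the lookahead guards against, here is a small comparison sketch. The two function names are made up for illustration, and the second pattern is simply the answer's pattern with the lookahead removed; it is not something anyone proposed.

```python
import re

def split_with_guard(digits):
    # Pattern from the answer above: the lookahead requires a fourth digit.
    return re.sub(r'(?=\d{4})(\d{3})', r'\1,', digits[::-1])[::-1].split(',')

def split_without_guard(digits):
    # Hypothetical variant with the lookahead removed, for comparison only.
    return re.sub(r'(\d{3})', r'\1,', digits[::-1])[::-1].split(',')

print(split_with_guard('123456789'))     # ['123', '456', '789']
print(split_without_guard('123456789'))  # ['', '123', '456', '789']
```

Without the lookahead, a comma lands after the last group of the reversed string, so the split produces an empty leading element whenever the length is a multiple of three.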
It is actually easier to operate on the reversed string, since you only need to match each group of 3 digits that still has more digits after it (using the positive lookahead (?=\d)):
import re
for s in ('123','1234','123456789','1234567890'):
    print(re.sub(r'(\d\d\d)(?=\d)',r'\1,',s[::-1])[::-1])
Or a negative lookahead version:
for s in ('123','1234','123456789','1234567890'):
    print(re.sub(r'(\d\d\d)(?!$)',r'\1,',s[::-1])[::-1])
Either prints:
123
1,234
123,456,789
1,234,567,890
Applying a reversed regex on a reversed string is called a sexeger in Perl ;-)
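To make the reverse, substitute, reverse sequence concrete, the intermediate strings for one input can be traced as below. The variable names are only for illustration.

```python
import re

s = '1234567890'

# Step 1: reverse, so the groups of three start at the left edge.
reversed_s = s[::-1]
print(reversed_s)    # 0987654321

# Step 2: add a comma after every three digits that are followed by another digit.
with_commas = re.sub(r'(\d\d\d)(?=\d)', r'\1,', reversed_s)
print(with_commas)   # 098,765,432,1

# Step 3: reverse back to the original digit order.
result = with_commas[::-1]
print(result)        # 1,234,567,890
```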
You can also do a lookahead version that does not require reversing the string:
for s in ('123','1234','123456789','1234567890'):
    print(re.sub(r'(\d)(?=(\d{3})+$)',r'\1,',s))
# same output
Based on the comment, just insert an appropriate delimiter and then .split on that:
>>> for s in ('123','1234','123456789','1234567890'):
...     re.sub(r'(\d)(?=(\d{3})+$)',r'\1\t',s).split('\t')
...
['123']
['1', '234']
['123', '456', '789']
['1', '234', '567', '890']
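If the list is the only goal, a small variation on the question's own pattern might also work with re.findall directly. This is a sketch rather than something from the answers above: the only change is * in place of + inside the lookahead, the name GROUPS is made up, and it assumes the input consists solely of digits.

```python
import re

# Same pattern as in the question, except the lookahead uses * instead of +,
# so "zero further groups of three" is allowed and the last group matches too.
GROUPS = re.compile(r'\d{1,3}(?=(?:\d{3})*$)')

for s in ('123', '1234', '123456789', '1234567890'):
    print(GROUPS.findall(s))
```

With +, the lookahead demands at least one more complete group after the match, which is why the question's attempt stopped before the final three digits.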
Or, skip the regex and just do it in Python:
for s in ('123','1234','123456789','1234567890'):
    s=s[::-1]
    n=3
    print([s[i:i+n][::-1] for i in range(0,len(s),n)][::-1])
# same output
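The same idea can be wrapped in a helper that avoids the reversals by working out the length of the leftmost group first. This is one possible sketch; group_from_right and its n parameter are hypothetical names, not part of the answer above.

```python
def group_from_right(digits, n=3):
    """Split digits into groups of n characters, counting from the right."""
    # Length of the (possibly shorter) leftmost group; 0 means the string divides evenly.
    head = len(digits) % n
    groups = [digits[:head]] if head else []
    # Every remaining group is exactly n characters long.
    groups.extend(digits[i:i + n] for i in range(head, len(digits), n))
    return groups

for s in ('123', '1234', '123456789', '1234567890'):
    print(group_from_right(s))
```

Since it only slices, this version works on any string, not just digits, and returns an empty list for an empty string.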