I am seeing strange behaviour in a pandas DataFrame. I am using .apply(single_seats_comma) on a column with content such as (1,2). However, the value seems to come back as range(1,3) instead of the string (1,2). Other rows have more than two entries, e.g. (30,31,32). I have a function that splits on , and converts each value in the brackets into a new row, but it breaks on (x,x).
def single_seats_comma(row):
    strlist = str(row).split(',')
    strlist = filter(None, strlist)
    intlist = []
    for el in strlist:
        intlist.append(int(el))
    return intlist
Example call with apply:
tickets['seats'][:1].apply(single_seats_comma)
The function raises this error:
ValueError: invalid literal for int() with base 10: 'range(1'
Trying to find a solution, I found this:
str(tickets['seats'][:1])
>>'0 (1, 2)\nName: seats, dtype: object'
tickets['seats'][:1].values
>> '[range(1, 3)]'
It works on a column if the values are just 1,2.
Any help is much appreciated!
Perhaps it would be easier to simply iterate over the elements of the row instead of converting to string then splitting. This is simple enough to use a lambda.
tickets['seats'][:1].apply(lambda row: [int(e) for e in row])
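To illustrate why the string approach fails on a range while iterating works, here is a minimal sketch in plain Python 3. The variable seat is a hypothetical stand-in for one cell of the seats column, assuming that cell really holds a range object as the .values output in the question suggests.

```python
# Hypothetical stand-in for one cell of the 'seats' column.
seat = range(1, 3)

# str() of a range gives its repr, not the numbers it contains.
text = str(seat)            # 'range(1, 3)'
pieces = text.split(',')    # ['range(1', ' 3)']

# int() cannot parse 'range(1', which is the error from the question.
try:
    [int(p) for p in pieces]
    error = None
except ValueError as exc:
    error = str(exc)

# Iterating over the range directly yields the seat numbers.
seats = [int(e) for e in seat]   # [1, 2]
```

The same iteration also works for tuples and lists, so no type check is needed for those cases.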
I cannot reproduce the range string, but this function should work for both cases:
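def single_seats_comma(row):
    if type(row) is tuple:
        return list(row)
    elif type(row) is range:
        # Keep the first and the last value of the range.
        res = [row.start]
        end = row.stop - 1
        # Append the last value only if it differs from the first.
        if end > row.start:
            res.append(end)
        return res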
Example:
>>> tickets = pd.DataFrame({'seats': [(100, 1022), range(3, 4), range(2, 10)]})
>>> tickets['seats'].apply(single_seats_comma)
0    [100, 1022]
1            [3]
2         [2, 9]
Name: seats, dtype: object
Thanks to all contributors for getting me closer to a solution. The solution is actually quite simple.
The challenge was that the value shown as (1,2) was stored as a range and not as a string. The original goal was to build a list of all values by splitting a string on ','. That is not needed!
Calling list() on the range does the job already. Here is the example and solution:
list(range(11, 17))
>> [11, 12, 13, 14, 15, 16]
tickets['seats'][0]
>> range(1, 3)
list(tickets['seats'][0])
>> [1, 2]
So, the solution(s):
def single_seats_comma(row):
    # list() expands a range (or tuple) into its individual values.
    return list(row)
tickets['seats'].apply(single_seats_comma)
or
tickets['seats'].apply(lambda row: list(row))
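If some rows still hold plain strings such as "(30,31,32)" or "1,2" while others hold ranges or tuples, one helper could cover both. The following is only a sketch under that assumption: seats_to_list and the examples list are hypothetical names, not part of the original code.

```python
def seats_to_list(value):
    """Return the seat numbers in `value` as a list of ints (hypothetical helper)."""
    if isinstance(value, str):
        # "(30,31,32)" or "1,2": strip brackets, then split on commas.
        parts = value.strip("()[] ").split(",")
        return [int(p) for p in parts if p.strip()]
    # range, tuple or list: iterate over the elements directly.
    return [int(e) for e in value]


# Made-up sample values covering the cases mentioned in this thread.
examples = ["(30,31,32)", "1,2", (100, 1022), range(1, 3)]
results = [seats_to_list(v) for v in examples]
```

It could then be passed to apply in the same way as above, e.g. tickets['seats'].apply(seats_to_list).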