I'm having a problem trying to get a character count column of the string values in another column, and haven't figured out how to do it efficiently.
for index in range(len(df)):
    df['char_length'][index] = len(df['string'][index])
This apparently involves first creating a column of nulls and then rewriting it, and it takes a really long time on my data set. So what's the most effective way of getting something like
'string'   'char_length'
abcd       4
abcde      5
I've checked around quite a bit, but I haven't been able to figure it out.
Here's one way to do it.
Pandas has a vectorised string method for this: str.len(). To create the new column you can write df['char_length'] = df['string'].str.len().
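As a minimal sketch of the str.len() approach, assuming a DataFrame named df with a column called 'string' as in the question (the two sample rows are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's 'string' column.
df = pd.DataFrame({'string': ['abcd', 'abcde']})

# Vectorised character count; no explicit Python loop and no need to
# pre-create the column.
df['char_length'] = df['string'].str.len()

print(df)
```

The column is created in a single assignment. Missing values in 'string' should come back as missing in 'char_length' rather than raising an error.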
This should be considerably faster than looping over the DataFrame with a Python for loop.

Many other familiar string methods from Python have been introduced to Pandas. For example, lower (for converting to lowercase letters), count (for counting occurrences of a particular substring), and replace (for swapping one substring with another).
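A short sketch of those three methods through the .str accessor, using a made-up Series for illustration:

```python
import pandas as pd

# Hypothetical sample values, not taken from the question.
s = pd.Series(['Apple pie', 'banana', 'Cherry'])

# Convert every value to lowercase.
lowered = s.str.lower()

# Count occurrences of 'a' in each value (the pattern is treated as a regex,
# and matching is case-sensitive).
a_counts = s.str.count('a')

# Replace the literal substring 'an' with 'AN'; regex=False disables
# regular-expression interpretation of the pattern.
replaced = s.str.replace('an', 'AN', regex=False)

print(lowered.tolist())
print(a_counts.tolist())
print(replaced.tolist())
```

Each call returns a new Series and leaves s unchanged, so the results can be assigned to new columns in the same way as char_length.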