I would like to see how str.split() is implemented in Python. Here's what I tried:
> inspect.getsource(str.split)
TypeError: <method 'split' of 'str' objects> is not a module,
class, method, function, traceback, frame, or code object
Copying the approach from another Stack Overflow question has not worked: Code for Greatest Common Divisor in Python
inspect.getsource(str.split) fails because inspect.getsource is not written to handle code written in the implementation language (C here). str.split is a builtin, i.e. it is written in C.
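To make the distinction concrete, here is a minimal sketch that checks whether inspect.getsource can retrieve source for an object. The helper name has_python_source is made up for this illustration; it is not part of the inspect module.

```python
import inspect


def has_python_source(obj):
    """Return True if inspect.getsource can retrieve source for obj.

    Hypothetical helper for illustration only.
    """
    try:
        inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: obj is not a Python-level module, class, function, etc.
        # OSError: the source file could not be located.
        return False
    return True


# str.split is a method descriptor implemented in C, so there is no
# Python source for inspect to find.
split_has_source = has_python_source(str.split)
split_is_descriptor = inspect.ismethoddescriptor(str.split)
```

For str.split the first check should come back False and the second True, which is a quick way to tell that you need to look in the C sources instead.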
The source code for the implementation of str.split is broken up into two parts, depending on whether a sep argument is supplied.
The first function, for when no sep argument is supplied and split removes whitespace characters, is split_whitespace. Its implementation is pretty straightforward; the bulk of it is in the while loop that skips leading whitespace, scans the remaining characters for the next whitespace character, and splits there. I've added some comments to illustrate this:
i = j = 0;
while (maxcount-- > 0) {
    /* Advance i past all leading whitespace in
       the string. */
    while (i < str_len && STRINGLIB_ISSPACE(str[i]))
        i++;
    /* If only whitespace was left (or the string is empty), break. */
    if (i == str_len) break;
    /* After the leading whitespace, advance i while the
       character is not whitespace. If this ends before
       i == str_len, i points to a whitespace character. */
    j = i; i++;
    while (i < str_len && !STRINGLIB_ISSPACE(str[i]))
        i++;
#ifndef STRINGLIB_MUTABLE
    /* Case where no split should be done: return the string itself. */
    if (j == 0 && i == str_len && STRINGLIB_CHECK_EXACT(str_obj)) {
        /* No whitespace in str_obj, so just use it as list[0] */
        Py_INCREF(str_obj);
        PyList_SET_ITEM(list, 0, (PyObject *)str_obj);
        count++;
        break;
    }
#endif
    /* Add the substring str[j:i] to the result list. */
    SPLIT_ADD(str, j, i);
}
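If the C is hard to follow, the same loop can be sketched in pure Python. This is only an illustration, not the actual implementation: str.isspace() stands in for STRINGLIB_ISSPACE, list.append stands in for SPLIT_ADD, and the tail handling after the loop (for when maxsplit is reached) is not part of the excerpt above; I've added it so the sketch behaves like str.split().

```python
def split_whitespace(s, maxsplit=-1):
    """Pure-Python sketch of whitespace splitting (illustration only)."""
    result = []
    n = len(s)
    # A negative maxsplit means "no limit"; n iterations are always enough,
    # since a string of length n holds at most n words.
    remaining = n if maxsplit < 0 else maxsplit
    i = 0
    while remaining > 0:
        remaining -= 1
        # Skip leading whitespace.
        while i < n and s[i].isspace():
            i += 1
        # Only whitespace was left: nothing more to split.
        if i == n:
            break
        # Scan to the end of the current word.
        j = i
        i += 1
        while i < n and not s[i].isspace():
            i += 1
        result.append(s[j:i])
    # Assumed tail handling: if maxsplit stopped the loop early, skip
    # whitespace and keep the rest of the string as one final item.
    if i < n:
        while i < n and s[i].isspace():
            i += 1
        if i != n:
            result.append(s[i:])
    return result
```

Comparing its output with str.split() on a few inputs (with and without maxsplit) is a reasonable way to convince yourself the loop does what the comments say.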
Similarly, split_char handles the case where a single character is supplied as sep. Its implementation is again pretty straightforward; examine it after seeing split_whitespace and you won't find it too difficult.
There's also the split function for handling cases where the separator is more than one character long. This is implemented by searching for the separator in the string and splitting accordingly.
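The search-and-split idea can be sketched in Python as well. Again, this is just an illustration under my own assumptions: it uses str.find for the search, whereas the C code uses its own search routine, and the function name split_on_separator is made up.

```python
def split_on_separator(s, sep, maxsplit=-1):
    """Sketch of splitting on an explicit separator (illustration only)."""
    if not sep:
        raise ValueError("empty separator")
    result = []
    start = 0
    # A negative maxsplit never reaches 0, so it means "no limit".
    while maxsplit != 0:
        pos = s.find(sep, start)
        if pos < 0:
            break
        # Everything between the previous match and this one is an item.
        result.append(s[start:pos])
        start = pos + len(sep)
        maxsplit -= 1
    # Whatever follows the last match (possibly empty) is the final item.
    result.append(s[start:])
    return result
```

Unlike the whitespace case, empty strings between adjacent separators are kept, which matches what str.split(sep) does. The same sketch also works for a one-character separator.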