可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
This has always confused me. It seems like this would be nicer:
my_list = ["Hello", "world"]
print my_list.join("-")
# Produce: "Hello-world"
Than this:
my_list = ["Hello", "world"]
print "-".join(my_list)
# Produce: "Hello-world"
Is there a specific reason it is like this?
回答1:
It's because any iterable can be joined, not just lists, but the result and the "joiner" are always strings.
E.G:
import urllib2
print '\n############\n'.join(
urllib2.urlopen('http://data.stackexchange.com/users/7095'))
回答2:
Because the join()
method is in the string class, instead of the list class?
I agree it looks funny.
See http://www.faqs.org/docs/diveintopython/odbchelper_join.html:
Historical note. When I first learned
Python, I expected join to be a method
of a list, which would take the
delimiter as an argument. Lots of
people feel the same way, and there’s
a story behind the join method. Prior
to Python 1.6, strings didn’t have all
these useful methods. There was a
separate string module which contained
all the string functions; each
function took a string as its first
argument. The functions were deemed
important enough to put onto the
strings themselves, which made sense
for functions like lower, upper, and
split. But many hard-core Python
programmers objected to the new join
method, arguing that it should be a
method of the list instead, or that it
shouldn’t move at all but simply stay
a part of the old string module (which
still has lots of useful stuff in it).
I use the new join method exclusively,
but you will see code written either
way, and if it really bothers you, you
can use the old string.join function
instead.
--- Mark Pilgrim, Dive into Python
回答3:
This was discussed in the String methods... finally thread in the Python-Dev achive, and was accepted by Guido. This thread began in Jun 1999, and str.join
was included in Python 1.6 which was released in Sep 2000 (and supported Unicode). Python 2.0 (supported str
methods including join
) was released in Oct 2000.
- There were four options proposed in this thread:
str.join(seq)
seq.join(str)
seq.reduce(str)
join
as a built-in function
- Guido wanted to support not only
list
s, tuple
s, but all sequences/iterables.
seq.reduce(str)
is difficult for new-comers.
seq.join(str)
introduces unexpected dependency from sequences to str/unicode.
join()
as a built-in function would support only specific data types. So using a built in namespace is not good. If join()
supports many datatypes, creating optimized implementation would be difficult, if implemented using the __add__
method then it's O(n²).
- The separater string (
sep
) should not be omitted. Explicit is better than implicit.
There are no other reasons offered in this thread.
Here are some additional thoughts (my own, and my friend's):
- Unicode support was coming, but it was not final. At that time UTF-8 was the most likely about to replace UCS2/4. To calculate total buffer length of UTF-8 strings it needs to know character coding rule.
- At that time, Python had already decided on a common sequence interface rule where a user could create a sequence-like (iterable) class. But Python didn't support extending built-in types until 2.2. At that time it was difficult to provide basic iterable class (which is mentioned in another comment).
Guido's decision is recorded in a historical mail, deciding on str.join(seq)
:
Funny, but it does seem right! Barry, go for it...
--Guido van Rossum
回答4:
I agree that it's counterintuitive at first, but there's a good reason. Join can't be a method of a list because:
- it must work for different iterables too (tuples, generators, etc.)
- it must have different behavior between different types of strings.
There are actually two join methods (Python 3.0):
>>> b"".join
<built-in method join of bytes object at 0x00A46800>
>>> "".join
<built-in method join of str object at 0x00A28D40>
If join was a method of a list, then it would have to inspect its arguments to decide which one of them to call. And you can't join byte and str together, so the way they have it now makes sense.
回答5:
Why is it string.join(list)
instead of list.join(string)
?
This is because join
is a "string" method! It creates a string from any iterable. If we stuck the method on lists, what about when we have iterables that aren't lists?
What if you have a tuple of strings? If this were a list
method, you would have to cast every such iterator of strings as a list
before you could join the elements into a single string! For example:
some_strings = ('foo', 'bar', 'baz')
Let's roll our own list join method:
class OurList(list):
def join(self, s):
return s.join(self)
And to use it, note that we have to first create a list from each iterable to join the strings in that iterable, wasting both memory and processing power:
>>> l = OurList(some_strings) # step 1, create our list
>>> l.join(', ') # step 2, use our list join method!
'foo, bar, baz'
So we see we have to add an extra step to use our list method, instead of just using the builtin string method:
>>> ' | '.join(some_strings) # a single step!
'foo | bar | baz'
Performance Caveat for Generators
The algorithm Python uses to create the final string with str.join
actually has to pass over the iterable twice, so if you provide it a generator expression, it has to materialize it into a list first before it can create the final string.
Thus, while passing around generators is usually better than list comprehensions, str.join
is an exception:
>>> import timeit
>>> min(timeit.repeat(lambda: ''.join(str(i) for i in range(10) if i)))
3.839168446022086
>>> min(timeit.repeat(lambda: ''.join([str(i) for i in range(10) if i])))
3.339879313018173
Nevertheless, the str.join
operation is still semantically a "string" operation, so it still makes sense to have it on the str
object than on miscellaneous iterables.
回答6:
Think of it as the natural orthogonal operation to split.
I understand why it is applicable to anything iterable and so can't easily be implemented just on list.
For readability, I'd like to see it in the language but I don't think that is actually feasible - if iterability were an interface then it could be added to the interface but it is just a convention and so there's no central way to add it to the set of things which are iterable.
回答7:
Primarily because the result of a someString.join()
is a string.
The sequence (list or tuple or whatever) doesn't appear in the result, just a string. Because the result is a string, it makes sense as a method of a string.
回答8:
-
in "-".join(my_list) declares that you are converting to a string from joining elements a list.It's result-oriented.(just for easy memory and understanding)
I make a exhaustive cheatsheet of methods_of_string for your reference.
string_methonds_44 = {
'convert': ['join','split', 'rsplit','splitlines', 'partition', 'rpartition'],
'edit': ['replace', 'lstrip', 'rstrip', 'strip'],
'search': ['endswith', 'startswith', 'count', 'index', 'find','rindex', 'rfind',],
'condition': ['isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isnumeric','isidentifier',
'islower','istitle', 'isupper','isprintable', 'isspace', ],
'text': ['lower', 'upper', 'capitalize', 'title', 'swapcase',
'center', 'ljust', 'rjust', 'zfill', 'expandtabs','casefold'],
'encode': ['translate', 'maketrans', 'encode'],
'format': ['format', 'format_map']}
回答9:
Both are not nice.
string.join(xs, delimit) means that the string module is aware of the existence of a list, which it has no business knowing about, since the string module only works with strings.
list.join(delimit) is a bit nicer because we're so used to strings being a fundamental type(and lingually speaking, they are). However this means that join needs to be dispatched dynamically because in the arbitrary context of a.split("\n")
the python compiler might not know what a is, and will need to look it up(analogously to vtable lookup), which is expensive if you do it a lot of times.
if the python runtime compiler knows that list is a built in module, it can skip the dynamic lookup and encode the intent into the bytecode directly, whereas otherwise it needs to dynamically resolve "join" of "a", which may be up several layers of inheritence per call(since between calls, the meaning of join may have changed, because python is a dynamic language).
sadly, this is the ultimate flaw of abstraction; no matter what abstraction you choose, your abstraction will only make sense in the context of the problem you're trying to solve, and as such you can never have a consistent abstraction that doesn't become inconsistent with underlying ideologies as you start gluing them together without wrapping them in a view that is consistent with your ideology. Knowing this, python's approach is more flexible since it's cheaper, it's up to you to pay more to make it look "nicer", either by making your own wrapper, or your own preprocessor.