I can create a multi-line string using this syntax:
string = str("Some chars "
"Some more chars")
This will produce the following string:
"Some chars Some more chars"
Is Python joining these two separate strings or is the editor/compiler treating them as a single string?
P.s: I just want to understand the internals. I know there are other ways to declare or create multi-line strings.
Read the reference manual, it's in there. Specifically:
(emphasis mine)
This is why:
is exactly the same as:
str("Some chars Some more chars")
.This action is performed wherever a string literal might appear, list initiliazations, function calls (as is the case with
str
above) et-cetera.The only caveat is when a string literal is not contained between one of the grouping delimiters
()
,{}
or[]
but, instead, spreads between two separate physical lines. In that case we can alternatively use the backslash character to join these lines and get the same result:Of course, concatenation of strings on the same physical line does not require the backslash. (
string = "Hello " "World"
is just fine)Python is, now when exactly does Python do this is where things get interesting.
From what I could gather (take this with a pinch of salt, I'm not a parsing expert), this happens when Python transforms the parse tree (
LL(1)
Parser) for a given expression to it's corresponding AST (Abstract Syntax Tree).You can get a view of the parsed tree via the
parser
module:This dumps a pretty big and confusing list that represents concrete syntax tree parsed from the expression in
expr
:The numbers correspond to either symbols or tokens in the parse tree and the mappings from symbol to grammar rule and token to constant are in
Lib/symbol.py
andLib/token.py
respectively.As you can see in the snipped version I added, you have two different entries corresponding to the two different
str
literals in the expression parsed.Next, we can view the output of the AST tree produced by the previous expression via the
ast
module provided in the Standard Library:The output is more user friendly in this case; you can see that the
args
for the function call is the single concatenated stringHello World
.In addition, I also stumbled upon a cool module that generates a visualization of the tree for
ast
nodes. Using it, the output of the expressionexpr
is visualized like this:Image cropped to show only the relevant part for the expression.
As you can see, in the terminal leaf node we have a single
str
object, the joined string for"Hello "
and"World"
, i.e"Hello World"
.If you are feeling brave enough, dig into the source, the source code for transforming expressions into a parse tree is located at
Parser/pgen.c
while the code transforming the parse tree into an Abstract Syntax Tree is inPython/ast.c
.This information is for
Python 3.5
and I'm pretty sure that unless you're using some really old version (< 2.5
) the functionality and locations should be similar.Additionally, if you are interested in the whole compilation step python follows, a good gentle intro is provided by one of the core contributors, Brett Cannon, in the video From Source to Code: How CPython's Compiler Works.