This method searches for the first group of word characters (i.e. [a-zA-Z0-9_]), returning the first matched group, or None in case of failure.
    import re

    def test(str):
        m = re.search(r'(\w+)', str)
        if m:
            return m.group(1)
        return None
The same function can be rewritten as:
    def test2(str):
        m = re.search(r'(\w+)', str)
        return m and m.group(1)
This works the same, and is documented behavior; as this page clearly states:
"The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned."
However, and is a boolean operator (the manual even says so), so I expected it to return a boolean. As a result, I was astonished when I found out (how) this worked.
What are other use cases of this, and/or what is the rationale for this rather unintuitive implementation?
Conciseness (and therefore clarity, as soon as you get used to it, since after all it does not sacrifice readability at all!-) any time you need to check something and either use that something if it's false, or another value if that something is true (that's for and; reverse it for or; and I'm very deliberately avoiding the actual keywords-or-the-like True and False, since I'm talking about every object, not just bool!-).
Vertical space on any computer screen is limited, and, given the choice, it's better spent on useful readability aids (docstrings, comments, strategically placed empty lines to separate blocks, ...) than on turning, say, a line such as:
into six such as:
or more cramped versions thereof.
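As an illustration of that trade-off, a one-line comprehension using and can be set against a six-line spelled-out equivalent. The names values, inverses and inverses_long, and the sample data, are assumptions made up for this sketch.

```python
# Hypothetical input: one zero among nonzero floats.
values = [2.0, 0.0, 4.0]

# One line: `x and 1.0 / x` gives x itself when x is falsy (0.0),
# and 1.0 / x otherwise, so the division by zero is never attempted.
inverses = [x and 1.0 / x for x in values]

# The same logic spelled out over six lines.
inverses_long = []
for x in values:
    if x:
        inverses_long.append(1.0 / x)
    else:
        inverses_long.append(x)

print(inverses)
print(inverses_long)
```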
Far from this being "unintuitive", beginners were regularly tripped up by the fact that some languages (like standard Pascal) did not specify the order of evaluation and the short-circuiting nature of the and and or operators; one of the differences between Turbo Pascal and the language standard, which back in the day made Turbo the most popular Pascal dialect of all time, was exactly that Turbo implemented and and or much like Python did later (and the C language did earlier...).
Basically, a and b returns the operand that has the same truth value as the whole expression.
It might sound a bit confusing, but just do it in your head: if a is False, then b does not matter anymore (because False and anything will always be False), so it can return a right away.
But when a is True, then only b matters, so it returns b right away without even checking its truth value.
This is a very common and very basic optimization many languages do.
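The short-circuiting described above can be observed by recording which operands actually get evaluated. This is a minimal sketch; the helper operand and the calls list are hypothetical names introduced only for the demonstration.

```python
calls = []

def operand(name, value):
    # Record that this operand was evaluated, then hand back its value.
    calls.append(name)
    return value

# Falsy left operand: `and` returns it at once and never evaluates the right side.
result_falsy = operand("a", []) and operand("b", "unused")
calls_falsy = list(calls)

calls.clear()

# Truthy left operand: `and` evaluates the right side and returns it unchanged.
result_truthy = operand("a", [1]) and operand("b", "used")
calls_truthy = list(calls)

print(result_falsy, calls_falsy)
print(result_truthy, calls_truthy)
```

In the first case only a is evaluated and the empty list itself is the result; in the second both are evaluated and the result is b's value, not a bool.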
No.
"unintuitive"? Really? I'd disagree.
Let's think.
"a and b" is falsified if a is false. So the first false value is sufficient to know the answer. Why bother transforming a to another boolean? It's already false. How much more false is False? Equally false, right?
So a's value, when equivalent to False, is false enough, so that's the value of the entire expression. No further conversion or processing. Done.
When a's value is equivalent to True, then b's value is all that's required. No further conversion or processing. Why transform b to another boolean? Its value is all we need to know. If it's anything like True, then it's true enough. How much more true is True?
Why create spurious additional objects?
Same analysis for or.
Why transform to Boolean? It's already true enough or false enough. How much more True can it get?
Try This.
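A short check along these lines shows what the expression yields and how it compares with False. It is a sketch only; the variable names are assumptions.

```python
# `True and 0` evaluates to the right operand, the int 0, not the bool False.
result = True and 0
result_is_int = type(result) is int

# It still compares equal to False, although it is not the False object itself.
equals_false = (result == False)
is_false_object = (result is False)

# bool() forces an actual bool when one is needed.
forced = bool(True and 0)

print(result, result_is_int, equals_false, is_false_object, forced)
```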
While (True and 0) is actually 0, it's equal to False. That's false enough for all practical purposes.
If it's a problem, then bool(a and b) will force the explicit conversion.
I think that while this notation 'works', it represents a poor coding style that hides logic and will confuse more experienced programmers who carry the 'baggage' of knowing how the majority of other languages work.
In most languages the return value of a function is determined by the type of function, unless it's been explicitly overloaded. For example, a 'strlen'-type function is expected to return an integer, not a string.
Inline functions such as the core arithmetic and logic operators (+-/*|&!) are even more constrained because they also have a history of formal math theory behind them. (Think about all the arguments about order of operations for these functions.)
To have fundamental functions return anything but their most common data type (either logic or numeric) should be classified as purposeful obfuscation.
In just about every common language '&' or '&&' or 'AND' is a logic or Boolean function. Behind the scenes, optimizing compilers might use short-circuiting logic like the above in LOGIC FLOW but not in DATA STRUCTURE modification (any optimizing compiler that changed the value this way would have been considered broken). If the value is expected to be used in a variable for further processing, it should be of the logic or boolean type, because that's the 'formal' type for these operators in the majority of circumstances.
I didn't find this surprising, and in fact expected it to work when I originally tried it.
While not all values are bools, note that in effect, all values are boolean: they represent a truth value. (In Python, a bool is, in effect, a value which only represents true or false.) The number 0 isn't a bool, but it explicitly (in Python) has a boolean value of False.
In other words, the boolean operator and doesn't always return a bool, but it always returns a boolean value: one that represents true or false, even if it also has other information logically attached to it (e.g. a string).
Maybe this is retroactive justification; I'm not sure, but either way it seems natural for Python's boolean operators to behave as they do.
When to use it?
In your example, test2 feels clearer to me. I can tell what they both do equally: the construction in test2 doesn't make it any harder to understand. All else equal, the more concise code in test2 is--marginally--more quickly understood. That said, it's a trivial difference, and I don't prefer either enough that I'd jump to rewrite anything.
It can be similarly useful in other ways:
This could be rewritten differently, but this is clear, straightforward and concise.
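One hypothetical case of this kind is using or to supply a fallback value. The function greeting and the fallback string are assumptions invented for this sketch, not taken from the original answer.

```python
def greeting(name=None):
    # `name or "stranger"` yields name when it is truthy,
    # and the fallback when name is None or an empty string.
    return "Hello, " + (name or "stranger")

print(greeting("Ada"))
print(greeting())
print(greeting(""))
```

Note that the fallback also replaces an empty string, which may or may not be what a given caller wants.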
Don't go overboard; "a() and b() or c()" as a substitute for "a() ? b() : c()" is dangerous and confusing, since you'll end up with c() if b() is false. If you're writing a ternary expression, use the ternary syntax, even though it's hideously ugly: b() if a() else c().
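The pitfall can be reproduced with a falsy "true branch" value. This is a sketch with hypothetical helper names (pick_and_or, pick_ternary) and made-up arguments.

```python
def pick_and_or(cond, if_true, if_false):
    # The and/or idiom: falls through to if_false whenever if_true is falsy.
    return cond and if_true or if_false

def pick_ternary(cond, if_true, if_false):
    # The conditional expression: depends only on cond.
    return if_true if cond else if_false

# The two agree while the "true branch" value is truthy...
agree_and_or = pick_and_or(True, "yes", "no")
agree_ternary = pick_ternary(True, "yes", "no")

# ...but diverge when it is falsy, such as 0.
wrong = pick_and_or(True, 0, "fallback")
right = pick_ternary(True, 0, "fallback")

print(agree_and_or, agree_ternary, wrong, right)
```

With a condition that is true and a branch value of 0, the and/or form returns the fallback while the conditional expression returns 0.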