Given the following Python script:
# dedupe.py
import re

def dedupe_whitespace(s,spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace(r"Green\t\tGround") # doctest: +REPORT_NDIFF
    'Green\tGround'
    """
    for w in spacechars:
        s = re.sub(r"("+w+"+)", w, s)
    return s
The function works as intended within the Python interpreter:
$ python
>>> import dedupe
>>> dedupe.dedupe_whitespace('Purple\t\tHaze')
'Purple\tHaze'
>>> print dedupe.dedupe_whitespace('Blue\t\tSky')
Blue Sky
However, the doctest example fails because tab characters are converted to spaces before comparison to the result string:
>>> import doctest, dedupe
>>> doctest.testmod(dedupe)
gives
Failed example:
dedupe_whitespace(r"Green Ground") #doctest: +REPORT_NDIFF
Differences (ndiff with -expected +actual):
- 'Green Ground'
? -
+ 'Green Ground'
How can I encode tab characters in a doctest heredoc string so that a test result comparison is performed appropriately?
You must set the NORMALIZE_WHITESPACE option flag.
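As a minimal sketch of that suggestion, the snippet below runs the question's docstring with the NORMALIZE_WHITESPACE option flag enabled. Building the test through DocTestParser and DocTestRunner is just one self-contained way to pass the flag (an assumption of this sketch, not something from the thread); a "# doctest: +NORMALIZE_WHITESPACE" directive on the example line should have the same effect. Note that the example then passes only because any run of whitespace compares equal, so the test no longer distinguishes tabs from spaces.

```python
import doctest
import re


def dedupe_whitespace(s, spacechars='\t '):
    """Merge repeated whitespace characters.

    Example:
    >>> dedupe_whitespace(r"Green\t\tGround")
    'Green\tGround'
    """
    for w in spacechars:
        s = re.sub(r"(" + w + "+)", w, s)
    return s


# Build a DocTest straight from the docstring and run it with the flag on.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    dedupe_whitespace.__doc__,
    {"dedupe_whitespace": dedupe_whitespace},  # globals for the example
    "dedupe_whitespace",                       # test name
    None,                                      # filename (not needed here)
    0,                                         # starting line number
)
runner = doctest.DocTestRunner(
    verbose=False, optionflags=doctest.NORMALIZE_WHITESPACE
)
results = runner.run(test)
print(results)
```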
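Or, alternatively, capture the output and compare it to the expected value.

The doctest documentation section How are Docstring Examples Recognized? states that all hard tab characters in the docstring are expanded to spaces, using 8-column tab stops, while tabs in output generated by the tested code are not modified.

Edit: My mistake, I understood the docs the other way around. Tabs are being expanded to 8-column tab stops at both the string argument passed to dedupe_whitespace and the string literal being compared on the next line, so output contains the two words separated by a single space and is being compared to a literal in which they are separated by a run of spaces.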
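The expansion described in this edit can be observed directly. The following sketch feeds a small, made-up doctest text (not the question's code) to doctest.DocTestParser and inspects what the parser hands on; the names text and examples are purely illustrative.

```python
import doctest

# Docstring-like text containing real tab characters, as produced by "\t"
# inside a normal (non-raw) string literal.
text = ">>> s = 'a\tb'\n>>> s\n'a\tb'\n"
assert "\t" in text

examples = doctest.DocTestParser().get_examples(text)

# Both the example source and the expected output have had their tabs
# replaced by spaces before any code is run.
for ex in examples:
    print(repr(ex.source), repr(ex.want))
```

Because the tabs are gone from both sides before anything executes, the code under test never receives a tab in the first place.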
I can't find a way to overcome this limitation without writing your own DocTestParser or testing for deduplicated spaces instead of tabs.

I've gotten this to work using raw string literal notation for the docstring.
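A sketch of what a raw docstring might look like for the question's function follows. Two things here are assumptions of this sketch rather than part of the thread: the r prefix on the inner string argument has been dropped (with a raw docstring, keeping it would pass a literal backslash and t to the function instead of a tab), and the test is run through DocTestParser and DocTestRunner only to keep the snippet self-contained.

```python
import doctest
import re


def dedupe_whitespace(s, spacechars='\t '):
    r"""Merge repeated whitespace characters.

    Example:
    >>> dedupe_whitespace("Green\t\tGround")
    'Green\tGround'
    """
    for w in spacechars:
        s = re.sub(r"(" + w + "+)", w, s)
    return s


# The raw docstring keeps "\t" as two characters, so doctest sees no hard tabs.
assert "\t" not in dedupe_whitespace.__doc__

test = doctest.DocTestParser().get_doctest(
    dedupe_whitespace.__doc__,
    {"dedupe_whitespace": dedupe_whitespace},
    "dedupe_whitespace",
    None,
    0,
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```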
I got it to work by escaping the tab character in the expected string, writing \\t instead of \t.
This is basically YatharhROCK's answer, but a bit more explicit. You can use raw strings or double escaping. But why?
You need the string literal to contain valid Python code that, when interpreted, is the code you want to run/test. These both work:
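As an illustration, here is a small sketch with two hypothetical functions, join_raw and join_escaped, that differ only in how the docstring spells the tab. The helper run_doctest is likewise made up for this example.

```python
import doctest


def join_raw(a, b):
    r"""Raw docstring: the backslash reaches doctest untouched.

    >>> join_raw("x", "y")
    'x\ty'
    """
    return a + "\t" + b


def join_escaped(a, b):
    """Normal docstring: the backslash itself is escaped.

    >>> join_escaped("x", "y")
    'x\\ty'
    """
    return a + "\t" + b


def run_doctest(func):
    """Run the examples in func's docstring and return the results."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test)


raw_results = run_doctest(join_raw)
escaped_results = run_doctest(join_escaped)

# Both spellings produce the same two characters: a backslash and a "t".
same_text = r"\t" == "\\t"
print(raw_results, escaped_results, same_text)
```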
Using a raw string and double-escaping (escaping the backslash) both leave two characters in the string: the backslash and the n. doctest passes this code to the Python interpreter, which takes "backslash then n" to mean "newline character" inside a string literal.
Use whichever you prefer.
TL;DR: Escape the backslash, i.e., use \\n or \\t instead of \n or \t in your otherwise unmodified strings.

You probably don't want to make your docstrings raw, as then you won't be able to use any Python string escapes, including those you might want to.
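To illustrate that trade-off, the sketch below uses a hypothetical function, tabify, whose non-raw docstring mixes an ordinary escape (\u2192, an arrow, in the description) with an escaped backslash in the doctest example. The function and the runner setup are assumptions made for this example only.

```python
import doctest


def tabify(a, b):
    """Join two fields with a tab (a \u2192 tab \u2192 b).

    >>> tabify("x", "y")
    'x\\ty'
    """
    return a + "\t" + b


# The ordinary escape was interpreted by Python...
has_arrow = "\u2192" in tabify.__doc__
# ...while the doubled backslash left a literal backslash plus "t" for doctest.
has_hard_tab = "\t" in tabify.__doc__

test = doctest.DocTestParser().get_doctest(
    tabify.__doc__, {"tabify": tabify}, "tabify", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(has_arrow, has_hard_tab, results)
```

With a raw docstring, the \u2192 in the description would instead stay as six literal characters.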
For a method that supports using normal escapes, just escape the backslash in the backslash-character escape, so that after Python interprets it, it leaves a literal backslash followed by the character, which doctest can parse.

It's the raw heredoc string notation (r""") that did the trick.