Python timeit command-line error: “SyntaxError: EO

Posted 2019-07-19 14:14

Question:

I have been using the Python timeit module for a long time, but it was only through an interactive Python session or a Unix shell. Now, I am trying to measure some code snippets in the Windows command prompt (cmd.exe), but it shows this error:

C:\Users\Me>python -m timeit '"-".join(map(str, range(100)))'
Traceback (most recent call last):
  File "C:\Python33\lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python33\lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Python33\lib\timeit.py", line 334, in <module>
    sys.exit(main())
  File "C:\Python33\lib\timeit.py", line 298, in main
    t = Timer(stmt, setup, timer)
  File "C:\Python33\lib\timeit.py", line 131, in __init__
    code = compile(src, dummy_src_name, "exec")
  File "<timeit-src>", line 6
    '-.join(map(str,
                   ^
SyntaxError: EOL while scanning string literal

which is rather confusing, since I have not inserted any newline characters in the string - rather, I actually pasted the example directly from the timeit module documentation.

While playing around with this, I tried testing snippets without any spaces, since the error marked characters just before them. Even though no exception now occurs, the module reports the same execution time as if I had passed a pass statement, as shown here:

C:\Users\Me>python -m timeit
100000000 loops, best of 3: 0.013 usec per loop

C:\Users\Me>python -m timeit 'map(str,range(100))'
100000000 loops, best of 3: 0.013 usec per loop

C:\Users\Me>python -m timeit 'map(str,range(1000000000000000))'
100000000 loops, best of 3: 0.013 usec per loop

I am sure I call the module correctly since I have pasted the same lines on a Unix shell and they work as expected.

Since I get exactly the same results with Python 2.7 and 3.3 (and the module is written in pure Python and has been around for a long time), I am sure that this has nothing to do with Python and is caused by the Windows command prompt instead.

So, why does this weird behaviour happen exactly and how do I fix it?

Answer 1:

tl;dr

Use double quotes for the statement passed to the timeit module.
Example:

C:\Users\Me>python -m timeit "'-'.join(map(str, range(100)))"
10 loops, best of 3: 28.9 usec per loop

Detailed explanation

Unlike Unix shells such as bash and tcsh, the Windows command line does not treat single quotes as quoting characters.

Here is a tiny Python program to demonstrate this:

import sys
print(sys.argv[1:])

Running this (let's call the file cmdtest.py), we observe the following:

C:\Users\Me\Desktop>python cmdtest.py 1 2 3
['1', '2', '3']

C:\Users\Me\Desktop>python cmdtest.py "1 2 3"
['1 2 3']

C:\Users\Me\Desktop>python cmdtest.py '1 2 3'
["'1", '2', "3'"]

So, single quotes are treated literally (i.e. not as special characters), and the argument is still split at the spaces. After searching Stack Overflow a bit, I found this great description of argument tokenization by cmd:

When invoking a command from a command window, tokenization of the command line arguments is not done by cmd.exe (a.k.a. "the shell"). Most often the tokenization is done by the newly formed processes' C/C++ runtime, but this is not necessarily so -- for example, if the new process was not written in C/C++, or if the new process chooses to ignore argv and process the raw commandline for itself (e.g. with GetCommandLine()). At the OS level, Windows passes command lines untokenized as a single string to new processes. This is in contrast to most *nix shells, where the shell tokenizes arguments in a consistent, predictable way before passing them to the newly formed process. All this means that you may experience wildly divergent argument tokenization behavior across different programs on Windows, as individual programs often take argument tokenization into their own hands.

If it sounds like anarchy, it kind of is. However, since a large number of Windows programs do utilize the Microsoft C/C++ runtime's argv, it may be generally useful to understand how the MSVCRT tokenizes arguments. Here is an excerpt:

  • Arguments are delimited by white space, which is either a space or a tab.
  • A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument. Note that the caret (^) is not recognized as an escape character or delimiter.

Error #2

Having the above in mind, let's explain the second weird behaviour first (the one that acts as a pass statement), as it is a bit simpler. Since single quotes are interpreted literally, when calling:

C:\Users\Me>python -m timeit 'map(str,range(100))'

the exact string literal 'map(str,range(100))' (with quotes included) is passed as the statement to time.
So, Python will see

"'map(str,range(100))'"

instead of

'map(str,range(100))'

which, as a string, doesn't really do anything and gives a measurement pretty close to a pass statement.
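A quick way to check this from inside Python is to parse both statements and see which one actually contains a function call. The sketch below is only an illustration: the helper name is_call is hypothetical, and the timings it prints depend on the machine, so no expected numbers are given.

```python
import ast
import timeit


def is_call(stmt):
    """Return True if stmt parses to a single function-call expression."""
    node = ast.parse(stmt).body[0]
    return isinstance(node, ast.Expr) and isinstance(node.value, ast.Call)


received = "'map(str,range(100))'"  # what timeit receives from cmd.exe
intended = "map(str,range(100))"    # what a Unix shell would have passed

print(is_call(received))  # False: the statement is just a string literal
print(is_call(intended))  # True: the statement really calls map()

# Compare the three statements; the string literal should land close to "pass".
for stmt in ("pass", received, intended):
    print(stmt, timeit.timeit(stmt, number=100000))
```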


Error #1

Now for the first error. The documentation of the Python timeit module states:

A multi-line statement may be given by specifying each line as a separate statement argument;

So, when calling:

C:\Users\Me>python -m timeit '"-".join(map(str, range(100)))'

Python sees ["'-.join(map(str,", "range(100)))'"] passed as statements to timeit, which the module interprets as the multi-line statement:

'-.join(map(str,
range(100)))'

The first line of this statement opens a string with a single quote but never closes it, which (finally) explains the bizarre EOL error.
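The same failure can be reproduced without cmd.exe by joining the two arguments with a newline, which is how I understand timeit to build a multi-line statement, and compiling the result. The helper name try_compile is made up for this sketch, and the exact wording of the SyntaxError message differs between Python versions.

```python
def try_compile(statement_args):
    """Join the arguments with newlines and try to compile the result.

    Returns the SyntaxError if compilation fails, otherwise None.
    """
    stmt = "\n".join(statement_args)
    try:
        compile(stmt, "<stmt>", "exec")
    except SyntaxError as exc:
        return exc
    return None


# The two arguments Python receives from cmd.exe in the failing call.
broken = ["'-.join(map(str,", "range(100)))'"]
# The single argument received when the statement is wrapped in double quotes.
fixed = ["'-'.join(map(str, range(100)))"]

print(repr(try_compile(broken)))  # a SyntaxError about an unterminated string
print(repr(try_compile(fixed)))   # None
```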


Solution

Using double quotes for the statement to time solves the problem.
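If the shell's quoting rules keep getting in the way, one alternative is to skip the command line and call timeit from a small Python script, where the statement is an ordinary Python string and no shell tokenizes it. This is only a sketch: the number and repeat values are arbitrary choices to keep it fast, and the output format merely imitates the command-line tool.

```python
import timeit

# The statement is a normal Python string here, so shell quoting does not apply.
stmt = "'-'.join(map(str, range(100)))"

# Small number/repeat values chosen only so the sketch finishes quickly.
number = 1000
timings = timeit.repeat(stmt, number=number, repeat=3)

# Convert the best total time (in seconds) to microseconds per loop.
best_usec = min(timings) / number * 1e6
print("%d loops, best of 3: %.3g usec per loop" % (number, best_usec))
```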

I have also tried Windows PowerShell, which is more advanced than cmd.exe and behaves more like a Unix shell, but it did not work for all the statements that I tested.
For instance, this works (notice the space in the statement):

PS C:\Users\Me> python -m timeit 'map(str, range(100))'
1000000 loops, best of 3: 0.688 usec per loop

while the initial example doesn't:

PS C:\Users\Me\Desktop> python -m timeit '"-".join(map(str, range(100)))'
option -. not recognized
use -h/--help for command line help

(I am not yet really satisfied, though. What I would rather do is make cmd or PowerShell work as a Unix shell, so that I can simply paste and time code snippets. If anyone knows a quick-and-dirty way to do this (if it is even possible), in order to complete the answer, that would be awesome.)