可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I would like to get the time spent on the cell execution in addition to the original output from cell.
To this end, I tried %%timeit -r1 -n1
but it doesn't expose the variable defined within cell.
%%time
works for cell which only contains 1 statement.
In[1]: %%time
1
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 5.96 µs
Out[1]: 1
In[2]: %%time
# Notice there is no out result in this case.
x = 1
x
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs
What's the best way to do it?
Update
I have been using Execute Time in Nbextension for quite some time now. It is great.
回答1:
Use cell magic and this project on github by Phillip Cloud:
Load it by putting this at the top of your notebook or put it in your config file if you always want to load it by default:
%install_ext https://raw.github.com/cpcloud/ipython-autotime/master/autotime.py
%load_ext autotime
If loaded, every output of subsequent cell execution will include the time in min and sec it took to execute it.
回答2:
The only way I found to overcome this problem is by executing the last statement with print.
Do not forget that cell magic starts with %%
and line magic starts with %
.
%%time
clf = tree.DecisionTreeRegressor().fit(X_train, y_train)
res = clf.predict(X_test)
print(res)
回答3:
%time
and %timeit
now come part of ipython's built-in magic commands
回答4:
An easier way is to use ExecuteTime plugin in jupyter_contrib_nbextensions package.
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
回答5:
I simply added %%time
at the beginning of the cell and got the time. You may use the same on Jupyter Spark cluster/ Virtual environment using the same. Just add %%time
at the top of the cell and you will get the output. On spark cluster using Jupyter, I added to the top of the cell and I got output like below:-
[1] %%time
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
import numpy as np
.... code ....
Output :-
CPU times: user 59.8 s, sys: 4.97 s, total: 1min 4s
Wall time: 1min 18s
回答6:
This is not exactly beautiful but without extra software
class timeit():
from datetime import datetime
def __enter__(self):
self.tic = self.datetime.now()
def __exit__(self, *args, **kwargs):
print('runtime: {}'.format(self.datetime.now() - self.tic))
Then you can run it like:
with timeit():
# your code, e.g.,
print(sum(range(int(1e7))))
% 49999995000000
% runtime: 0:00:00.338492
回答7:
Sometimes the formatting is different in a cell when using print(res)
, but jupyter/ipython comes with a display
. See an example of the formatting difference using pandas below.
%%time
import pandas as pd
from IPython.display import display
df = pd.DataFrame({"col0":{"a":0,"b":0}
,"col1":{"a":1,"b":1}
,"col2":{"a":2,"b":2}
})
#compare the following
print(df)
display(df)
The display
statement can preserve the formatting.
回答8:
you may also want to look in to python's profiling magic command %prun
which gives something like -
def sum_of_lists(N):
total = 0
for i in range(5):
L = [j ^ (j >> i) for j in range(N)]
total += sum(L)
return total
then
%prun sum_of_lists(1000000)
will return
14 function calls in 0.714 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
5 0.599 0.120 0.599 0.120 <ipython-input-19>:4(<listcomp>)
5 0.064 0.013 0.064 0.013 {built-in method sum}
1 0.036 0.036 0.699 0.699 <ipython-input-19>:1(sum_of_lists)
1 0.014 0.014 0.714 0.714 <string>:1(<module>)
1 0.000 0.000 0.714 0.714 {built-in method exec}
I find it useful when working with large chunks of code.