Is there a way to do this? I cannot seem an easy way to interface pandas series with plotting a CDF.
相关问题
- how to define constructor for Python's new Nam
- streaming md5sum of contents of a large remote tar
- How to get the background from multiple images by
- Evil ctypes hack in python
- Correctly parse PDF paragraphs with Python
This is the easiest way.
Image of cumulative histogram
I came here looking for a plot like this with bars and a CDF line:
It can be achieved like this:
If you want to remove the vertical line, then it's explained how to accomplish that here. Or you could just do:
I also saw an elegant solution here on how to do it with
seaborn
.In case you are also interested in the values, not just the plot.
This will always work (discrete and continuous distributions)
Alternative example with a sample drawn from a continuous distribution or you have a lot of individual values:
For continuous distributions only
Please note if it very reasonable to make the assumption that there is only one occurence of each value in the sample (typically encountered in the case of continuous distributions) then the
groupby()
+agg('count')
is not necessary (since the count is always 1).In this case, a percent rank can be used to get to the cdf directly.
Use your best judgment when taking this kind of shortcut! :)
I believe the functionality you're looking for is in the hist method of a Series object which wraps the hist() function in matplotlib
Here's the relevant documentation
For example
I found another solution in "pure" Pandas, that does not require specifying the number of bins to use in a histogram:
To me, this seemed like a simply way to do it: