Answer a question

I have an unsorted list named d that looks like:

[0.0000, 123.9877,0.0000,9870.9876, ...]

I simply want to plot a CDF graph from this list using Matplotlib in Python, but I don't know whether there is a function I can use.

d = []
d_sorted = []
for line in fd.readlines():
    (addr, videoid, userag, usertp, timeinterval) = line.split()
    d.append(float(timeinterval))

d_sorted = sorted(d)

class discrete_cdf:
    def __init__(data):
        self._data = data # must be sorted
        self._data_len = float(len(data))

    def __call__(point):
        return (len(self._data[:bisect_left(self._data, point)]) / 
               self._data_len)

cdf = discrete_cdf(d_sorted)
xvalues = range(0, max(d_sorted))
yvalues = [cdf(point) for point in xvalues]
plt.plot(xvalues, yvalues)

This is the code I am using now, but it fails with the following error:

Traceback (most recent call last):
  File "hitratioparea_0117.py", line 43, in <module>
    cdf = discrete_cdf(d_sorted)
TypeError: __init__() takes exactly 1 argument (2 given)

Answers
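
The TypeError in the question arises because the methods of discrete_cdf do not declare self as their first parameter: Python passes the instance automatically, so __init__(data) receives two arguments while accepting only one. Below is a minimal sketch of the class with that fixed. The bisect import and the sample values are assumptions, since the question does not show its imports or the real data. Note also that range() needs integer bounds, so the upper limit is converted with int().

```python
from bisect import bisect_left


class discrete_cdf:
    def __init__(self, data):
        self._data = data  # must be sorted
        self._data_len = float(len(data))

    def __call__(self, point):
        # bisect_left returns how many elements are strictly less than point
        return bisect_left(self._data, point) / self._data_len


# Hypothetical sample values standing in for the question's list d
d_sorted = sorted([0.0, 123.9877, 0.0, 9870.9876])

cdf = discrete_cdf(d_sorted)
# range() requires integers, so convert the float maximum first
xvalues = range(0, int(max(d_sorted)) + 1)
yvalues = [cdf(point) for point in xvalues]
```

Because bisect_left is used, cdf(x) here is the fraction of values strictly below x; swapping in bisect_right would give the fraction at or below x instead.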

As mentioned, cumsum from NumPy works well. Make sure that your data is a proper PDF (i.e. it integrates to one), otherwise the CDF won't end at unity as it should. Here is a minimal working example:

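import numpy as np
import matplotlib.pyplot as plt

# Create some test data: an unnormalized Gaussian on a regular grid
dx = 0.01
X = np.arange(-2, 2, dx)
Y = np.exp(-X ** 2)

# Normalize the data to a proper PDF (so that it integrates to one)
Y /= (dx * Y).sum()

# Compute the CDF as the running integral of the PDF
CY = np.cumsum(Y * dx)

# Plot both: the PDF as a solid line, the CDF as a red dashed line
plt.plot(X, Y)
plt.plot(X, CY, 'r--')

plt.show()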
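
The example above starts from density values on a grid, whereas the question has raw samples. For that case, one possible sketch is an empirical CDF: sort the samples and pair the i-th smallest value with the fraction i/n. The function names ecdf and plot_ecdf and the sample values are hypothetical, chosen for illustration only.

```python
import numpy as np


def ecdf(samples):
    """Return the sorted samples x and the cumulative fractions 1/n, 2/n, ..., 1."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def plot_ecdf(samples):
    # Imported here so that ecdf() can be used without Matplotlib installed
    import matplotlib.pyplot as plt

    x, y = ecdf(samples)
    # A step plot reflects that the empirical CDF jumps at each sample value
    plt.step(x, y, where="post")
    plt.xlabel("value")
    plt.ylabel("cumulative fraction")
    plt.show()


# Hypothetical values standing in for the question's list d
d = [0.0, 123.9877, 0.0, 9870.9876]
x, y = ecdf(d)
```

Calling plot_ecdf(d) would then draw the curve. No normalization step is needed here, because the fractions end at one by construction.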
