I am trying to plot a simple distplot using pandas and seaborn to understand the density of the dataset.
Input:
#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89
The dataset has more than 10K rows and no headers, and I am trying to use the second column, col[1].
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('keyword.csv', delimiter=',', header=None, usecols=[1])
#print df
sns.distplot(df)
plt.show()
There is no error, and I can print the input column, but distplot takes ages to compute and freezes my screen. Any suggestions to speed up the process?
Edit 1: As suggested in the comments below, I tried changing from pandas.read_csv to np.loadtxt, and now I get an error.
Code:
import numpy as np
from numpy import log as log
import matplotlib.pyplot as plt
import seaborn as sns
import pandas
df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
sns.kdeplot(df)
sns.distplot(df)
plt.show()
Error:
Traceback (most recent call last):
  File "0_distplot_csv.py", line 7, in <module>
    df = np.loadtxt('keyword.csv', delimiter=',', usecols=(1), unpack=True)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 726, in loadtxt
    usecols = list(usecols)
TypeError: 'int' object is not iterable
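A note on this traceback: in Python, (1) is just the integer 1 in parentheses, while a one-element tuple needs a trailing comma, (1,). That would explain why list(usecols) fails in this NumPy version. The sketch below is only an illustration under assumptions: io.StringIO stands in for keyword.csv and holds a few of the sample rows, and comments=None is passed because loadtxt treats '#' as the start of a comment by default.

```python
import io
import numpy as np

# Stand-in for keyword.csv, using a few sample rows from the question.
sample = io.StringIO("#Car,45\n#photo,4\n#movie,6\n#life,1\n")

# (1) is the plain integer 1; (1,) is a one-element tuple.
assert isinstance((1), int)
assert isinstance((1,), tuple)

# comments=None stops loadtxt from skipping lines that begin with '#',
# which is its default comment character.
values = np.loadtxt(sample, delimiter=',', usecols=(1,), comments=None)
print(values)
```

Only the second column is converted to numbers here, so the text in the first column is not parsed as a float.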
Edit 2: I tried the suggestions from the comment section:
sns.distplot(df[1])
This behaves the same as described initially: the screen freezes for ages.
sns.distplot(df[1].values)
I see strange behavior in this case.
When the input is:
Car,45
photo,4
movie,6
life,1
Horse,14
Pets,20
run,67
picture,89
It does plot, but when the input is as below:
#Car,45
#photo,4
#movie,6
#life,1
#Horse,14
#Pets,20
#run,67
#picture,89
It freezes the entire screen again and nothing is plotted.
I tried passing comments=None, thinking the lines might be read as comments, but it looks like comments isn't a parameter in pandas.
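On the comment question: pandas.read_csv does have a parameter for this, spelled comment (singular), and it defaults to None, so a leading '#' is treated as ordinary text unless it is set. The following is a minimal sketch, assuming the file looks like the sample rows above; io.StringIO is a stand-in for keyword.csv. It only checks what pandas parses and does not by itself explain the freeze.

```python
import io
import pandas as pd

# Stand-in for keyword.csv, using a few '#'-prefixed sample rows.
sample = io.StringIO("#Car,45\n#photo,4\n#movie,6\n#life,1\n")

# comment defaults to None, so '#' is not treated as a comment marker here.
df = pd.read_csv(sample, header=None, usecols=[1])

# With header=None, the selected column keeps its integer label 1.
counts = df[1]
print(counts.dtype)
print(counts.tolist())
```

If the full file parses to a numeric column in the same way, the '#' prefix is probably not what changes the plotting time, and inspecting the range of values with counts.describe() may be a more useful next step.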
Thank you