So I have this dataframe:
filename width height class xmin ymin xmax ymax
0 128782.JPG 640 512 Panel 36 385 119 510
1 128782.JPG 640 512 Panel 124 388 207 510
2 128782.JPG 640 512 Panel 210 390 294 511
3 128782.JPG 640 512 Panel 294 395 380 510
4 128782.JPG 640 512 Panel 379 398 466 511
5 128782.JPG 640 512 Panel 465 402 553 510
6 128782.JPG 640 512 P+SD 552 402 638 510
7 128782.JPG 640 512 P+SD 558 264 638 404
...
...
57170 128782.JPG 640 512 P+SD 36 242 121 383
57171 128782.JPG 640 512 HS+P+SD 36 97 122 242
57172 128782.JPG 640 512 P+SD 214 106 304 250
The column called "class" contains the unique values "Panel", "P+SD" and "HS+P+SD". I want to count how many rows there are with each of these values, so I tried this:
print(len(split_df[split_df["class"].str.contains('Panel')]))
print(len(split_df[split_df["class"].str.contains('HS+P+SD')]))
print(len(split_df[split_df["class"].str.contains('P+SD')]))
This gave me this output:
56988
0
0
This is incorrect, as you can clearly see from the snippet of the DataFrame above. Why is everything counted properly for Panel while nothing is counted for the other two "class" names?
Here's the output of split_df.info():
RangeIndex: 57172 entries, 0 to 57171
Data columns (total 8 columns):
filename 57172 non-null object
width 57172 non-null int64
height 57172 non-null int64
class 57172 non-null object
xmin 57172 non-null int64
ymin 57172 non-null int64
xmax 57172 non-null int64
ymax 57172 non-null int64
dtypes: int64(6), object(2)
memory usage: 3.5+ MB
I cannot for the life of me figure out what is wrong. Any help is appreciated.
A simple for loop with the in operator will also work.
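A minimal sketch of that loop, assuming the column holds plain strings. The small frame is made-up sample data standing in for split_df, and count_containing is a hypothetical helper name. The in operator does a literal substring test, so the + needs no escaping.

```python
import pandas as pd

# Made-up sample data standing in for split_df (not the real 57k-row frame).
split_df = pd.DataFrame(
    {"class": ["Panel", "Panel", "P+SD", "HS+P+SD", "P+SD"]}
)


def count_containing(values, needle):
    """Count the entries that contain `needle` as a literal substring."""
    count = 0
    for value in values:
        # `in` on strings is a plain substring check, not a regex match.
        if needle in value:
            count += 1
    return count


for label in ["Panel", "HS+P+SD", "P+SD"]:
    print(label, count_containing(split_df["class"], label))
```

Note that this is still a substring test, so the "P+SD" count also includes the "HS+P+SD" rows.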
About the timing, check this link if you want.
pd.Series.str.contains has regex=True by default. Since + is a special character in regex, use regex=False, re.escape, or \ escaping.
If this is your core problem and you don't want a 'P+SD' count to include 'HS+P+SD', don't use str.contains. Check for equality instead and use value_counts on the values you wish to count. Or for all counts just use df['class'].value_counts().
Try:
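A short sketch of these options on made-up sample data standing in for split_df (the variable names are illustrative). Read as a regex, 'P+SD' means "one or more P followed by SD", which never occurs in the column, hence the zero counts in the question.

```python
import re

import pandas as pd

# Made-up sample data standing in for split_df (not the real 57k-row frame).
split_df = pd.DataFrame(
    {"class": ["Panel", "Panel", "P+SD", "HS+P+SD", "P+SD"]}
)
classes = split_df["class"]

# Three ways to make str.contains treat "+" literally.
n_literal = classes.str.contains("P+SD", regex=False).sum()
n_escaped = classes.str.contains(re.escape("P+SD")).sum()
n_manual = classes.str.contains(r"P\+SD").sum()

# Exact matches only, so "HS+P+SD" is not counted as "P+SD".
n_exact = (classes == "P+SD").sum()

# Counts for a chosen subset of labels, and for every label at once.
selected_counts = classes[classes.isin(["P+SD", "HS+P+SD"])].value_counts()
all_counts = classes.value_counts()

print(n_literal, n_escaped, n_manual, n_exact)
print(all_counts)
```

The three str.contains variants still match substrings, so they count "HS+P+SD" rows under "P+SD" as well; the equality check and value_counts do not.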