COUNTIF in pandas python over multiple columns wit

I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:

Person_ID  Age  Smoker  Diabetes
      001   30       Y         N
      002   45       N         N
      003   27       N         Y
      004   18       Y         Y
      005   55       Y         Y

Each attribute (Age, Smoker, Diabetes) has its own condition to determine whether it is a risk factor. So if Age >= 45, it's a risk factor. Smoker and Diabetes are risk factors if they are "Y". What I would like is to add a column that adds up the number of risk factors for each person based on those conditions. So the data would look like this:

Person_ID  Age  Smoker  Diabetes  Risk_Factors
      001   30       Y         N             1
      002   25       N         N             0
      003   27       N         Y             1
      004   18       Y         Y             2
      005   55       Y         Y             3

I have a sample dataset that I was fooling around with in Excel, and the way I did it there was to use the COUNTIF formula like so:

=COUNTIF(B2,">45") + COUNTIF(C2,"=Y") + COUNTIF(D2,"=Y")

However, the actual dataset that I will be using is way too large for Excel, so I'm learning pandas for python. I wish I could provide examples of what I've already tried, but frankly I don't even know where to start. I looked at this question, but it doesn't really address what to do about applying it to an entire new column using different conditions from multiple columns. Any suggestions?

标签： python pandas dataset

3条回答

家丑人穷心不美

2楼-- · 2020-02-10 06:27

If you are starting from excel and want to go to the next evolution then I would recommend MS access. It will be a lot easier then learning Panda for python. You should just replace the CountIf() with:

Risk Factor: IIF(Age>45, 1, 0) + IIF(Smoker="Y", 1, 0) + IIF(Diabetes="Y", 1, 0)

0人赞添加讨论(0) 举报

一夜七次

3楼-- · 2020-02-10 06:42

I would do this the following way.

For each column, create a new boolean series using the column's condition
Add those series row-wise

(Note that this is simpler if your Smoker and Diabetes column is already boolean (True/False) instead of in strings.)

It might look like this:

df = pd.DataFrame({'Age': [30,45,27,18,55],
                   'Smoker':['Y','N','N','Y','Y'],
                   'Diabetes': ['N','N','Y','Y','Y']})

   Age Diabetes Smoker
0   30        N      Y
1   45        N      N
2   27        Y      N
3   18        Y      Y
4   55        Y      Y

#Step 1
risk1 = df.Age > 45
risk2 = df.Smoker == "Y"
risk3 = df.Diabetes == "Y"
risk_df = pd.concat([risk1,risk2,risk3],axis=1)

     Age Smoker Diabetes
0  False   True    False
1  False  False    False
2  False  False     True
3  False   True     True
4   True   True     True

df['Risk_Factors'] = risk_df.sum(axis=1)

   Age Diabetes Smoker  Risk_Factors
0   30        N      Y             1
1   45        N      N             0
2   27        Y      N             1
3   18        Y      Y             2
4   55        Y      Y             3

0人赞添加讨论(0) 举报

Deceive 欺骗

4楼-- · 2020-02-10 06:44

If you want to stick with pandas. You can use the following...

Solution

isY = lambda x:int(x=='Y')
countRiskFactors = lambda row: isY(row['Smoker']) + isY(row['Diabetes']) + int(row["Age"]>45)

df['Risk_Factors'] = df.apply(countRiskFactors,axis=1)

How it works

isY - is a stored lambda function that checks if the value of a cell is Y returns 1 if it is otherwise 0 countRiskFactors - adds up the risk factors

the final line uses the apply method, with the paramater key set to 1, which applies the method -first parameter - row wise along the DataFrame and Returns a Series which is appended to the DataFrame.

output of print df

   Person_ID  Age Smoker Diabetes  Risk_Factors
0          1   30      Y        N             1
1          2   45      N        N             0
2          3   27      N        Y             1
3          4   18      Y        Y             2
4          5   55      Y        Y             3

0人赞添加讨论(0) 举报

COUNTIF in pandas python over multiple columns wit

Solution

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间