Measures of association in R — Kendall's tau-b

Post #2 · 2019-03-08 04:55

There's a routine for Kendall's coefficient in the psych package: corr.test(x, method = "kendall"). It can be applied to a data.frame and also displays p-values for each pair of variables. The only downside is that it's just a wrapper around the cor() function, so the coefficient it reports is the one cor(method = "kendall") computes, which is the tie-corrected tau-b, not tau-a.
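If you'd rather not depend on psych, a rough base-R equivalent is to call cor() on the whole data frame and loop cor.test() over the column pairs. This is only a sketch: the data frame and its column names are made up for illustration, and exact = FALSE is passed so every pair uses the normal approximation.

```r
# Hypothetical example data: three ordinal survey items with ties.
df <- data.frame(item1 = c(1, 2, 2, 3, 4, 5, 5, 6),
                 item2 = c(2, 1, 3, 3, 5, 4, 6, 6),
                 item3 = c(6, 5, 5, 4, 3, 3, 2, 1))

# Matrix of pairwise Kendall coefficients; cor() applies the tie
# correction, so these are tau-b values.
tau_mat <- cor(df, method = "kendall")

# One p-value per pair of columns. exact = FALSE requests the normal
# approximation, which cor.test() falls back to anyway when there are ties.
col_pairs <- combn(names(df), 2)
p_vals <- apply(col_pairs, 2, function(v) {
  cor.test(df[[v[1]]], df[[v[2]]], method = "kendall", exact = FALSE)$p.value
})
names(p_vals) <- apply(col_pairs, 2, paste, collapse = "-")

print(round(tau_mat, 3))
print(signif(p_vals, 3))
```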

Wikipedia has a good reference on Kendall's coefficient; also check this link out. Try the sos package and its findFn() function. I got a bunch of results when querying "tau a" and "tau b", but neither search had any luck, and the results seem to converge on the Kendall package, as @Ian suggested.


We Are One

Post #3 · 2019-03-08 04:58

I stumbled across this page today while looking for an implementation of Kendall's tau-b in R. For anyone else looking for the same thing: tau-b is in fact part of the stats package.

See this link for more details: https://stat.ethz.ch/pipermail/r-help//2012-August/333656.html

I tried it and it works:

library(stats)   # attached by default in every R session

x <- c(1, 1, 2)
y <- c(1, 2, 3)
cor.test(x, y, method = "kendall", alternative = "greater")

This is the output:

data:  x and y
z = 1.2247, p-value = 0.1103
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau 
0.8164966 

Warning message:
In cor.test.default(x, y, method = "kendall", alternative = "greater") :
  Cannot compute exact p-value with ties

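You can ignore the warning message: it only means that, because x contains a tie, the p-value comes from the normal approximation instead of the exact test. The tau reported is in fact tau-b!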
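Counting pairs by hand for this example shows why. Of the three pairs, two are concordant, none are discordant, and one is tied on x only, so tau-a and tau-b differ only in the denominator:

$$
\tau_a = \frac{2 - 0}{3} \approx 0.667, \qquad
\tau_b = \frac{2 - 0}{\sqrt{(3 - 1)(3 - 0)}} = \frac{2}{\sqrt{6}} \approx 0.8165
$$

so the 0.8164966 reported by cor.test() matches tau-b, not tau-a.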


冷血范

Post #4 · 2019-03-08 05:02

There are three Kendall tau statistics (tau-a, tau-b, and tau-c).

They are not interchangeable, and none of the answers posted so far deal with the last two, which is the subject of the OP's question.

I was unable to find functions to calculate tau-b or tau-c, either in the R standard library (stats et al.) or in any of the packages available on CRAN or other repositories. I used the excellent R package sos to search, so I believe the results returned were reasonably thorough.

So that's the short answer to the OP's question: no built-in or package function for tau-b or tau-c.

But it's easy to roll your own.

Writing R functions for the Kendall statistics is just a matter of translating these equations into code:

Kendall_tau_a = (P - Q) / (n * (n - 1) / 2)

Kendall_tau_b = (P - Q) / ( (P + Q + Y0) * (P + Q + X0) ) ^ 0.5 

Kendall_tau_c = 2 * m * (P - Q) / (n ^ 2 * (m - 1))

tau-a: the number of concordant minus discordant pairs, divided by the total number of pairs, n * (n - 1) / 2, where n is the sample size.

tau-b: accounts explicitly for ties (a pair is tied on a variable when both of its observations have the same value on that variable). It is concordant minus discordant pairs divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number not tied on y (P + Q + X0), where X0 and Y0 count the pairs tied only on x and only on y, respectively.

tau-c (Stuart's tau-c): a variant for larger tables, suited to non-square tables; it is concordant minus discordant pairs multiplied by a factor, 2 * m / (n ^ 2 * (m - 1)), that adjusts for table size through m, the smaller of the number of rows and columns.
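To check these formulas on two raw vectors, a brute-force sketch can classify every pair directly. The helper name kendall_taus_raw is hypothetical, it is O(n^2) and only meant for small data, and m is taken as the smaller number of distinct values (the smaller dimension of the cross-table).

```r
# kendall_taus_raw() is a hypothetical helper, not from any package.
# It looks at every unordered pair (i < j) and sorts it into P, Q, X0 or Y0.
kendall_taus_raw = function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n   = length(x)
  idx = combn(n, 2)                        # 2 x choose(n, 2) matrix of index pairs
  dx  = sign(x[idx[2, ]] - x[idx[1, ]])
  dy  = sign(y[idx[2, ]] - y[idx[1, ]])
  P   = sum(dx * dy > 0)                   # concordant pairs
  Q   = sum(dx * dy < 0)                   # discordant pairs
  X0  = sum(dx == 0 & dy != 0)             # pairs tied on x only
  Y0  = sum(dx != 0 & dy == 0)             # pairs tied on y only
  m   = min(length(unique(x)), length(unique(y)))  # smaller table dimension
  c(tau_a = (P - Q) / (n * (n - 1) / 2),
    tau_b = (P - Q) / sqrt((P + Q + Y0) * (P + Q + X0)),
    tau_c = 2 * m * (P - Q) / (n ^ 2 * (m - 1)))
}

# tau_a = 2/3, tau_b = 2/sqrt(6), tau_c = 8/9
kendall_taus_raw(c(1, 1, 2), c(1, 2, 3))
```

The tau_b entry should agree with cor(x, y, method = "kendall") on the same vectors.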

# Number of concordant pairs.
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx > c)])},
    r = r_ndx, c = c_ndx))
}

# Number of discordant pairs.
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c){sum(t[(r_ndx > r) & (c_ndx < c)])},
    r = r_ndx, c = c_ndx))
}

# Sample size (total number of observations, not pairs).
n = sum(t)

# The lesser of number of rows or columns.
m = min(dim(t))

So these four parameters are all you need to calculate tau-a, tau-b, and tau-c:

P
Q
m
n

(plus X0 and Y0 for tau-b, the numbers of pairs tied only on x and only on y)
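For tau-b on a contingency table, X0 and Y0 can be computed in the same style as P and Q. A sketch, assuming rows hold x and columns hold y; X0, Y0 and kendall_tau_b are hypothetical names, and the table is the SAS example used in the DescTools answer further down.

```r
# P() and Q() repeat the concordant/discordant counters so this block runs alone.
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx > c)]),
                 r = r_ndx, c = c_ndx))
}
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx < c)]),
                 r = r_ndx, c = c_ndx))
}

# X0: pairs in the same row but different columns (tied on x only).
# For a row with total R this is (R^2 - sum of its squared cells) / 2.
X0 = function(t) sum((rowSums(t)^2 - rowSums(t^2)) / 2)

# Y0: pairs in the same column but different rows (tied on y only).
Y0 = function(t) sum((colSums(t)^2 - colSums(t^2)) / 2)

# Hypothetical helper following the tau-b formula above.
kendall_tau_b = function(t) {
  t = as.matrix(t)
  p = P(t)
  q = Q(t)
  (p - q) / sqrt((p + q + Y0(t)) * (p + q + X0(t)))
}

tab = as.table(rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23)))
kendall_tau_b(tab)   # about 0.3372567, same as KendallTauB() in DescTools
```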

For instance, the code for tau-c is:

kendall_tau_c = function(t){
    t = as.matrix(t)
    m = min(dim(t))
    n = sum(t)
    # Return the value visibly instead of ending on an assignment.
    (m * 2 * (P(t) - Q(t))) / ((n ^ 2) * (m - 1))
}

So how are Kendall's tau statistics related to the other statistical tests used in categorical data analysis?

All three Kendall tau statistics, along with Goodman and Kruskal's gamma, measure the association of ordinal (or binary) variables. (The Kendall tau statistics are more refined alternatives to the gamma statistic, (P - Q) / (P + Q), which simply leaves tied pairs out.)
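To make the difference concrete, here is a small sketch of gamma from raw vectors (gk_gamma_raw is a hypothetical name) next to cor()'s tau-b, on made-up data with ties:

```r
# gk_gamma_raw() is a hypothetical helper: Goodman and Kruskal's gamma,
# (P - Q) / (P + Q), where tied pairs of any kind are left out entirely.
gk_gamma_raw = function(x, y) {
  idx = combn(length(x), 2)                       # all unordered pairs i < j
  s = sign(x[idx[2, ]] - x[idx[1, ]]) * sign(y[idx[2, ]] - y[idx[1, ]])
  P = sum(s > 0)                                  # concordant
  Q = sum(s < 0)                                  # discordant
  (P - Q) / (P + Q)
}

x = c(1, 2, 2, 3)
y = c(1, 3, 2, 2)
gk_gamma_raw(x, y)              # 0.5: P = 3, Q = 1, two tied pairs ignored
cor(x, y, method = "kendall")   # 0.4: tau-b keeps one-sided ties in its denominator
```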

And so Kendall's tau and gamma are the ordinal counterparts to the simple chi-square and Fisher's exact tests, both of which (as far as I know) treat the categories as nominal and ignore their order.
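One way to see the nominal/ordinal distinction: reversing the order of the column categories leaves the chi-square statistic unchanged, while Kendall's tau-b flips sign. A sketch on a made-up table; chi_sq and tau_b_table are hypothetical helper names.

```r
# Hypothetical 3 x 3 table for two ordinal variables (rows = x, columns = y).
tab = matrix(c(10, 4,  1,
                4, 8,  4,
                1, 4, 10), nrow = 3, byrow = TRUE)
tab_rev = tab[, 3:1]   # same counts, column categories in reverse order

# Chi-square statistic (the small-expected-count warning is suppressed here).
chi_sq = function(t) unname(suppressWarnings(chisq.test(t))$statistic)

# tau-b from the expanded observations: one (row, column) pair per count.
tau_b_table = function(t) {
  cor(rep(as.vector(row(t)), as.vector(t)),
      rep(as.vector(col(t)), as.vector(t)), method = "kendall")
}

c(chi_sq(tab), chi_sq(tab_rev))             # identical
c(tau_b_table(tab), tau_b_table(tab_rev))   # same size, opposite sign
```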

example:

cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

# Reshape data frame so 1 row for each event 
# (predicate step to create contingency table).
dfx2 = data.frame(lapply(dfx, function(x) { rep(x, dfx$freq)}))

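# Cross-tabulate the expanded data (rows = LCV, columns = CPA).
t = xtabs(~ LCV + CPA, dfx2)

kc = kendall_tau_c(t)

# Returns about -0.53. (Tabulating the unweighted dfx instead, i.e. ignoring freq, gives about -0.35.)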


Juvenile、少年°

Post #5 · 2019-03-08 05:02

It has been quite a while, but all three statistics are now implemented in DescTools.

library(DescTools)
# Example from the SAS PROC FREQ documentation, p. 1821:
# http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf
tab <- as.table(rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23)))

# tau-a
KendallTauA(tab, conf.level = 0.95)
##     tau_a    lwr.ci    ups.ci 
## 0.2068323 0.1771300 0.2365346 

# tau-b
KendallTauB(tab, conf.level = 0.95)
##     tau_b    lwr.ci    ups.ci 
## 0.3372567 0.2114009 0.4631126 

# tau-c
StuartTauC(tab, conf.level = 0.95)
##      tauc    lwr.ci    ups.ci 
## 0.4110953 0.2546754 0.5675151 

# alternative for tau-b:
d.frm <- Untable(tab, dimnames = list(1:2, 1:5))
cor(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2), method = "kendall")
## [1] 0.3372567

# but no confidence intervals for tau-b! Check:
unclass(cor.test(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2), method = "kendall"))
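If DescTools isn't available, the table can be expanded in base R instead of with Untable(). This sketch assumes the default Var1/Var2/Freq column names that as.data.frame() gives an unnamed table, and it also confirms that the cor.test() result carries no confidence interval for Kendall's tau.

```r
# Base-R expansion of the 2 x 5 table into one row per observation.
tab <- as.table(rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23)))
d <- as.data.frame(tab)                                  # columns Var1, Var2, Freq
d <- d[rep(seq_len(nrow(d)), d$Freq), c("Var1", "Var2")]

x <- as.numeric(d$Var1)   # factor codes 1..2 keep the row order
y <- as.numeric(d$Var2)   # factor codes 1..5 keep the column order

tau_b <- cor(x, y, method = "kendall")

# cor.test() adds a z statistic and p-value; with n >= 50 it uses the
# normal approximation, and for Kendall's tau there is no conf.int element.
ct <- cor.test(x, y, method = "kendall")
tau_b
names(ct)
```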


唯我独甜

Post #6 · 2019-03-08 05:03

I have been doing a bit of research on Kendall's tau. Using cor(x, y, method = "kendall") directly gives you Kendall's tau-b, which is slightly different from the original definition, Kendall's tau-a. Tau-b is more commonly used because it takes ties into account; hence most available software (e.g. cor(), and Kendall() from the Kendall package) calculates tau-b.

The difference between Kendall's tau-a and tau-b is essentially the denominator. For tau-a the denominator is fixed, D = n*(n-1)/2, while for tau-b it is D = sqrt(no. of pairs of Var1 excluding tied pairs) * sqrt(no. of pairs of Var2 excluding tied pairs). Since the tau-b denominator can never exceed n*(n-1)/2, tau-b is always at least as large as tau-a in absolute value, and strictly larger whenever there are ties and the numerator is nonzero.

As a simple example, consider X=(1,2,3,4,4), Y=(2,3,4,4,4). Kendall's tau-b=0.88, while tau-a=0.7.
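Both denominators can be reproduced for this example with a few lines of base R, comparing each pair of observations through the signs of its differences:

```r
X <- c(1, 2, 3, 4, 4)
Y <- c(2, 3, 4, 4, 4)
n <- length(X)

# Sign of the difference for every ordered pair (i, j); the diagonal is 0 and
# each unordered pair appears twice, hence the divisions by 2 below.
sx <- sign(outer(X, X, "-"))
sy <- sign(outer(Y, Y, "-"))

S       <- sum(sx * sy) / 2    # concordant minus discordant pairs (7)
D_a     <- n * (n - 1) / 2     # tau-a denominator: all 10 pairs
not_t_x <- sum(sx != 0) / 2    # pairs of X excluding tied pairs (9)
not_t_y <- sum(sy != 0) / 2    # pairs of Y excluding tied pairs (7)
D_b     <- sqrt(not_t_x) * sqrt(not_t_y)

c(tau_a = S / D_a, tau_b = S / D_b)   # 0.7 and about 0.88
```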

I didn't find much on Kendall's tau-c, so no comment on that one.


Root（大扎）

Post #7 · 2019-03-08 05:04

Have you tried the cor function? Its method argument can be set to "kendall" (there are also options for "pearson" and "spearman" if needed). I'm not sure it covers all the standard errors you are looking for, but it should get you started.

