我需要在痛苦的详细解释这一点,因为我没有统计的基础知识在一个更简洁的方式来解释。 询问在这里,是因为我要寻找一个解决方案蟒蛇,但如果更合适可能会去stats.SE。
我有井下数据,它可能是一个有点像这样:
Rt T
0.0000 15.0000
4.0054 15.4523
25.1858 16.0761
27.9998 16.2013
35.7259 16.5914
39.0769 16.8777
45.1805 17.3545
45.6717 17.3877
48.3419 17.5307
51.5661 17.7079
64.1578 18.4177
66.8280 18.5750
111.1613 19.8261
114.2518 19.9731
121.8681 20.4074
146.0591 21.2622
148.8134 21.4117
164.6219 22.1776
176.5220 23.4835
177.9578 23.6738
180.8773 23.9973
187.1846 24.4976
210.5131 25.7585
211.4830 26.0231
230.2598 28.5495
262.3549 30.8602
266.2318 31.3067
303.3181 37.3183
329.4067 39.2858
335.0262 39.4731
337.8323 39.6756
343.1142 39.9271
352.2322 40.6634
367.8386 42.3641
380.0900 43.9158
388.5412 44.1891
390.4162 44.3563
395.6409 44.5837
(对RT变量可以被认为是深度的代理,而T是温度)。 我也有“先验”数据给我的温度在Rt = 0和未示出的一些标志物,我可以作为断点使用,引导到断点,或至少比较任何发现的断点。
这两个变量的线性关系是在受一些工艺中一定深度的间隔。 一个简单的线性回归是
q, T0, r_value, p_value, std_err = stats.linregress(Rt, T)
看起来像这样,在这里你可以清楚地看到偏差,可怜适合T0(应为15):
我希望能够执行一系列线性回归(在各段的端部接合)的,但我想这样做:(a)以不指定场所的数量或位置,(b)中,通过指定的数量和位置断裂,以及(c)计算所述系数为每个段
我想我可以(b)和(c)以只拆分数据并带着几分照顾单独做每一位做,但我不知道(一),并想知道如果有一个人的方式知道这能更简单地完成。
我已经看到了这一点: https://stats.stackexchange.com/a/20210/9311 ,我认为MARS可能是对付它的好办法,但是这只是因为它看起来不错; 我真的不明白。 我与我的数据试图在盲cut'n'paste方式有下面的输出,但同样,我不明白:
简短的回答是我,使用R创建一个线性回归模型解决了问题,然后使用的segmented
包,以产生来自线性模型的分段线性回归。 我能够指定断点(或节)的预期数量n
如下所示使用psi=NA
和K=n
。
长的答案是:
ř版本3.0.1(2013年5月16日)
平台:x86_64的-PC-Linux的GNU(64位)
# example data:
bullard <- structure(list(Rt = c(5.1861, 10.5266, 11.6688, 19.2345, 59.2882,
68.6889, 320.6442, 340.4545, 479.3034, 482.6092, 484.048, 485.7009,
486.4204, 488.1337, 489.5725, 491.2254, 492.3676, 493.2297, 494.3719,
495.2339, 496.3762, 499.6819, 500.253, 501.1151, 504.5417, 505.4038,
507.6278, 508.4899, 509.6321, 522.1321, 524.4165, 527.0027, 529.2871,
531.8733, 533.0155, 544.6534, 547.9592, 551.4075, 553.0604, 556.9397,
558.5926, 561.1788, 562.321, 563.1831, 563.7542, 565.0473, 566.1895,
572.801, 573.9432, 575.6674, 576.2385, 577.1006, 586.2382, 587.5313,
589.2446, 590.1067, 593.4125, 594.5547, 595.8478, 596.99, 598.7141,
599.8563, 600.2873, 603.1429, 604.0049, 604.576, 605.8691, 607.0113,
610.0286, 614.0263, 617.3321, 624.7564, 626.4805, 628.1334, 630.9889,
631.851, 636.4198, 638.0727, 638.5038, 639.646, 644.8184, 647.1028,
647.9649, 649.1071, 649.5381, 650.6803, 651.5424, 652.6846, 654.3375,
656.0508, 658.2059, 659.9193, 661.2124, 662.3546, 664.0787, 664.6498,
665.9429, 682.4782, 731.3561, 734.6619, 778.1154, 787.2919, 803.9261,
814.335, 848.1552, 898.2568, 912.6188, 924.6932, 940.9083), Tem = c(12.7813,
12.9341, 12.9163, 14.6367, 15.6235, 15.9454, 27.7281, 28.4951,
34.7237, 34.8028, 34.8841, 34.9175, 34.9618, 35.087, 35.1581,
35.204, 35.2824, 35.3751, 35.4615, 35.5567, 35.6494, 35.7464,
35.8007, 35.8951, 36.2097, 36.3225, 36.4435, 36.5458, 36.6758,
38.5766, 38.8014, 39.1435, 39.3543, 39.6769, 39.786, 41.0773,
41.155, 41.4648, 41.5047, 41.8333, 41.8819, 42.111, 42.1904,
42.2751, 42.3316, 42.4573, 42.5571, 42.7591, 42.8758, 43.0994,
43.1605, 43.2751, 44.3113, 44.502, 44.704, 44.8372, 44.9648,
45.104, 45.3173, 45.4562, 45.7358, 45.8809, 45.9543, 46.3093,
46.4571, 46.5263, 46.7352, 46.8716, 47.3605, 47.8788, 48.0124,
48.9564, 49.2635, 49.3216, 49.6884, 49.8318, 50.3981, 50.4609,
50.5309, 50.6636, 51.4257, 51.6715, 51.7854, 51.9082, 51.9701,
52.0924, 52.2088, 52.3334, 52.3839, 52.5518, 52.844, 53.0192,
53.1816, 53.2734, 53.5312, 53.5609, 53.6907, 55.2449, 57.8091,
57.8523, 59.6843, 60.0675, 60.8166, 61.3004, 63.2003, 66.456,
67.4, 68.2014, 69.3065)), .Names = c("Rt", "Tem"), class = "data.frame", row.names = c(NA,
-109L))
library(segmented) # Version: segmented_0.2-9.4
# create a linear model
out.lm <- lm(Tem ~ Rt, data = bullard)
# Set X breakpoints: Set psi=NA and K=n:
o <- segmented(out.lm, seg.Z=~Rt, psi=NA, control=seg.control(display=FALSE, K=3))
slope(o) # defaults to confidence level of 0.95 (conf.level=0.95)
# Trickery for placing text labels
r <- o$rangeZ[, 1]
est.psi <- o$psi[, 2]
v <- sort(c(r, est.psi))
xCoord <- rowMeans(cbind(v[-length(v)], v[-1]))
Z <- o$model[, o$nameUV$Z]
id <- sapply(xCoord, function(x) which.min(abs(x - Z)))
yCoord <- broken.line(o)[id]
# create the segmented plot, add linear regression for comparison, and text labels
plot(o, lwd=2, col=2:6, main="Segmented regression", res=TRUE)
abline(out.lm, col="red", lwd=1, lty=2) # dashed line for linear regression
text(xCoord, yCoord,
labels=formatC(slope(o)[[1]][, 1] * 1000, digits=1, format="f"),
pos = 4, cex = 1.3)