I'm looking at the examples for the command cut() (example(cut)), specifically this part:
cut> aaa <- c(1,2,3,4,5,2,3,4,5,6,7)
cut> cut(aaa, 3)
[1] (0.994,3] (0.994,3] (3,5] (3,5] (3,5] (0.994,3]
[7] (3,5] (3,5] (3,5] (5,7.01] (5,7.01]
Levels: (0.994,3] (3,5] (5,7.01]
cut> cut(aaa, 3, dig.lab = 4, ordered = TRUE)
[1] (0.994,2.998] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[5] (2.998,5.002] (0.994,2.998] (2.998,5.002] (2.998,5.002]
[9] (2.998,5.002] (5.002,7.006] (5.002,7.006]
Levels: (0.994,2.998] < (2.998,5.002] < (5.002,7.006]
cut> ## one way to extract the breakpoints
cut> labs <- levels(cut(aaa, 3))
cut> cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
cut+ upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
     lower upper
[1,] 0.994  3.00
[2,] 3.000  5.00
[3,] 5.000  7.01
There the intervals are closed on the right (as shown above), and the example then shows a way to extract the breakpoints of the data using cbind().
Now, suppose my data will be cut with the intervals closed on the left instead:
cut(aaa, 3, dig.lab = 4, ordered = TRUE, right = FALSE)
How can I extract my breakpoints now, using the same cbind() approach? (Other approaches are welcome, too.)
Just use something like the following for your pattern, and use gsub instead: "\\[|\\]|\\(|\\)". An example:
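The original example is not shown here, so what follows is only a minimal sketch of the idea, reusing the aaa vector from the question. The name stripped is made up for illustration.

```r
# Sample data from the question
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Interval labels produced by cut(), e.g. "(a,b]"
labs <- levels(cut(aaa, 3))

# Remove every "[", "]", "(" and ")" so only "lower,upper" remains
stripped <- gsub("\\[|\\]|\\(|\\)", "", labs)
stripped
```

Because the pattern lists all four bracket characters, it does not matter which side of the interval is closed.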
And, here's a quick way to read that data in:
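A sketch of one way this could look, assuming the stripped labels are passed to read.csv through its text argument. The column names lower and upper and the name breaks_df are chosen here for illustration.

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
labs <- levels(cut(aaa, 3))

# Strip the brackets, leaving one "lower,upper" string per interval
stripped <- gsub("\\[|\\]|\\(|\\)", "", labs)

# Each string is treated as one line of comma-separated input
breaks_df <- read.csv(text = stripped, header = FALSE,
                      col.names = c("lower", "upper"))
breaks_df
```

The result is a data frame with numeric columns, one row per interval.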
FYI: The same pattern would work whether the intervals are closed on the left or on the right. Using your original example:
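A minimal sketch of that for the left-closed case, ending in cbind() as the question asks. The names labs_left, parts and bp are illustrative, and the exact break values printed may differ between R versions.

```r
aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)

# Left-closed intervals, as in the question; labels look like "[a,b)"
labs_left <- levels(cut(aaa, 3, dig.lab = 4, ordered_result = TRUE,
                        right = FALSE))

# Same pattern as for right-closed intervals
stripped <- gsub("\\[|\\]|\\(|\\)", "", labs_left)

# Split each "lower,upper" string and bind the two ends into a matrix
parts <- strsplit(stripped, ",", fixed = TRUE)
bp <- cbind(lower = as.numeric(sapply(parts, `[`, 1)),
            upper = as.numeric(sapply(parts, `[`, 2)))
bp
```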
As for alternatives, since you just need to strip out the first and last character before you can use read.csv, you can also easily use substr without having to fuss with regular expressions (if that's not your thing):
Update: A totally different alternative
Since it is obvious that R has to calculate these values and store them as part of the function in order to generate the output you see, it is not too difficult to manipulate the function to get it to output different things.
Looking at the code for cut.default, you'll find the following as the last few lines:
It's really easy to change the last few lines to output a list that contains the output of cut as the first item, and the calculated ranges (taken from the cut function directly, not extracted from the pasted-together factor labels).
For instance, in the Gist I've posted at this link, I've changed those lines as follows:
Now, compare:
And, with right = FALSE:
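The Gist's code and its output are not reproduced here. As a rough stand-in for the same idea (getting the ranges as numbers rather than parsing labels), here is a sketch that does not edit cut.default at all: it computes the breaks first and passes them to cut(). The helper cut_with_ranges and its list names output and ranges are hypothetical, and the padding rule is an assumption that may not match what cut() does internally in your R version.

```r
# Hypothetical helper, not the Gist's code: compute the breaks up front,
# then pass them to cut() so the numeric ranges are known exactly.
# Assumes x contains at least two distinct non-missing values.
cut_with_ranges <- function(x, n, right = TRUE, ...) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)
  # Pad both ends slightly so the minimum and maximum fall inside an interval
  # (assumed rule for this sketch)
  breaks <- seq(rx[1] - dx / 1000, rx[2] + dx / 1000, length.out = n + 1)
  list(
    output = cut(x, breaks = breaks, right = right, ...),
    ranges = cbind(lower = breaks[-length(breaks)], upper = breaks[-1])
  )
}

aaa <- c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 7)
res <- cut_with_ranges(aaa, 3, right = FALSE, dig.lab = 4)
res$output   # the usual factor returned by cut()
res$ranges   # numeric lower/upper bounds, one row per interval
```

Since the breaks never pass through the label formatting, the ranges are not rounded by dig.lab.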