I am trying to plot the descriptive variables in the first row by the following procedure. I also tried unsuccessfully with quoting the column/row names
- rotate rows and columns in the CSV data for the correposding data structure (tall table) required in the thread A very simple histogram with R? with
ggplot
to plot histogram of events as
Absolute
variable XOR (Average
,Min
,Max
)- If absolute value only, just draw absolute value in histogram.
- If (average, min and max), just draw them in the histogram with whiskers (= whisker plot) where the limits of the whiskers are made by the min and max.
Data
initially,
data.csv
"Vars" , "Sleep", "Awake", "REM", "Deep" "Absolute", , , 5 , 7 "Average" , 7 , 12 , , "Min" , 4 , 5 , , "Max" , 10 , 15 , ,
data after reshaping visually
V1 V2 V3 V4 Vars Absolute Average Min Max Sleep <NA> 7 4 10 Awake <NA> 12 5 15 REM 5 <NA> <NA> <NA> Deep 7 <NA> <NA> <NA>
data after reshaping for R
data <- structure(list(V1 = structure(c(3L, NA, NA, 1L, 2L), .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"), .Label = c(" 5", " 7", "Absolute" ), class = "factor"), V2 = structure(c(3L, 2L, 1L, NA, NA), .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"), .Label = c("12", " 7", "Average " ), class = "factor"), V3 = structure(c(3L, 1L, 2L, NA, NA), .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"), .Label = c(" 4", " 5", "Min " ), class = "factor"), V4 = structure(c(3L, 1L, 2L, NA, NA), .Names = c("Vars", "Sleep", "Awake", "REM", "Deep"), .Label = c("10", "15", "Max " ), class = "factor")), .Names = c("V1", "V2", "V3", "V4"), row.names = c("Vars", "Sleep", "Awake", "REM", "Deep"), class = "data.frame")
R code with debugging code
dat.m <- read.csv("data.csv")
# rotate rows and columns
dat.m <- as.data.frame(t(dat.m)) # https://stackoverflow.com/a/7342329/54964 Comment 42-
library("reshape2")
dat.m <- melt(dat.m, id.vars="Vars")
## Just plot values existing there correspondingly
library("ggplot2")
# https://stackoverflow.com/a/25584792/54964
# TODO following
#ggplot(dat.m, aes(x = "Vars", y = value,fill=variable))
Error
Error: id variables not found in data: Vars
Execution halted
R: 3.3.3, 3.4.0 (backports)
OS: Debian 8.7
R reshape2, ggplot2, ... with sessionInfo()
after loading the two packages
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggplot2_2.1.0 reshape2_1.4.2
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 scales_0.4.1 magrittr_1.5 plyr_1.8.4
[5] tools_3.3.3 gtable_0.2.0 Rcpp_0.12.10 stringi_1.1.5
[9] grid_3.3.3 stringr_1.2.0 munsell_0.4.3
Testing HaberdashPI's proposal
Output in Fig. 1 where wrongly absolute value in Sleep
and Awake
.
If NA
, just set value to zero.
Fig. 1 HaberdashPI's proposal output not as expected
Data structure of dat.m
before the transpose
'data.frame': 4 obs. of 5 variables:
$ Absolute: Factor w/ 2 levels " 5"," 7": NA NA 1 2
..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
$ Average : Factor w/ 2 levels "12"," 7": 2 1 NA NA
..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
$ Min : Factor w/ 2 levels " 4"," 5": 1 2 NA NA
..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
$ Max : Factor w/ 2 levels "10","15": 1 2 NA NA
..- attr(*, "names")= chr "Sleep" "Awake" "REM" "Deep"
$ Vars : chr "Sleep" "Awake" "REM" "Deep"
Absolute Average Min Max Vars
Sleep <NA> 7 4 10 Sleep
Awake <NA> 12 5 15 Awake
REM 5 <NA> <NA> <NA> REM
Deep 7 <NA> <NA> <NA> Deep
Data structure of dat.m
after the transpose
'data.frame': 16 obs. of 3 variables:
$ Vars : chr "Sleep" "Awake" "REM" "Deep" ...
$ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
$ value : chr NA NA " 5" " 7" ...
Vars variable value
1 Sleep Absolute <NA>
2 Awake Absolute <NA>
3 REM Absolute 5
4 Deep Absolute 7
5 Sleep Average 7
6 Awake Average 12
7 REM Average <NA>
8 Deep Average <NA>
9 Sleep Min 4
10 Awake Min 5
11 REM Min <NA>
12 Deep Min <NA>
13 Sleep Max 10
14 Awake Max 15
15 REM Max <NA>
16 Deep Max <NA>
Testing akash87's proposal
Code
ds <- dat.m
str(ds)
ds
ds$variable
ds$variable %in% c("Min","Max")
Wrong output because all False
in the end
$ Vars : chr "Sleep" "Awake" "REM" "Deep" ...
$ variable: Factor w/ 4 levels "Absolute","Average ",..: 1 1 1 1 2 2 2 2 3 3 ...
$ value : chr NA NA " 5" " 7" ...
Vars variable value
1 Sleep Absolute <NA>
2 Awake Absolute <NA>
3 REM Absolute 5
4 Deep Absolute 7
5 Sleep Average 7
6 Awake Average 12
7 REM Average <NA>
8 Deep Average <NA>
9 Sleep Min 4
10 Awake Min 5
11 REM Min <NA>
12 Deep Min <NA>
13 Sleep Max 10
14 Awake Max 15
15 REM Max <NA>
16 Deep Max <NA>
[1] "hello 3"
[1] Absolute Absolute Absolute Absolute Average Average Average Average
[9] Min Min Min Min Max Max Max Max
Levels: Absolute Average Min Max
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE
So doing ds[ds$variable %in% c("Min","Max"), ]
will given False
output because error-carried-forward.
Testing Uwe's proposal
Code with explicit data.table::dcast
and two times data.table::melt
. Printing out sessionInfo()
just before molten <- ...
. Note library(ggplot2)
is not loaded yet because the error comes from the line molten <- ...
.
$ Rscript test111.r
Vars "Average" "Max" "Min" Absolute
1: Sleep 7 10 4 NA
2: Awake 12 15 5 NA
3: REM NA NA NA 5
4: Deep NA NA NA 7
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.12.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets base
other attached packages:
[1] data.table_1.10.4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 methods_3.4.0
Error in melt.data.table(transposed, measure.vars = c("Absolute", "Average")) :
One or more values in 'measure.vars' is invalid.
Calls: <Anonymous> -> melt.data.table
Execution halted
Testing Uwe's proposal with test code 2
Code
molten <- structure(list(Vars = structure(c(1L, 2L, 1L, 2L, 1L, 2L), class = "factor", .Label = c("V1", "V2")), variable = structure(c(1L, 1L, 2L, 2L, 3L, 3L), class = "factor", .Label = c("ave", "ave_max", "lepo")), value = c(7L, 8L, 10L, 10L, 4L, 4L)), .Names = c("Vars", "variable", "value"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
print(molten)
library(ggplot2)
ggplot(molten, aes(x = Vars, y = value, fill = variable, ymin = lepo, ymax = ave_max)) +
geom_col() + geom_errorbar(width = 0.2)
Output
Vars variable value
1 V1 ave 7
2 V2 ave 8
3 V1 ave_max 10
4 V2 ave_max 10
5 V1 lepo 4
6 V2 lepo 4
Error in FUN(X[[i]], ...) : object 'lepo' not found
Calls: <Anonymous> ... by_layer -> f -> <Anonymous> -> f -> lapply -> FUN -> FUN
Execution halted
Your basic problem, is that your column and row names have been messed up when you call
dat.m <- as.data.frame(t(dat.m))
. That is not really the right way to rearrange your data.Your terminology is a little confusing (do your really mean histogram?), so I'm not sure if this is what you want, but I believe to solve the immediate problem you're having, you can do this:
Here's the output I get:
What I've done here is manually renamed the column names (
names(data.m) <- etc...
) and inserted a new column calledVars
, because you need those names as a column ofdat.m
, not a set of row names, to refer to them inmelt
(which is why you get the error you're getting about not being able to findVars
). It isn't elegant, but it gets the job done.It looks like you're making a lot more work for yourself than you may need. It appears that you have already collected a summary of your data in some other program (Excel?), which makes me think there is probably a simpler solution to your problem if you simply load your raw data into R and calculate the average, mean, min and so forth in R, or if you summarize your data in that external program in a format more canonical to R. Not knowing exactly what that raw data looks like, I can't give you a better answer.
Much of ggplot is organized around a set of principles for how data ought to be organized: I recommend reading through this blog post on dplyr and this one on tidyr.
This is a very concise version using
data.table
:Edit: I've amended above code to use the double colon operator to explicitely state the namespace from which
melt()
anddcast()
should be taken. Normally, this wouldn't be necessary asdata.table
is already loaded. However, the OP is reporting issues which might be caused by packagereshape2
being loaded afterdata.table
. Thedata.table
package has it's own faster implementations ofreshape2::dcast()
andreshape2::melt()
. When both packages have been loaded for some reason name clashes might occur.Data
I had to remove the
"
from the csv file. This was the fastest way to get rid of an error message fromfread()
complaining about the format of the header. So,data2.csv
looks like:Completion of akash87's code because the data conversion is not explicit in his answer from the original dta set
Output: same as in akash87's aswer.
The problem with your code is that you used "Vars" with a quote instead of simple Vars in the ggplot aes function. Also, the header of your data set is messed up. The Absolute, Average, ... should be the column names of the data set, not the values themselves. That's why you get the error from melt function.
Given your data set, here is my attempt:
This will produce:
I am adding another graph to the answer. This collaborates @Uwe answer.
This will produce:
I am approaching this a bit differently. If I take the original data as
And convert the data to
Then I can use
ggplot2
, then I can usewhich results in