I've created a test data set to reproduce this problem:
Date Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08
2012-09
2012-10
2012-11
2012-12
These percentages were created by entering decimal values in a CSV file and then changing the format of the Percent column to Percentage in Microsoft Excel.
When I try to graph this dataset with ggplot
library(ggplot2)
data <- read.csv("GCdataViz/test2.csv")
p <- ggplot(data, aes(x = Date, y = Percent, group = 1)) +
  geom_point(size = 3)
p
I get this graph
As you can see, the blank values are plotted, and the Y axis is also odd: the 3% data point is plotted above the 23% one. It seems ggplot doesn't handle axis scaling well for percentages. Is there a way to set the correct range for the Y axis, assuming I DO NOT KNOW the percentage values in advance (that is, I know nothing about the dataset other than that it has a Percent column)?
The column Percent is a factor. By default, factor levels are ordered alphabetically. Hence, 3.00% comes after 12.00%. It will work if you convert the values of Percent to numeric values:
The data:
data <- read.table(text = "Date Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08
2012-09
2012-10
2012-11
2012-12 ", header = TRUE, fill = TRUE)
Create a new variable, Percent2, with numeric values:
# Strip the "%" sign, convert to numeric, and set blank entries to 0
data <- transform(data,
                  Percent2 = replace(as.numeric(gsub("%", "", Percent)),
                                     Percent == "", 0))
# Date Percent Percent2
# 1 2012-01 3.00% 3
# 2 2012-02 43.00% 43
# 3 2012-03 54.00% 54
# 4 2012-04 43.00% 43
# 5 2012-05 43.00% 43
# 6 2012-06 23.00% 23
# 7 2012-07 12.00% 12
# 8 2012-08 0
# 9 2012-09 0
# 10 2012-10 0
# 11 2012-11 0
# 12 2012-12 0
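To see why the conversion matters, here is a minimal sketch comparing how the raw strings and the converted numbers sort. The vectors raw and num are made up for illustration and are not part of the original data frame.

```r
# Illustrative values only: the same percentages as plain strings
raw <- c("3.00%", "43.00%", "54.00%", "23.00%", "12.00%")

# Strings (and factor levels) sort character by character,
# so "3.00%" lands after "12.00%" and "23.00%"
sort(raw)
levels(factor(raw))

# After stripping "%" and converting, the values sort numerically
num <- as.numeric(gsub("%", "", raw))
sort(num)
```

This is the same ordering ggplot2 applies to a discrete y axis, which is why the 3% point appeared above the 23% point.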
Plot:
library(ggplot2)
ggplot(data, aes(x = Date, y = Percent2)) +
  geom_point(size = 3)
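Once the column is numeric, ggplot2 picks the y range on its own, so the values do not need to be known in advance. If the axis should still read as percentages, one possible sketch is below. The helper pct_label and the small data frame are assumptions made for illustration, and the fixed 0 to 100 limits are optional.

```r
library(ggplot2)

# Small illustrative data frame with an already-numeric percent column
data <- data.frame(Date = c("2012-01", "2012-02", "2012-03"),
                   Percent2 = c(3, 43, 54))

# Hypothetical helper: append "%" to the numeric axis breaks
pct_label <- function(x) paste0(x, "%")

p <- ggplot(data, aes(x = Date, y = Percent2)) +
  geom_point(size = 3) +
  scale_y_continuous(labels = pct_label, limits = c(0, 100))
p
```

Dropping the limits argument lets ggplot2 fit the axis to whatever values are present.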
Sven's answer gets OP most of the way home, but I believe OP does not want any points at all plotted for the values that were blank in the original Excel sheet. This can be accomplished in one of two ways:
1. Use Sven's solution, followed by data$Percent2[data$Percent2 == 0] <- NA. (This also turns genuine 0% values into NA, so it fails if the data contain real zeros as well as blanks.)
2. Better, in my opinion: before saving the original Excel sheet as a .csv file, format the Percent column as Number (Format -> Cells, then choose Number). Display as many decimal places as are useful, since the exported text file keeps only the decimal places shown on screen. For instance, a cell with the value =1/3 is exported as 0.3 if only one decimal place is displayed. You will need to multiply by 100 for R to show percentage values rather than decimal fractions. R imports the blank cells as NA, so no further processing is needed.
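As a rough sketch of the second approach, the snippet below reads CSV text in which the Percent column holds plain decimal fractions and the later months are blank. The csv_text string is a made-up stand-in for the exported file. The last two lines show a fallback, assuming the sheet cannot be re-exported: converting the "%" strings directly also yields NA for blanks, as long as they are not replaced with 0.

```r
# Hypothetical CSV content standing in for a file exported from Excel with
# the Percent column formatted as Number; the last two months are blank
csv_text <- "Date,Percent
2012-01,0.03
2012-02,0.43
2012-07,0.12
2012-08,
2012-09,"

data <- read.csv(text = csv_text)

# Blank cells in a numeric column are read as NA
is.na(data$Percent)

# Convert decimal fractions to percentage values
data$Percent <- data$Percent * 100

# Fallback: convert "3.00%"-style strings directly; "" becomes NA
pct_strings <- c("3.00%", "43.00%", "")
pct_values <- as.numeric(gsub("%", "", pct_strings))
```

When such a column is plotted with geom_point, ggplot2 leaves out the NA rows (with a warning), so nothing is drawn for the blank months.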