Why is ggplot graphing null percentage data points

Posted 2019-08-02 16:21

Question:

I've created a test data set to reproduce this problem:

Date    Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08 
2012-09 
2012-10 
2012-11 
2012-12 

These percentages were created by entering decimal values in a CSV file and then changing the format of the Percent column to Percentage in Microsoft Excel.

When I try to graph this dataset with ggplot:

data <- read.csv("GCdataViz/test2.csv")
p <- ggplot(data, aes(x=Date, y=Percent, group=1)) + 
  geom_point(size = 3) 
p

I get this graph

As you can see, the null values are plotted, and the y-axis is also odd: the 3% data point is plotted above the 23% one. It seems ggplot doesn't handle axes with percentages well. Is there a way to set the correct range for the y-axis, assuming I DO NOT KNOW the percentage values (i.e., all I know about the dataset is that it has a Percent column)?

Answer 1:

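The column Percent is a factor. By default, factor levels are ordered alphabetically, so 3.00% comes after 12.00%, and the blank cells become an empty-string level of their own, which is why they are plotted too. (In R 4.0.0 and later, read.csv() no longer converts strings to factors by default, so the column is character instead, but ggplot2 sorts it alphabetically in the same way.) It will work if you convert the values of Percent to numbers: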

The data:

data <- read.table(text = "Date    Percent
2012-01 3.00%
2012-02 43.00%
2012-03 54.00%
2012-04 43.00%
2012-05 43.00%
2012-06 23.00%
2012-07 12.00%
2012-08 
2012-09 
2012-10 
2012-11 
2012-12 ", header = TRUE, fill = TRUE)
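A quick base-R check makes the ordering problem visible. This is a small sketch using only the Percent values from the sample data (the names `pct`, `lv`, and `num` are just for illustration): it shows how the values sort as text, and that stripping the "%" sign turns them into numbers that sort correctly.

```r
# Percent values as they come out of the file; blanks arrive as empty strings
pct <- c("3.00%", "43.00%", "54.00%", "43.00%", "43.00%", "23.00%",
         "12.00%", "", "", "", "", "")

# factor() orders its levels as text, not as numbers
lv <- levels(factor(pct))
print(lv)
# "" comes first, and "3.00%" lands between "23.00%" and "43.00%"

# After removing "%", the values are numeric and sort correctly;
# the empty strings become NA
num <- as.numeric(sub("%", "", pct, fixed = TRUE))
print(sort(unique(num)))
```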

Create a new variable, Percent2, with numeric values:

# Strip the "%" sign, convert to numeric, and set the blank cells to 0
data <- transform(data,
                  Percent2 = replace(as.numeric(gsub("%", "", Percent)),
                                     Percent == "", 0))

#       Date Percent Percent2
# 1  2012-01   3.00%        3
# 2  2012-02  43.00%       43
# 3  2012-03  54.00%       54
# 4  2012-04  43.00%       43
# 5  2012-05  43.00%       43
# 6  2012-06  23.00%       23
# 7  2012-07  12.00%       12
# 8  2012-08                0
# 9  2012-09                0
# 10 2012-10                0
# 11 2012-11                0
# 12 2012-12                0

Plot:

library(ggplot2)
ggplot(data, aes(x = Date, y = Percent2)) + 
  geom_point(size = 3) 
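A possible variation on the plot above, offered as a sketch rather than part of the original answer: leave out the `replace()` step so the blank months stay `NA`, store the values as fractions, and let `scales::percent` format the y-axis labels (the scales package is installed as a ggplot2 dependency). The column name `Share` is illustrative.

```r
library(ggplot2)

# Sample data rebuilt inline; "" marks the blank cells from the CSV
data <- data.frame(
  Date = sprintf("2012-%02d", 1:12),
  Percent = c("3.00%", "43.00%", "54.00%", "43.00%", "43.00%", "23.00%",
              "12.00%", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Strip "%" and convert to a fraction; blank strings become NA, not 0
data$Share <- as.numeric(sub("%", "", data$Percent, fixed = TRUE)) / 100

# Plot only rows that have a value; the y range is taken from the data
p <- ggplot(subset(data, !is.na(Share)), aes(x = Date, y = Share)) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent)  # format tick values as percentages
p
```

Since `Date` is a string here, the x-axis lists only the months that remain after filtering.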



Answer 2:

Sven's answer gets OP most of the way home, but I believe OP does not want any points at all plotted for the values that were blank in the original Excel sheet. This can be accomplished one of two ways:

  • Use Sven's solution, followed by data$Percent2[data$Percent2==0] <- NA. (This breaks if your data contain genuine 0% values as well as blanks, since those would also be turned into NA.)

  • Better, in my opinion: when you save the original Excel sheet as a .csv file, make sure the Percent column is formatted as Number (i.e., Format -> Cells and choose Number). Include as many decimal places as are useful, since the exported text file will only have as many decimal places as are shown on screen. For instance, a cell with value =1/3 will be exported as 0.3 if only one decimal place is displayed. You'll then need to multiply by 100 if you want R to display percentage values rather than decimal fractions. R will import the blank cells as NA, and you won't have to do any further processing.
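A minimal sketch of the second approach, assuming the export looks like the simulated CSV text below (decimal fractions, with empty fields for the blank months). It reads the data from a string instead of a file so it runs on its own, and shows that the column comes in as numeric with `NA` for the blanks.

```r
# Simulated CSV, as exported from Excel with the column formatted as Number
csv <- "Date,Percent
2012-01,0.03
2012-02,0.43
2012-03,0.54
2012-04,0.43
2012-05,0.43
2012-06,0.23
2012-07,0.12
2012-08,
2012-09,
2012-10,
2012-11,
2012-12,"

data <- read.csv(text = csv)

# Empty fields in a numeric column are read as NA, so Percent stays numeric
str(data)

# Scale the fractions up to percentages for display; NA stays NA
data$Percent <- data$Percent * 100
```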



Tags: r ggplot2