How do I scale the y-axis on a histogram by the x

Posted 2019-07-21 08:08

Question:

I have some data which represents the sizes of particles. I want to plot the frequency of each binned size of particle as a histogram, but scale the frequency by the size of the particle (so it represents the total mass at that size).

I can plot a histogram fine, but I am unsure how to scale the Y-axis by the X-value of each bin.

e.g. if I have 10 particles in the 40-60 bin, I want the Y-axis value to be 10*50=500.

Answer 1:

You would do better to use barplot, which can represent the total mass by the area of each bar (i.e. height gives the count, width gives the particle size):

sizes <- 3:10   # your sizes
part.type <- sample(sizes, 1000, replace = TRUE)  # your particle sizes

count <- table(part.type)
barplot(count, width = sizes)  # bar area = count * size

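If your particle sizes are all different, you should first cut the range into an appropriate number of intervals to create the part.type factor, and use the midpoint of each interval as the bar width:

part <- rchisq(1000, 10)
breaks <- seq(min(part), max(part), length.out = 5)  # 4 equal-width bins
part.type <- cut(part, breaks, include.lowest = TRUE)
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2    # midpoint of each bin

count <- table(part.type)
barplot(count, width = mids)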

If the quantity of interest is only the total mass, then the appropriate plot is the dotchart. It is also much clearer than the bar plot when there is a large number of sizes:

part <- rchisq(1000, 10)
part.type <- cut(part, 20)

count <- table(part.type)
dotchart(count)

Representing the total mass by bar height would be misleading, because the area of the bars would then be meaningless.
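A minimal sketch of the dot chart applied to total mass rather than counts. It assumes each particle's mass is proportional to its size, so that summing the sizes in a bin gives that bin's total mass; the seed and the `mass` variable are illustrative choices, not part of the answer.

```r
set.seed(1)                 # assumption: fixed seed, only for reproducibility
part <- rchisq(1000, 10)    # simulated particle sizes
part.type <- cut(part, 20)  # 20 size bins

# Total mass per bin: sum of the sizes that fall in each bin.
# split() keeps empty bins, and sum() of an empty vector is 0.
mass <- vapply(split(part, part.type), sum, numeric(1))

dotchart(mass, xlab = "Total mass per size bin")
```

Because `mass` is a named numeric vector, the bin labels are taken from its names.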



Answer 2:

If you really want to use the midpoint of each bin as a scaling factor:

d <- rgamma(100, 5, 1.5)    # sample
z <- hist(d, plot = FALSE)  # make histogram, i.e., divide into bins and count up
co <- z$counts              # original counts of each bin
z$counts <- z$counts * z$mids  # scale by the midpoint of each bin

plot(z, xlim = c(0, 10), ylim = c(0, max(z$counts)))  # plot scaled histogram
points(z$mids, co, col = 2)  # overplot original counts on the same axes

If instead you want to use the actual value of each sample point as a scaling factor:

d <- rgamma(100, 5, 1.5)
z <- hist(d, plot = FALSE)
co <- z$counts  # original counts of each bin
# Sum the data values in each bin; split() keeps empty bins, which sum to 0
bins <- cut(d, z$breaks, include.lowest = TRUE)
z$counts <- unname(vapply(split(d, bins), sum, numeric(1)))

plot(z, xlim = c(0, 10), ylim = c(0, max(z$counts)))  # plot scaled histogram
points(z$mids, co, col = 2)  # overplot original counts on the same axes
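To see how much the two scalings differ, the per-bin values can be put side by side. This is only a sketch: the seed and the names `by_mid` and `by_sum` are illustrative and do not come from the answer.

```r
set.seed(42)                # assumption: fixed seed, only for reproducibility
d <- rgamma(100, 5, 1.5)
z <- hist(d, plot = FALSE)

# Same binning as hist(): right-closed intervals, lowest break included
bins <- cut(d, z$breaks, include.lowest = TRUE)

by_mid <- z$counts * z$mids                         # count times bin midpoint
by_sum <- vapply(split(d, bins), sum, numeric(1))   # sum of values in the bin

comparison <- round(cbind(count = z$counts, by_mid, by_sum), 1)
print(comparison)
```

Each point lies within half a bin width of its bin's midpoint, so the two columns can differ by at most the count times half the bin width; narrower bins bring them closer together.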


Answer 3:

Just hide the axes and replot them as needed.

# Generate some dummy data
datapoints <- runif(10000, 0, 100)

par(mfrow = c(2, 2))

# We will plot 4 histograms, with different bin sizes
binsize <- c(1, 5, 10, 20)

for (bs in binsize) {
    # Plot the histogram. Hide the axes by setting axes=FALSE
    h <- hist(datapoints, seq(0, 100, bs), col = "black", axes = FALSE,
        xlab = "", ylab = "", main = paste("Bin size: ", bs))
    # Plot the x axis without modifying it
    axis(1)
    # This will NOT plot the axis (lty=0, labels=FALSE), but it will return the tick values
    yax <- axis(2, lty = 0, labels = FALSE)
    # Plot the axis by appropriately scaling the tick values
    axis(2, at = yax, labels = yax / bs)
}
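
A possible variation on the same idea, sketched for a single bin size: instead of dividing the default tick values, choose round labels in the scaled units first and then place them at the matching count positions. The seed and the `lab` variable are illustrative and not part of the answer.

```r
set.seed(1)                          # assumption: fixed seed, only for reproducibility
datapoints <- runif(10000, 0, 100)
bs <- 20                             # one bin size, for illustration

h <- hist(datapoints, seq(0, 100, bs), col = "black", axes = FALSE,
          xlab = "", ylab = "", main = paste("Bin size:", bs))
axis(1)

# Pick round tick labels in scaled units (counts per unit of x) ...
lab <- pretty(c(0, max(h$counts) / bs))
# ... and draw them at the corresponding positions in count units
axis(2, at = lab * bs, labels = lab)
```

This keeps the labels at round numbers even when dividing the default ticks by the bin size would not.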