Time series and stl in R: Error only univariate series are allowed

Posted 2019-02-13 13:03

Question:

I am analyzing hourly precipitation from a file that is disorganized. I managed to clean it up and store it in a data frame (called CA1), which looks like this:

  Station_ID Guage_Type   Lat   Long       Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5        H6        H7        H8        H9       H10       H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
1    4457700         HI 41.52 124.03 1948-07-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0  0  0   0   0   0   0   0   0
2    4457700         HI 41.52 124.03 1948-07-05         8        LST  0  1  1  1  1  1  2.0000000 2.0000000 2.0000000 4.0000000 5.0000000 5.0000000   4   7   1   1   0 0  10  13   5   1   1   3
3    4457700         HI 41.52 124.03 1948-07-06         8        LST  1  1  1  0  1  1 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0  0   0   0   0   0   0   0
4    4457700         HI 41.52 124.03 1948-07-27         8        LST  3  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0 0   0   0   0   0   0   0
5    4457700         HI 41.52 124.03 1948-08-01         8        LST  0  0  0  0  0  0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   0   0   0   0   0 0   0   0   0   0   0   0
6    4457700         HI 41.52 124.03 1948-08-17         8        LST  0  0  0  0  0  0 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889   6   1   0   0   0 0   0   0   0   0   0   0

where H0 through H23 hold the 24 hourly values for each day (row).

Using only CA1 (the data frame above), I take each day (row) of 24 points, transpose it into a column, and concatenate all the days (rows) into one variable, which I call dat1:
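A minimal sketch of that reshaping step, assuming the hourly columns are named H0 to H23 as in the printout. The three-day data frame `toy` is hypothetical. Transposing the day-by-hour matrix and then calling `as.vector()` reads it column by column, i.e. day after day, and returns a plain vector without a `dim` attribute, which matters later for `stl()`.

```r
# Hypothetical stand-in for CA1: three days, hourly values in columns H0..H23
hours <- paste0("H", 0:23)
vals <- matrix(c(rep(0, 24), 1:24, 2 * (1:24)), nrow = 3, byrow = TRUE,
               dimnames = list(NULL, hours))
toy <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05", "1948-07-06")),
                  vals)

# Rows are days and columns are hours. t() turns each day into a column,
# and as.vector() reads column by column, i.e. day after day.
dat_vec <- as.vector(t(as.matrix(toy[, hours])))

length(dat_vec)        # 72 = nrow(toy) * 24
is.null(dim(dat_vec))  # TRUE: a plain vector, not a one-column matrix
dat_vec[25:27]         # 1 2 3: first hours of the second day
```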

 > dat1[1:48,]
  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23  H0  H1  H2  H3  H4  H5  H6  H7  H8  H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 
   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   2   2   2   4   5   5   4   7   1   1   0  0  10  13   5   1   1   3 

I then pass dat1 to ts() to build a time series:

> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    frequency = 24)

A few things to note:

> dim(CA1)
  [1] 5636   31
> length(dat1)
  [1] 135264

So 5636 rows × 24 points per row = 135264 points in total, and length(rainCA1) matches this count. However, if I add an end argument to ts(), for example

> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon), 
    end = c(1900+as.POSIXlt(CA1[5636,5])$year, 1+as.POSIXlt(CA1[5636,5])$mon),
    frequency = 24)

the series has only 1134 points, so a lot of data is missing. I assume this is because the dates are not consecutive and because I only supply the year and month for the starting point.
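A short sketch of the arithmetic behind this truncation, on made-up data. With frequency = 24, the second number in start and end is a slot from 1 to 24 inside one major unit, not a calendar month, so c(1948, 7) means "slot 7 of 24 in 1948". When both start and end are given, ts() keeps (end[1] - start[1]) * 24 + (end[2] - start[2]) + 1 observations and drops the rest without a warning.

```r
x <- seq_len(1000)   # hypothetical long input
# With frequency = 24 the second element of start/end is a slot index (1..24)
# within the major unit, not a calendar month.
s <- c(1948, 7)
e <- c(1949, 12)
y <- ts(x, start = s, end = e, frequency = 24)

n_expected <- (e[1] - s[1]) * 24 + (e[2] - s[2]) + 1
length(y)     # 30: every value after the 30th is dropped
n_expected    # 30
```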

Continuing with what I think is the correct approach, I pass the first ts result (the one without end) to stl:

> rainCA1_2 <- stl(rainCA1, "periodic")

Unfortunately, I get an error:

Error in stl(rainCA1, "periodic") : only univariate series are allowed

I don't understand this error or how to fix it. However, if I go back to ts() and supply the end argument, stl runs without any errors.
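A minimal reproduction with made-up data (`dat_mat` is hypothetical and assumes dat1 is a one-column matrix). stl() rejects any series that is still a matrix, even with a single column, and ts() keeps the dimensions of a one-column matrix or data frame. When end forces ts() to shorten the data, the subsetting appears to drop the dim attribute as well, which would explain why stl() works in that case.

```r
set.seed(42)
# Hypothetical hourly data held in a one-column matrix (30 days x 24 hours)
dat_mat <- matrix(as.numeric(rpois(24 * 30, lambda = 1)), ncol = 1)

bad <- ts(dat_mat, frequency = 24)
dim(bad)   # 720 1: the ts object is still a (one-column) matrix
msg <- tryCatch(stl(bad, s.window = "periodic"),
                error = function(e) conditionMessage(e))
msg        # "only univariate series are allowed"

# Dropping the dim attribute gives a univariate series that stl() accepts
good <- ts(as.vector(dat_mat), frequency = 24)
fit <- stl(good, s.window = "periodic")
dim(fit$time.series)   # 720 3: seasonal, trend and remainder columns

# Supplying an end that cuts the data short makes ts() subset the data,
# which also drops the dim attribute (matching the observation in the question)
short <- ts(dat_mat, start = c(1, 1), end = c(29, 24), frequency = 24)
is.matrix(short)   # FALSE
length(short)      # 696
```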

I have searched many forums, but as far as I can tell no one gives a good solution for building a time series from hourly data like this. Any help would be highly appreciated. Thank you!

Answer 1:

That error is a result of the shape of your data. Try dim(rainCA1); I suspect it returns something like [1] 135264 1. Replace rainCA1 <- ts(dat1 ... with rainCA1 <- ts(dat1[[1]] ..., and it should work.
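A small sketch of the dim() check and the [[1]] fix, on a hypothetical one-column data frame `dat_df`. The question does not say whether dat1 is a data frame or a one-column matrix (the labelled `dat1[1:48,]` printout looks more like a matrix with row names), and [[1]] behaves differently for the two, so both cases are shown.

```r
# Hypothetical one-column data frame standing in for dat1
dat_df <- data.frame(rain = as.numeric(rep(0:3, length.out = 96)))

dim(ts(dat_df, frequency = 24))        # 96 1: ts() turns the data frame into a matrix
dim(ts(dat_df[[1]], frequency = 24))   # NULL: [[1]] extracts the column as a vector

# If dat1 is a one-column matrix instead, [[1]] returns only its first element;
# dat_mat[, 1] (or as.vector(dat_mat)) extracts the whole column.
dat_mat <- as.matrix(dat_df)
length(dat_mat[[1]])   # 1
length(dat_mat[, 1])   # 96
```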

Whether the result is correct is another matter. Your first order of business should be to get your data into a consistent format, so that ts() receives the right input. Check the precise specification of ts (see ?ts).
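One way to get a consistent format is to pad the skipped days so that every calendar day contributes 24 values. This is a sketch on a hypothetical table `obs` shaped like CA1 (Date plus H0..H23). The fill value is an assumption: it treats a day absent from the file as a day without precipitation. If that is not true for the data, keep the NA values and handle them explicitly, because stl() fails on missing values by default.

```r
# Hypothetical daily table with gaps, shaped like CA1 (Date + H0..H23)
hours <- paste0("H", 0:23)
obs <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05", "1948-07-06")),
                  matrix(1, nrow = 3, ncol = 24, dimnames = list(NULL, hours)))

# Full calendar from the first to the last recorded day
all_days <- data.frame(Date = seq(min(obs$Date), max(obs$Date), by = "day"))
full <- merge(all_days, obs, by = "Date", all.x = TRUE)

# ASSUMPTION: a day missing from the file had no precipitation.
full[hours][is.na(full[hours])] <- 0

# Day after day, 24 values per day, as a plain vector
hourly <- as.vector(t(as.matrix(full[hours])))
rain_ts <- ts(hourly, frequency = 24)   # one cycle = one day of 24 hours
```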

ts() does not interpret date-time formats. It requires consecutive data points at a fixed interval. It uses a major counter and a minor counter, with frequency minor steps making up one major step. For instance, if your data is hourly and you expect seasonality at the daily level, frequency is 24. start and end are therefore primarily cosmetic: start merely sets the counter value t(0) of the first observation, whereas end marks t(end).
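A sketch of these counters for hourly data with frequency = 24, on three days of made-up values. One major unit is one day and the minor counter is the hour slot within it. The same holds for the question's call: with start = c(1948, 7) and frequency = 24, each "year" on the time axis actually holds 24 hourly points, i.e. one day.

```r
x <- ts(seq_len(72), start = c(1, 1), frequency = 24)   # 3 hypothetical days

start(x)        # 1 1  : day 1, hour slot 1
end(x)          # 3 24 : day 3, hour slot 24
cycle(x)[1:3]   # 1 2 3: position within the day
time(x)[25]     # 2    : first slot of day 2
```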



Answer 2:

I tried to explain the right way to avoid this kind of error, with a very simple example, in another question, linked here:

stl() decomposition won't accept univariate ts object?



Answer 3:

One solution I found is to run time_series_var <- ts(data[, c("var_of_interest")]) followed by time_series_var <- ts(as.vector(time_series_var)). The "univariate" error then disappears because the dimensions are now correct.
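A sketch of those two steps on a hypothetical data frame `df`. It assumes the column comes back as a one-column table (forced here with drop = FALSE; tibbles behave this way by default), so the first ts() call yields a matrix-shaped series. The original snippet sets no frequency, but stl() refuses series with a frequency below 2, so frequency = 24 is added here as an assumption for hourly data.

```r
# Hypothetical data; drop = FALSE keeps a one-column data frame
df <- data.frame(var_of_interest = as.numeric(rep(c(0, 2, 1, 5), 24)), other = 1)

tsv_mat <- ts(df[, c("var_of_interest"), drop = FALSE])
dim(tsv_mat)   # 96 1: stl() would reject this

# as.vector() strips the dim (and ts) attributes; rebuild with a frequency
tsv <- ts(as.vector(tsv_mat), frequency = 24)
dim(tsv)       # NULL
fit <- stl(tsv, s.window = "periodic")
```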