Title: | Convolution-Closed Models for Count Time Series |
---|---|
Description: | Useful tools for fitting, validating, and forecasting of practical convolution-closed time series models for low counts are provided. Marginal distributions of the data can be modelled via Poisson and Generalized Poisson innovations. Regression effects can be modelled via time varying innovation rates. The models are described in Jung and Tremayne (2011) <doi:10.1111/j.1467-9892.2010.00697.x> and the model assessment tools are presented in Czado et al. (2009) <doi:10.1111/j.1541-0420.2009.01191.x>, Gneiting and Raftery (2007) <doi:10.1198/016214506000001437> and, Tsay (1992) <doi:10.2307/2347612>. |
Authors: | Manuel Huth [aut, cre], Robert C. Jung [aut], Andy Tremayne [aut] |
Maintainer: | Manuel Huth <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.3 |
Built: | 2024-11-20 10:50:32 UTC |
Source: | https://github.com/manuhuth/coconots |
Functions to analyse time series consisting of low counts are provided. The focus in the current version is on practical models that can capture first and higher-order dependence based on the work of Joe (1996). Both equidispersed and overdispersed marginal distributions of data can be modelled. Regression effects can be included. Fast and efficient procedures for likelihood based inference and probabilistic forecasting are provided as well as useful tools for model validation and diagnostics.
The package allows simulation of convolution-closed count time series models with the cocoSim
function. Model fitting is performed with the cocoReg
routine. By passing a cocoReg-type object,
the S3 method predict
computes the one-step ahead forecasting distribution. cocoBoot
, cocoPit
,
cocoScore
, and cocoResid
provide routines for model assessment.
The main usage of the package is illustrated within the cocoReg function chapter. For more details
and examples of the functions see the respective sections within this vignette.
By default, our functions make use of an RCPP implementation. However, users
with a running Julia installation can choose to call Julia in the background
to run their functions by specifiying it in the R function input. This option
is particularly useful for the regression (cocoReg
), where a complex likelihood function
must be numerically evaluated to obtain parameter estimates. By leveraging
Julia's automatic differentiation capabilities,
our functions can take advantage of numerical gradients,
leading to increased numerical stability and faster convergence.
As we find both, the Julia and RCPP implementations produce qualitatively similar results in all our tests, we have decided to use the RCPP implementation as the default option to make our package accessible to non-Julia users.
Maintainer: Manuel Huth <[email protected]>
Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics 65, 1254–61.
Gneiting, T. and Raftery, A. E. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102:359-378.
R.C. Jung, A.R. Tremayne (2006) Coherent forecasting in integer time series models. International Journal of Forecasting 22, 223–238
Jung, R. C. and Tremayne, A. R. (2011) Convolution-closed models for count time series with applications. Journal of Time Series Analysis, 32, 3, 268–280.
Jung, Robert C., Brendan P. M. McCabe, and Andrew R. Tremayne. (2016). Model validation and diagnostics. In Handbook of Discrete Valued Time Series. Edited by Richard A. Davis, Scott H. Holan, Robert Lund and Nalini Ravishanker. Boca Raton: Chapman and Hall, pp. 189–218.
Joe, H. (1996) Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability, 664–677.
Tsay, R. S. (1992) Model checking via parametric bootstraps in time series analysis. Applied Statistics 41, 1–15.
Westgren, A. (1916) Die Veraenderungsgeschwindigkeit der lokalen Teilchenkonzentration in kollioden Systemen (Erste Mitteilung). Arkiv foer Matematik, Astronomi och Fysik, 11, 1–24.
Model checking procedure emphasizing reproducibility in fitted models to provide an overall evaluation of fit as proposed by Tsay (1992).
cocoBoot( coco, numb.lags = 21, rep.Bootstrap = 1000, conf.alpha = 0.05, julia = FALSE, julia_seed = NULL )
cocoBoot( coco, numb.lags = 21, rep.Bootstrap = 1000, conf.alpha = 0.05, julia = FALSE, julia_seed = NULL )
coco |
An object of class coco |
numb.lags |
Number of lags for which to compute autocorrelations |
rep.Bootstrap |
Number of bootstrap replicates to use |
conf.alpha |
Confidence level for the quantile intervals |
julia |
if TRUE, the bootstrap is run with Julia. |
julia_seed |
Seed for the julia implementation. Only used if julia equals TRUE. |
Computes bootstrap confidence intervals for the autocorrelations of a fitted model.
an object of class cocoBoot. It contains the bootstraped confidence intervals of the autocorrelations and information on the model specifications.
Tsay, R. S. (1992) Model checking via parametric bootstraps in time series analysis. Applied Statistics 41, 1–15.
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) fit <- cocoReg(order = 1, type = "Poisson", data = data) #assessment using bootstrap - R implementation boot_r <- cocoBoot(fit, rep.Bootstrap=400)
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) fit <- cocoReg(order = 1, type = "Poisson", data = data) #assessment using bootstrap - R implementation boot_r <- cocoBoot(fit, rep.Bootstrap=400)
Computes the probability integral transform (PIT) and provides the non-randomized PIT histogram for assessing absolute performance of a fitted model as proposed by Czado et al. (2009).
cocoPit(coco, J = 10, conf.alpha = 0.05, julia = FALSE)
cocoPit(coco, J = 10, conf.alpha = 0.05, julia = FALSE)
coco |
An object of class coco |
J |
Number of bins for the histogram (default: 10) |
conf.alpha |
Confidence level for the confidence bands. |
julia |
if TRUE, the PIT is computed with Julia. |
The adequacy of a distributional assumption for a model is checked by
checking the cumulative non-randomized PIT distribution for uniformity.
A useful graphical device is the PIT histogram, which displays this
distribution to J equally spaced bins. We supplement the graph by
incorporating approximately confidence intervals obtained
from a standard chi-square goodness-of-fit test of the null hypothesis that
the J bins of the histogram are drawn from a uniform distribution.
For details, see Jung, McCabe and Tremayne (2016).
an object of class cocoPit. It contains the The probability integral transform values, its p-values and information on the model specifications.
Manuel Huth
Czado, C., Gneiting, T. and Held, L. (2009) Predictive model assessment for count data. Biometrics 65, 1254–61.
Jung, Robert C., Brendan P. M. McCabe, and Andrew R. Tremayne. (2016). Model validation and diagnostics. In Handbook of Discrete Valued Time Series. Edited by Richard A. Davis, Scott H. Holan, Robert Lund and Nalini Ravishanker. Boca Raton: Chapman and Hall, pp. 189–218.
Jung, R. C. and Tremayne, A. R. (2011) Convolution-closed models for count time series with applications. Journal of Time Series Analysis, 32, 3, 268–280.
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) #julia_installed = TRUE ensures that the fit object #is compatible with the julia cocoPit implementation fit <- cocoReg(order = 1, type = "Poisson", data = data) #PIT R implementation pit_r <- cocoPit(fit)
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) #julia_installed = TRUE ensures that the fit object #is compatible with the julia cocoPit implementation fit <- cocoReg(order = 1, type = "Poisson", data = data) #PIT R implementation pit_r <- cocoPit(fit)
The function fits first and second order (Generalized) Poisson Autoregressive (G)PAR models presented in (Jung and Tremayne, 2010). Autoregressive dependence on past counts is modelled by a special random operator that not only preserve integer status, but also, via the property of closure under convolution, ensure that the marginal distribution of the observed counts is from the same family as the innovations. The models can be thought of as stationary Markov chains of finite order, where the distribution of the innovations can either be Poisson or Generalized Poisson, where the latter can account for overdispersed data. Maximum likelihood is used for estimation and the user can choose to include linear constraints or not. If linear constraints are not included, it cannot be guaranteed that the parameters will lie in the theoretically feasible parameter space, but the optimization process might be faster. The function uses method of moments estimators to obtain starting values for the numerical optimization, but the user can also specify their own starting values if desired.
If Julia is installed, the user can choose whether the optimization is run in Julia which might faster yield results and increased numeric stability due to the use of automatic differentiation. See details for more information on the Julia implementation.
cocoReg( type, order, data, xreg = NULL, constrained.optim = TRUE, b.beta = -10, start = NULL, start.val.adjust = TRUE, method_optim = "Nelder-Mead", replace.start.val = 1e-05, iteration.start.val = 0.6, method.hessian = "Richardson", cores = 2, julia = FALSE, julia_installed = FALSE, link_function = "log" )
cocoReg( type, order, data, xreg = NULL, constrained.optim = TRUE, b.beta = -10, start = NULL, start.val.adjust = TRUE, method_optim = "Nelder-Mead", replace.start.val = 1e-05, iteration.start.val = 0.6, method.hessian = "Richardson", cores = 2, julia = FALSE, julia_installed = FALSE, link_function = "log" )
type |
character string indicating the type of model to be fitted |
order |
integer vector indicating the order of the model |
data |
time series data to be used in the analysis |
xreg |
optional matrix of explanatory variables for use in a regression model |
constrained.optim |
logical indicating whether optimization should be constrained, currently only available in the R version |
b.beta |
numeric value indicating the lower bound for the parameters of the explanatory variables for the optimization, currently only available in the R version |
start |
optional numeric vector of starting values for the optimization |
start.val.adjust |
logical indicating whether starting values should be adjusted, currently only available in the R version |
method_optim |
character string indicating the optimization method to be used, currently only available in the R version. In the julia implementation this is by default the LBFGS algorithm |
replace.start.val |
numeric value indicating the value to replace any invalid starting values, currently only available in the R version |
iteration.start.val |
numeric value indicating the proportion of the interval to use as the new starting value, currently only available in the R version |
method.hessian |
character string indicating the method to be used to approximate the Hessian matrix, currently only available in the R version |
cores |
numeric indicating the number of cores to use, currently only available in the R version |
julia |
if TRUE, the model is estimated with Julia. This can improve the speed significantly since Julia makes use of derivatives using autodiff. In this case, only type, order, data, xreg, and start are used as other inputs. |
julia_installed |
if TRUE, the model R output will contain a Julia compatible output element. |
link_function |
Specifies the link function for the conditional mean of the innovation ( |
Let a time series of counts be and be
a random operator that differs between model specifications.
For more details on the random operator, see Jung and Tremayne (2011) and Joe (1996).
The general first-order model is of the form
and the general second-order model of the form
where are i.i.d Poisson (
) or Generalized
Poisson (
) innovations. Through closure under convolution
the marginal distributions of
are therefore Poisson or Generalized Poisson distributions, respectively.
If no covariates are used and if covariates are used
whereby is the
-th covariate at time
and
is a link function.
Current supported link functions are the identity
and a logarithmic link function
. To ensure positivity of
if the identity function is used,
must be enforced.
Alternatively, computational values of
can be set to a small positive value.
This option is named 'relu', due to its similarity to a ReLu function commonly used in machine learning.
Standard errors are computed by the square root of the diagonal elements of the inverse Hessian.
This function is implemented in 2 versions. The default runs on RCPP. An alternative version uses a Julia implementation which can be chosen by setting the argument julia to TRUE. In order to use this feature, a running Julia installation is required on the system. The RCPP implementation uses the derivative-free Nelder-Mead optimizer to obtain parameter estimates. The Julia implementation makes use of Julia's automatic differentiation in order to obtain gradients such that it can use the LBFGS algorithm for optimization. This enhances the numeric stability of the optimization and yields an internal validation if both methods yield qualitatively same parameter estimates. Furthermore, the Julia implementation can increase the computational speed significantly, especially for large models.
The model assessment tools cocoBoot
, cocoPit
, and cocoScore
will use a Julia implementation as well, if the cocoReg
was run with Julia.
Additionally, one can make the RCPP output of cocoReg
compatible with the Julia
model assessments by setting julia_installed to true. In this case, the user can choose
between the RCPP and the Julia implementation for model assessment.
an object of class coco. It contains the parameter estimates, standard errors, the log-likelihood, and information on the model specifications. If Julia is used for parameter estimation or the Julia installation parameter is set to TRUE, the results contain an additional Julia element that is called from the model Julia assessment tools if they are run with the Julia implementation.
Manuel Huth
Jung, R. C. and Tremayne, A. R. (2011) Convolution-closed models for count timeseries with applications. Journal of Time Series Analysis, 32, 3, 268–280.
Joe, H. (1996) Time series models with univariate margins in the convolution-closed infinitely divisible class. Journal of Applied Probability, 664–677.
## GP2 model without covariates length <- 1000 par <- c(0.5,0.2,0.05,0.3,0.3) data <- cocoSim(order = 2, type = "GP", par = par, length = length) fit <- cocoReg(order = 2, type = "GP", data = data) ##Poisson1 model with covariates length <- 1000 period <- 50 sin <- sin(2*pi/period*(1:length)) cos <- cos(2*pi/period*(1:length)) cov <- cbind(sin, cos) par <- c(0.2, 0.2, -0.2) data <- cocoSim(order = 1, type = "Poisson", par = par, xreg = cov, length = length) fit <- cocoReg(order = 1, type = "Poisson", data = data, xreg = cov)
## GP2 model without covariates length <- 1000 par <- c(0.5,0.2,0.05,0.3,0.3) data <- cocoSim(order = 2, type = "GP", par = par, length = length) fit <- cocoReg(order = 2, type = "GP", data = data) ##Poisson1 model with covariates length <- 1000 period <- 50 sin <- sin(2*pi/period*(1:length)) cos <- cos(2*pi/period*(1:length)) cov <- cbind(sin, cos) par <- c(0.2, 0.2, -0.2) data <- cocoSim(order = 1, type = "Poisson", par = par, xreg = cov, length = length) fit <- cocoReg(order = 1, type = "Poisson", data = data, xreg = cov)
Calculates the (Pearson) residuals of a fitted model for model evaluation purposes.
cocoResid(coco, val.num = 1e-10)
cocoResid(coco, val.num = 1e-10)
coco |
An object of class "coco |
val.num |
A non-negative real number which is used to stop the calculation if the cumulative probability reaches 1-'val.num' |
The Pearson residuals are computed as the scaled deviation of the observed count from its conditional expectation given the relevant past history, including covariates, if applicable. If a fitted model is correctly specified, the Pearson residuals should exhibit mean zero, variance one, and no significant serial correlation.
a list that includes the (Pearson) residuals, conditional expectations, conditional variances, and information on the model specifications.
Manuel Huth
The function calculates the log, quadratic and ranked probability scores for assessing relative performance of a fitted model as proposed by Czado et al. (2009).
cocoScore(coco, max_x = 50, julia = FALSE)
cocoScore(coco, max_x = 50, julia = FALSE)
coco |
An object of class coco |
max_x |
An integer which is used as the maximum count for the computation of the score. The default value is '50' |
julia |
if TRUE, the scores are computed with Julia. |
Scoring rules assign a numerical score based on the predictive distribution and the observed data to measure the quality of probabilistic predictions. They are provided here as a model selection tool and are computed as averages over the relevant set of (in-sample) predictions. Scoring rules are, generally, negatively oriented penalties that one seeks to minimize. The literature has developed a large number of scoring rules and, unless there is a unique and clearly defined underlying decision problem, there is no automatic choice of a (proper) scoring rule to be used in any given situation. Therefore, the use of a variety of scoring rules may be appropriate to take advantage of specific emphases and strengths. Three proper scoring rules (for a definition of the concept of propriety see Gneiting and Raftery, 2007) which Jung, McCabe and Tremayne (2016) found to be particularly useful are implemented. For more information see the references listed below.
a list containing the log score, quadratic score and ranked probability score.
Manuel Huth
Czado, C. and Gneitling, T. and Held, L. (2009) Predictive Model Assessment for Count Data. Biometrics, 65, 4, 1254–1261.
Gneiting, T. and Raftery, A. E. (2007) Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102:359-378.
Jung, Robert C., Brendan P. M. McCabe, and Andrew R. Tremayne. (2016). Model validation and diagnostics. In Handbook of Discrete Valued Time Series. Edited by Richard A. Davis, Scott H. Holan, Robert Lund and Nalini Ravishanker. Boca Raton: Chapman and Hall, pp. 189–218.
Jung, R. C. and Tremayne, A. R. (2011) Convolution-closed models for count timeseries with applications. Journal of Time Series Analysis, 32, 3, 268–280.
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) #julia_installed = TRUE ensures that the fit object #is compatible with the julia cocoScore implementation fit <- cocoReg(order = 1, type = "Poisson", data = data) #assessment using scoring rules - R implementation score_r <- cocoScore(fit)
lambda <- 1 alpha <- 0.4 set.seed(12345) data <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) #julia_installed = TRUE ensures that the fit object #is compatible with the julia cocoScore implementation fit <- cocoReg(order = 1, type = "Poisson", data = data) #assessment using scoring rules - R implementation score_r <- cocoScore(fit)
The function generates a time series of low counts from the (G)PAR model class for a specified innovation distribution, sample size, lag order, and parameter values.
cocoSim( type, order, par, length, xreg = NULL, init = NULL, julia = FALSE, julia_seed = NULL, link_function = "log" )
cocoSim( type, order, par, length, xreg = NULL, init = NULL, julia = FALSE, julia_seed = NULL, link_function = "log" )
type |
character, either "Poisson" or "GP" indicating the type of the innovation distribution |
order |
integer, either 1 or 2 indicating the order of the model |
par |
numeric vector, the parameters of the model, the number of elements in the vector depends on the type and order specified. |
length |
integer, the number of observations in the generated time series |
xreg |
data.frame, data frame of control variables |
init |
numeric vector, initial data to use, default is NULL. See details for more information on the usage. |
julia |
If TRUE, the Julia implementation is used. In this case, init is ignored but it might be faster. |
julia_seed |
Seed for the Julia implementation. Only used if Julia equals TRUE. |
link_function |
Specifies the link function for the conditional mean of the innovation ( |
The function checks for valid input of the type, order, parameters, and initial data before generating the time series.
The init parameter allows users to set a custom burn-in period
for the simulation. By default, when simulating with covariates, no burn-in
period is specified since there is no clear choice on the covariates.
However, the init argument gives users the flexibility to select an
appropriate burn-in period for the covariate case. One way to do this is to
simulate a time series using cocoSim
with appropriate covariates and pass the
resulting time series to the
init argument of a new cocoSim
run so that the first time series is used as
the burn-in period.
If init is not specified for the covariate case, a warning will be returned
to prompt the user to specify a custom burn-in period. This helps ensure that
the simulation accurately captures the dynamics of the system being modeled.
a vector of the simulated time series.
Manuel Huth
lambda <- 1 alpha <- 0.4 set.seed(12345) # Simulate using the RCPP implementation data_rcpp <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) # Simulate using the Julia implementation data_julia <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100)
lambda <- 1 alpha <- 0.4 set.seed(12345) # Simulate using the RCPP implementation data_rcpp <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100) # Simulate using the Julia implementation data_julia <- cocoSim(order = 1, type = "Poisson", par = c(lambda, alpha), length = 100)
This function computes and returns scores for Poisson and Generalized Poisson models.
cocoSoc(data, models = "all", print.progress = TRUE, max_x_score = 50, ...)
cocoSoc(data, models = "all", print.progress = TRUE, max_x_score = 50, ...)
data |
A numeric vector containing the data to be used for modeling. |
models |
A character string specifying which models to use. Default is '"all"', which uses both Poisson and GP models. |
print.progress |
A logical value indicating whether to print progress messages. Default is 'TRUE'. |
max_x_score |
An integer which is used as the maximum count for the computation of the score. The default value is '50' |
... |
Additional arguments to be passed to the 'cocoReg' function. |
A list of class '"cocoVarsoc"' containing:
A list of fitted model objects.
A list of score objects for each model.
A data frame containing the logarithmic, quadratic, and ranked probability scores for each model.
Monthly counts of claimants collecting wage loss benefit for injuries in the workplace at one specific service delivery location of the Workers Compensation Board of British Columbia, Canada in the period January 1985 to December 1994. Only injuries due to cuts and lacerations are considered. The data have been provided by Brendan McCabe.
Freeland, R. K.; McCabe, B.P.M. (2004) Analysis of count data by means of the Poisson autoregressive model. Journal of Time Series Analysis, 25, 701–722.
The data represent the number of daily downloads of a TeX-editor between June 2006 and February 2007 and has a sample size of 267. The data have been provided by Christian Weiss.
Weiss, C.H. (2008) Thinning operations for modelling time series of counts – a survey. Advances in Statistical Analysis, 92, 319–341.
A sample of 370 counts of gold particles in a well-defined colloidal solution at equidistant points in time, originally published in Westgren (1916) and used in Jung and Tremanye (2006).
Jung, R.C.; Tremayne, A.R. (2006) Coherent forecasting in integer time series models. International Journal of Forecasting 22, 223–238
Westgren, A. (1916) Die Veraenderungsgeschwindigkeit der lokalen Teilchenkonzentration in kollioden Systemen (Erste Mitteilung). Arkiv foer Matematik, Astronomi och Fysik, 11, 1–24.
checks for needed Julia packages and installs them if not installed.
installJuliaPackages()
installJuliaPackages()
no return value, called to install Julia packages in Julia.
Computes the k-step ahead forecast using the models in the coconots package.
## S3 method for class 'coco' predict( object, k = 1, number_simulations = 1000, alpha = 0.05, simulate_one_step_ahead = FALSE, max = NULL, epsilon = 1e-08, xcast = NULL, decimals = 4, julia = FALSE, ... )
## S3 method for class 'coco' predict( object, k = 1, number_simulations = 1000, alpha = 0.05, simulate_one_step_ahead = FALSE, max = NULL, epsilon = 1e-08, xcast = NULL, decimals = 4, julia = FALSE, ... )
object |
An object that has been fitted previously, of class coco. |
k |
The number of steps ahead for which the forecast should be computed. Defaults to 3. |
number_simulations |
The number of simulation runs to compute. Defaults to 500. |
alpha |
Level of confidence that is used to construct the prediction intervals. |
simulate_one_step_ahead |
If FALSE, the one-step ahead prediciton is obtained using the analytical distribution. If TRUE, bootstrapping is used. |
max |
The maximum number of the forecast support for the plot. If NULL all values for which the cumulative distribution function is below 1- epsilon are used for the plot. |
epsilon |
If max is NULL, epsilon determines the range of the support that is used by subsequent automatic plotting using R's plot() function. |
xcast |
An optional matrix of covariate values for the forecasting. If 'NULL', the function assumes no covariates. |
decimals |
Number of decimal places for the forecast probabilities |
julia |
if TRUE, the estimate is predicted with Julia. |
... |
Optional arguments. |
Returns forecasts for each mass point of the k-step ahead distribution for the fitted model. The exact predictive distributions for one-step ahead predicitons for the models included here are provided in Jung and Tremayne (2011), maximum likelihood estimates replace the true model parameters. Out-of-sample values for covariates can be provided, if necessary.
A list of frequency tables. Each table represents a k-step ahead forecast frequency distribution based on the simulation runs.
Sets the seed for Julia's random number generator to ensure reproducibility.
setJuliaSeed(julia_seed)
setJuliaSeed(julia_seed)
julia_seed |
An integer seed value to be passed to Julia's random number generator. |
This function initializes the necessary Julia functions and sets the random seed for Julia. If the provided seed is NULL, the function does nothing.
Your Name (or the appropriate author's name)