Introduction to dfms

Sebastian Krantz

2024-06-09

dfms provides a user friendly and computationally efficient approach to estimate linear Gaussian Dynamic Factor Models in R. The package is not geared at any specific application, and can be used for dimensionality reduction, forecasting and nowcasting systems of time series. The use of the package is facilitated by a comprehensive set of methods to explore/plot models and extract results.

library(dfms)
library(xts)

introduction.R

This vignette walks through the main features of the package. The data provided in the package in xts format is taken from Banbura and Modugno (2014)1, henceforth BM14, and covers the Euro Area from January 1980 through September 2009.

# Using the monthly series from BM14
dim(BM14_M)
#> [1] 357  92
range(index(BM14_M))
#> [1] "1980-01-31" "2009-09-30"
head(colnames(BM14_M))
#> [1] "ip_total"       "ip_tot_cstr"    "ip_tot_cstr_en" "ip_constr"     
#> [5] "ip_im_goods"    "ip_capital"
plot(scale(BM14_M), lwd = 1)

introduction.R

The data frame BM14_Models provides information about the series2, and the various models estimated by BM14.

head(BM14_Models, 3)
#>           series
#> 1       ip_total
#> 2    ip_tot_cstr
#> 3 ip_tot_cstr_en
#>                                                                                           label
#> 1                                       IP-Total Industry - Working Day and Seasonally Adjusted
#> 2              IP-Total Industry (Excluding Construction) - Working Day and Seasonally Adjusted
#> 3 IP-Total Industry Excluding Construction and MIG Energy - Working Day and Seasonally Adjusted
#>                           code freq log_trans small medium large
#> 1 sts.m.i5.Y.PROD.NS0010.4.000    M      TRUE FALSE  FALSE  TRUE
#> 2 sts.m.i5.Y.PROD.NS0020.4.000    M      TRUE  TRUE   TRUE  TRUE
#> 3 sts.m.i5.Y.PROD.NS0021.4.000    M      TRUE FALSE  FALSE  TRUE

# Using only monthly data
BM14_Models_M <- subset(BM14_Models, freq == "M")

introduction.R

Prior to estimation, all data is differenced by BM14, and some series are log, differenced, as indicated by the log_trans column in BM14_Models. In general, dfms uses a stationary Kalman Filter with time-invariant system matrices, and therefore expects data to be stationary. Data is also scaled and centered3 in the main DFM() function, thus this does not need to be done by the user.

library(magrittr)
# log-transforming and first-differencing the data
BM14_M[, BM14_Models_M$log_trans] %<>% log()
BM14_M_diff = diff(BM14_M)
plot(scale(BM14_M_diff), lwd = 1)

introduction.R

Determining the Structure of the Model

Before estimating a model, the ICr() function can be applied to determine the number of factors. It computes 3 information criteria proposed in Bai and NG (2002)4, whereby the second criteria generally suggests the most parsimonious model.

ic = ICr(BM14_M_diff)
#> Missing values detected: imputing data with tsnarmimp() with default settings
print(ic)
#> Optimal Number of Factors (r) from Bai and Ng (2002) Criteria
#> 
#> IC1 IC2 IC3 
#>   7   7  13
plot(ic)

introduction.R

Another option is to use a Screeplot to gauge the number of factors by looking for a kink in the plot. A mathematical procedure for finding the kink was suggested by Onatski (2010)5, but this is currently not implemented in ICr().

screeplot(ic)

introduction.R

Based on both the information criteria and the Screeplot, I gauge that a model with 4 factors should be estimated, as factors, 5, 6 and 7 do not add much to the explanatory power of the model. Next to the number of factors, the lag order of the factor-VAR of the transition equation should be estimated (the default is 1 lag). This can be done using the VARselect() function from the vars package, with PCA factor estimates reported by ICr().

# Using vars::VARselect() with 4 principal components to estimate the VAR lag order
vars::VARselect(ic$F_pca[, 1:4])
#> $selection
#> AIC(n)  HQ(n)  SC(n) FPE(n) 
#>      6      3      3      6 
#> 
#> $criteria
#>                 1          2          3          4          5          6
#> AIC(n)   5.810223   5.617282   5.427760   5.389413   5.407765   5.381829
#> HQ(n)    5.898758   5.776646   5.657953   5.690434   5.779614   5.824507
#> SC(n)    6.032560   6.017490   6.005838   6.145361   6.341582   6.493517
#> FPE(n) 333.696100 275.153456 227.671078 219.144228 223.265640 217.639133
#>                 7          8          9         10
#> AIC(n)   5.409877   5.394900   5.421375   5.460761
#> HQ(n)    5.923383   5.979235   6.076538   6.186753
#> SC(n)    6.699434   6.862328   7.066673   7.283929
#> FPE(n) 223.956824 220.793226 226.933863 236.331677

introduction.R

The selection thus suggests we should estimate a factor model with r = 4 factors and p = 3 lags6. Before estimating the model I note that dfms does not deal with seasonality in series, thus it is recommended to also seasonally adjust data, e.g. using the seasonal package before estimation. BM14 only use seasonally adjusted series, thus this is not necessary with the example data provided.

Estimation and Exploration

Estimation can then simply be done using the DFM() function with parameters r and p7.

# Estimating the model with 4 factors and 3 lags using BM14's EM algorithm
model1 = DFM(BM14_M_diff, r = 4, p = 3)
#> Converged after 26 iterations.
print(model1)
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#> 
#> Factor Transition Matrix [A]
#>      L1.f1   L1.f2   L1.f3   L1.f4   L2.f1   L2.f2   L2.f3   L2.f4   L3.f1
#> f1  0.4720 -0.1297  0.8460  0.2098 -0.0733 -0.1436 -0.0595  0.1565  0.2356
#> f2 -0.1612  0.1699  0.2389  0.1598  0.0641 -0.1341 -0.0542  0.1287  0.1336
#> f3  0.3965  0.3264  0.0213 -0.3033 -0.1542 -0.0467 -0.1484 -0.0150 -0.1172
#> f4  0.1096  0.1601 -0.1578  0.2485 -0.0365 -0.0563 -0.0230 -0.1117 -0.0719
#>      L3.f2   L3.f3   L3.f4
#> f1 -0.0803 -0.0386  0.0408
#> f2  0.1347 -0.0024 -0.0342
#> f3 -0.0087  0.1767  0.0249
#> f4  0.0307  0.0662 -0.0035
plot(model1)

introduction.R

The model can be investigated using summary(), which returns an object of class ‘dfm_summary’ containing the system matrices and summary statistics of the factors and the residuals in the measurement equation, as well as the R-Squared of the factor model for individual series. The print method automatically adjusts the amount of information printed to the data size. For large databases with more than 40 series, no series-level statistics are printed.

dfm_summary <- summary(model1)
print(dfm_summary) # Large model with > 40 series: defaults to compact = 2
#> Dynamic Factor Model: n = 92, T = 356, r = 4, p = 3, %NA = 25.8366
#> 
#> Call:  DFM(X = BM14_M_diff, r = 4, p = 3)
#> 
#> Summary Statistics of Factors [F]
#>       N     Mean   Median      SD       Min      Max
#> f1  356  -0.0448   0.3455  4.4505  -21.9265  11.0306
#> f2  356  -0.0319  -0.0892    2.68   -9.9549   7.4988
#> f3  356  -0.1032  -0.0593  3.2891  -12.0969  16.2455
#> f4  356  -0.0118    0.089   2.161   -8.2883  10.7219
#> 
#> Factor Transition Matrix [A]
#>      L1.f1   L1.f2    L1.f3   L1.f4    L2.f1    L2.f2    L2.f3    L2.f4   L3.f1
#> f1  0.4720 -0.1297  0.84605  0.2098 -0.07334 -0.14356 -0.05950  0.15645  0.2356
#> f2 -0.1612  0.1699  0.23889  0.1598  0.06406 -0.13413 -0.05415  0.12869  0.1336
#> f3  0.3965  0.3264  0.02128 -0.3033 -0.15424 -0.04669 -0.14839 -0.01495 -0.1172
#> f4  0.1096  0.1601 -0.15776  0.2485 -0.03655 -0.05626 -0.02304 -0.11169 -0.0719
#>        L3.f2     L3.f3     L3.f4
#> f1 -0.080320 -0.038592  0.040812
#> f2  0.134692 -0.002391 -0.034215
#> f3 -0.008694  0.176663  0.024876
#> f4  0.030716  0.066201 -0.003465
#> 
#> Factor Covariance Matrix [cov(F)]
#>           f1        f2        f3        f4
#> f1  19.8067    2.0846*  -3.4700*  -2.1094*
#> f2   2.0846*   7.1822   -2.8725*  -1.0631*
#> f3  -3.4700*  -2.8725*  10.8182    1.9286*
#> f4  -2.1094*  -1.0631*   1.9286*   4.6701 
#> 
#> Factor Transition Error Covariance Matrix [Q]
#>         u1      u2      u3      u4
#> u1  9.0178  0.3303 -3.0764 -1.0182
#> u2  0.3303  5.4425 -1.3095 -0.5051
#> u3 -3.0764 -1.3095  7.0230  0.8639
#> u4 -1.0182 -0.5051  0.8639  3.8005
#> 
#> Summary of Residual AR(1) Serial Correlations
#>    N     Mean   Median      SD      Min     Max
#>   92  -0.0409  -0.0782  0.2959  -0.5073  0.6858
#> 
#> Summary of Individual R-Squared's
#>    N    Mean  Median      SD     Min     Max
#>   92  0.3712   0.299  0.2888  0.0067  0.9978

# Can request more detailed printouts
# print(dfm_summary, compact = 1)
# print(dfm_summary, compact = 0) 

introduction.R

Apart from the model summary, the dfm methods residuals() and fitted() return observation residuals and fitted values from the model. The default format is a plain matrix, but the functions also have an argument to return data in the original (input) format.

plot(resid(model1, orig.format = TRUE))