Capital markets are complex and dynamic environments that generate large amounts of time series data at a high frequency. Understanding and modeling the hidden dynamics within capital markets using a data-driven approach is an exciting but challenging task. At its core, time series modeling and forecasting can be expressed as the generation or prediction of ${y}_t \in \mathbb{R}^P$ given previous observations ${y}_{<t} = \{{y}_s \in \mathbb{R}^P : s < t\}$, where ${t}$ and ${s}$ index time. A time series model (for forecasting) is thus essentially a model of the conditional distribution $p({y}_t | {y}_{<t})$. Here we assume discrete time, but the concepts we discuss can be extended to continuous time as well. Figure 1 shows an illustration of (multi-step) time series forecasting, where we predict the next ${H}$ observations given the previous observations from ${t = 1}$ to ${T}$.

However, different from typical time series modeling and forecasting, capital markets involve many participants that act and compete in a shared environment, where the environment as a whole is also heavily influenced by external events, such as news and politics. These complications make the generative process of the time series non-stationary — $p({y}_t|{y}_{<t})$ can undergo a substantial stochastic change over time. This causes significant challenges not only for learning a consistent model from the data but also for applying any learned model effectively and reliably at test time.

Stationarity, Non-Stationarity, and Their Nuances

Non-stationarity generally means that the distribution of the data generative process can change over time. It is a problem we may face in time series modeling not only in capital markets but also in other real-world applications. Existing definitions of different types of stationarity (and non-stationarity) include, e.g., strong stationarity and weak stationarity, but we argue that there are important nuances in defining (non-)stationarity in a way that is relevant and useful for time series modeling and forecasting.

We are interested in modeling $p({y})$, where ${y}$ is the target time series, and typically also assume that the distribution of ${y}_t$ can depend on an input variable (feature) ${x}_t \in \mathbb{R}^Q$ that we observe at each time step ${t}$. A common assumption when modeling the resulting conditional distribution $p({y} | {x})$ for the purpose of forecasting is that, given ${x}_t$, ${y}_t$ does not depend on other ${x}_{s}$, ${s} \neq {t}$, such that $p({y} | {x})$ factorizes as follows:

\begin{equation}
p({y} | {x}) = \prod_{t=1}^T p({y}_t | {y}_{< t}, {x}_{t}).
\tag{1}
\end{equation}
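As a concrete instance of this factorization, the sketch below evaluates $\log p({y}|{x})$ for a linear-Gaussian conditional model by summing the per-step conditional log-densities. The model form and the coefficients `a`, `b`, `s` are illustrative assumptions, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian conditional model: y_t | y_{t-1}, x_t ~ N(a*y_{t-1} + b*x_t, s^2)
a, b, s = 0.8, 0.5, 1.0

def log_likelihood(y, x):
    """Log p(y | x) computed via the factorization in Eq. (1)."""
    ll = 0.0
    for t in range(1, len(y)):
        mu = a * y[t - 1] + b * x[t]  # conditional mean given past y and current x
        ll += -0.5 * np.log(2 * np.pi * s**2) - (y[t] - mu) ** 2 / (2 * s**2)
    return ll

# Simulate from the model and evaluate the factorized log-likelihood
T = 200
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a * y[t - 1] + b * x[t] + s * rng.normal()

print(log_likelihood(y, x))
```

The point is purely structural: under the conditional-independence assumption, the joint density is a product over time steps, so the log-likelihood is a sum of one-step conditional terms.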

Based on Equation (1), we can categorize the non-stationarity of ${y}$ into the following cases:
1. Neither $p({y}_t | {y}_{< t}, {x}_{t})$ nor $p({x}_t)$ changes, but ${y}$ is still not stationary (joint non-stationarity).

2. $p({x}_t)$ changes (covariate non-stationarity).

3. $p({y}_t | {y}_{< t}, {x}_t)$ changes (conditional non-stationarity).

Figure 1. The conditional distribution $p({y}_{T+1:T+H} | {y}_{1:T})$ is the core of time series modeling and forecasting.


4. A combination of 2 and 3, where both the covariate distribution $p({x}_t)$ and the conditional distribution $p({y}_t | {y}_{< t}, {x}_{t})$ change.
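The distinction between cases 2 and 3 can be illustrated with a toy simulation (all distributions below are illustrative choices): under covariate non-stationarity the input distribution shifts but a fitted input-output relationship stays consistent across time, while under conditional non-stationarity the fitted relationship itself changes.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000

# Covariate non-stationarity: p(x_t) shifts mean, p(y_t | x_t) = N(2*x_t, 1) is fixed.
x_cov = np.concatenate([rng.normal(0, 1, T // 2), rng.normal(3, 1, T // 2)])
y_cov = 2.0 * x_cov + rng.normal(size=T)

# Conditional non-stationarity: p(x_t) fixed, the slope in p(y_t | x_t) flips halfway.
x_cnd = rng.normal(size=T)
slope = np.where(np.arange(T) < T // 2, 2.0, -1.0)
y_cnd = slope * x_cnd + rng.normal(size=T)

def ols_slope(x, y):
    return np.sum(x * y) / np.sum(x * x)

# Under covariate shift the fitted slope agrees across halves ...
s1 = ols_slope(x_cov[: T // 2], y_cov[: T // 2])
s2 = ols_slope(x_cov[T // 2 :], y_cov[T // 2 :])
# ... while under conditional shift it does not.
s3 = ols_slope(x_cnd[: T // 2], y_cnd[: T // 2])
s4 = ols_slope(x_cnd[T // 2 :], y_cnd[T // 2 :])
print(s1, s2, s3, s4)
```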

When we say $p({y}_t | {y}_{< t}, {x}_{t})$ does not change, we assume the following:

Assumption 1. ${y}_t$ only depends on a bounded history of ${y}$ for all ${t}$. That is, there exists $B \in \mathbb{Z}$, $0 \le B < \infty$, such that for all time ${t}$,

\begin{equation}
p({y}_t | {y}_{<t}, {x}_{t}) = p({y}_t | {y}_{t-B:t-1}, {x}_{t}).
\tag{2}
\end{equation}
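In practice, Assumption 1 is what lets us train a model on fixed-length windows. A minimal sketch (covariates ${x}_t$ omitted for brevity; the helper name is our own):

```python
import numpy as np

def make_windows(y, B):
    """Turn a series into (y_{t-B:t-1}, y_t) training pairs under Assumption 1,
    i.e., a bounded history of length B."""
    inputs = np.stack([y[t - B : t] for t in range(B, len(y))])
    targets = y[B:]
    return inputs, targets

y = np.arange(10.0)
X, t = make_windows(y, B=3)
print(X.shape, t.shape)  # (7, 3) (7,)
```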

When this assumption is violated, $p({y}_t | {y}_{< t}, {x}_{t})$ has a different dependency structure at each ${t}$, so it changes by definition. Figure 2 illustrates this assumption compared to Figure 1.

It is worth noting that some widely-studied non-stationary stochastic processes, such as random walks or, more generally, unit-root processes, fall into the class of joint non-stationarity, where ${x} = \emptyset$ and the conditional distribution $p({y}_t | y_{< t})$ stays the same.
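A random walk makes this concrete: the conditional distribution $p({y}_t | {y}_{t-1}) = \mathcal{N}({y}_{t-1}, 1)$ never changes, yet the marginal variance of ${y}_t$ grows linearly in $t$, so the process is jointly non-stationary. A quick Monte-Carlo check:

```python
import numpy as np

rng = np.random.default_rng(2)

# Random walk: y_t = y_{t-1} + eps_t with standard normal increments.
n_paths, T = 5000, 400
eps = rng.normal(size=(n_paths, T))
y = np.cumsum(eps, axis=1)

var_early = y[:, 49].var()   # Var(y_50)  ≈ 50
var_late = y[:, 399].var()   # Var(y_400) ≈ 400
print(var_early, var_late)
```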

Existing Solutions

Assumption 1 separates time series models into two families: those that satisfy it and those that do not.

The latter class includes popular architectures like State-Space Models (SSMs) [1] and different incarnations of Recurrent Neural Networks (RNNs), such as those with Long Short-Term Memory (LSTM) [2] or Gated Recurrent Unit (GRU) [3]. In theory, these models do have the capability to model conditional non-stationarity, within the limits of their own recursive structural assumptions on the latent state, but they do not have any explicit inductive bias built into the model to account for non-stationarity. In practice, this means they still tend to suffer from non-stationarity, which can be present in either the training or test data.

In the class of models that satisfy Assumption 1 we have, among others, Autoregressive (AR) models [4], Temporal Convolutional Networks (TCNs) [5, 6], and, more recently, Transformer variants [7, 8, 9] and N-BEATS [10]. Along with Assumption 1, they usually also assume conditional stationarity as part of their inductive bias. Because of these stronger stationarity assumptions, they tend to perform well on data that satisfy them, but if non-stationarity, especially conditional non-stationarity, is present in either the training or test data, it can significantly impair robust learning and accurate prediction. Specifically, if the training data exhibit conditional non-stationarity, the input-output relationships appear inconsistent, and the model cannot learn “the” correct relationship. If the conditional distribution in the test data differs from that in the training data, the model learns a “wrong” relationship that does not apply at test time.
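This failure mode can be reproduced in a few lines: fitting a stationarity-assuming AR(1) model to data whose true autoregressive coefficient changes halfway through training yields a single blended coefficient that matches neither regime. This is a toy sketch, not a reproduction of any of the cited models.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 4000

# AR(1) whose coefficient flips halfway through training (conditional non-stationarity)
phi = np.where(np.arange(T) < T // 2, 0.9, -0.6)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi[t] * y[t - 1] + rng.normal()

# A stationarity-assuming least-squares AR(1) fit blends the two regimes
phi_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
print(phi_hat)  # lands between -0.6 and 0.9, matching neither regime
```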

Figure 2. The assumption of bounded dependency and stationarity of the conditional distribution is very common in existing time series forecasting models.


Adjacent Research Areas

A number of research areas are tightly related to the challenge of non-stationarity in time series modeling. We outline some of them in the following paragraphs but note that existing methods in these areas, although related, usually cannot be applied directly to time series.

Covariate non-stationarity as defined above is similar to covariate shift in domain adaptation and transfer learning [11, 12], although in the latter, the data are usually not time series, so there is no dependency of the target variable on itself (from previous time steps). In a typical scenario, a reasonable amount of labeled training data sampled from a distribution (source domain) are available, but the test data are assumed to be from a different distribution (target domain), wherein some invariance, such as the conditional distribution of the output given the input, is preserved. The goal is to adapt the model trained from the source-domain data such that it works well on the target-domain data.
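A standard remedy in that (non-temporal) setting is importance weighting: source samples are reweighted by $w(x) = p_{\text{target}}(x)/p_{\text{source}}(x)$ so that source-domain averages estimate target-domain quantities. The sketch below assumes both densities are known, which is rarely true in practice; the example distributions are our own.

```python
import numpy as np

rng = np.random.default_rng(4)

def normal_pdf(x, mu, s):
    return np.exp(-((x - mu) ** 2) / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

# Source domain: x ~ N(0, 1).  Target domain: x ~ N(1, 1).
# The conditional p(y | x) = N(x^2, 0.1^2) is shared (the preserved invariance).
n = 200_000
x_src = rng.normal(0, 1, n)
y_src = x_src**2 + rng.normal(0, 0.1, n)

# Importance weights w(x) = p_target(x) / p_source(x), with known densities
w = normal_pdf(x_src, 1, 1) / normal_pdf(x_src, 0, 1)

# Self-normalized weighted average estimates E_target[y] = E[x^2] = Var + mean^2 = 2
est = np.sum(w * y_src) / np.sum(w)
print(est)
```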

Continual learning [13, 14] is another adjacent area, where the model needs to learn and adapt to multiple tasks (input-output relationships) online. Usually the data for each task arrive sequentially, and the model needs to not only keep learning new tasks but also avoid forgetting older ones [15]. An interesting special case is Bayesian continual learning [16, 17], which combines continual learning with Bayesian deep learning. The difference from typical continual learning is that a prior distribution is defined over the parameters of a deep neural network, and the posterior distribution over the parameters is inferred continually after observing new samples.
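The Bayesian flavour of this idea can be illustrated with a conjugate toy model: the posterior over a parameter after each observation becomes the prior for the next, so nothing about earlier samples needs to be stored. The scalar linear model and all values below are illustrative assumptions, not a method from the cited works.

```python
import numpy as np

rng = np.random.default_rng(5)

# Recursive Bayesian update for a scalar weight in y = w*x + noise, noise ~ N(0, s2).
s2 = 0.25
mu, tau2 = 0.0, 10.0   # prior N(mu, tau2) over w
w_true = 1.5

for _ in range(500):
    x = rng.normal()
    y = w_true * x + np.sqrt(s2) * rng.normal()
    # Conjugate Gaussian update: precisions add, mean is precision-weighted
    prec = 1.0 / tau2 + x**2 / s2
    mu = (mu / tau2 + x * y / s2) / prec
    tau2 = 1.0 / prec

print(mu, tau2)  # posterior concentrates near w_true
```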

Figure 3. An overview of our model.


Dynamic Adaptation to Distribution Shifts @ Borealis AI

For time series modeling and forecasting in capital markets, we believe that conditional non-stationarity is the most common and important type of non-stationarity. We take a different approach to dealing with conditional non-stationarity in time series than existing models like RNNs. The core of our architecture is the clean decoupling of the time-variant (non-stationary) part and the time-invariant (stationary) part of the signal. The time-invariant part models a stationary conditional distribution, given some control variables, while the time-variant part focuses on modeling the changes in this conditional distribution over time through those control variables. Using this separation, we build a flexible time-invariant conditional model and make efficient inferences about how the model changes over time. At test time, our model takes both the uncertainty of the conditional distribution and the non-stationarity into account when making predictions and adapts to the changes in the conditional distribution over time in an online manner.

Figure 3 shows a high-level illustration of the model. At the center of the model is the conditional distribution at each time step $t$, which we assume to be parameterized by the output of a (non-linear) function $f_t({y}_{t-B:t-1}) = f(g({y}_{t-B:t-1}); \chi_t)$ that conditions on the past $B$ observations, modulated by the non-stationary control variable $\chi_t$ (here we omit ${x}_t$ for simplicity). The past observations are summarized into a fixed-length vector through a time-invariant encoder $g$, such as a multilayer perceptron (MLP). The control variable $\chi_t$ changes over time through a dynamic model whose parameters are learned from the data along with the parameters of $g$. To train the model, we use variational inference and maximize the evidence lower bound (ELBO), where the variational model can be a very flexible generative model, such as an inverse autoregressive flow (IAF) [24]. At test time, however, we use Rao-Blackwellized particle filters [25] to keep inferring the posterior of $\chi_t$ at each time step $t$, after observing ${y}_t$, and use Monte-Carlo sampling to make predictions of ${y}_{t+1:t+H}$.
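To make the decoupled structure concrete, here is a deliberately simplified instantiation: a fixed time-invariant encoder $g$ summarizes the window ${y}_{t-B:t-1}$, and a slowly drifting control $\chi_t$ modulates the conditional mean $f(g(\cdot); \chi_t)$. In the real model, $g$ and the dynamics of $\chi_t$ are learned via variational inference and tracked with particle filters; here both are fixed by hand purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
B, T = 4, 300

def g(window):
    return window.mean()   # time-invariant encoder (a stand-in for an MLP)

def f(summary, chi):
    return chi * summary   # conditional mean, modulated by the control chi_t

y = np.zeros(T)
chi = 0.5
for t in range(B, T):
    # Random-walk dynamics on the control variable (clipped to keep the toy stable)
    chi = np.clip(chi + 0.01 * rng.normal(), -0.9, 0.9)
    y[t] = f(g(y[t - B : t]), chi) + 0.1 * rng.normal()

# Monte-Carlo one-step-ahead forecast: propagate chi one step, then sample y_{T+1}
chi_next = np.clip(chi + 0.01 * rng.normal(size=1000), -0.9, 0.9)
samples = f(g(y[-B:]), chi_next) + 0.1 * rng.normal(size=1000)
print(samples.mean(), samples.std())
```

The separation is visible in the code: `g` never changes over time, while all non-stationarity is routed through `chi`, whose posterior is what the filtering step tracks online.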

Non-Stationarity in Other Projects @ Borealis AI

Borealis AI supports a broad spectrum of domains within RBC, and many of the principles above generalize to applications outside capital markets, e.g., Personal and Commercial Banking (P&CB). While there is no shared environment or direct competition in this case, the complex nature of human behaviour, which changes over time due to internal evolution and/or external influences, makes flexible adaptation to (asynchronous) time series data generated by client (trans)actions equally relevant. Many of the solutions we develop in the Photon team generalize to these new application domains with small modifications.

You can read the full paper ‘DynaConF: Dynamic Forecasting of Non-Stationary Time-Series’ by Siqi Liu and Andreas Lehrmann here.

References