Time Series Forecasting in Machine Learning

In a typical machine learning problem, each observation is treated as independent of the others, and the outcome is predicted without regard to when the observation was made. Even when the outcome being predicted lies in the future, all past observations are treated equally, regardless of their order.

However, a time series dataset is different. Time series tasks add a “time dimension” and impose an explicit order of dependence between the observations. Put simply, a time series is a sequence of observations taken sequentially in time.

Components of time-series data

Level: The baseline value of the series.
Trend: The long-term direction of the series, i.e. whether it is increasing or decreasing over time.
Seasonality: The repeating patterns or cycles of behavior over time.
Noise: The random variations in the data that cannot be explained by the model.

These constituent components can be combined in different ways to produce the observed time series. For example, they may be added together to form an additive model (though this isn’t always the case; the components can also be multiplied):

y(t) = Level + Trend + Seasonality + Noise
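To make the additive combination concrete, here is a minimal sketch that builds a synthetic series from the four components. All the numbers (series length, level, slope, seasonal period, amplitude, noise scale) are made-up assumptions chosen only for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

n = 48            # hypothetical: four years of monthly observations
level = 100.0     # baseline value of the series
slope = 0.5       # trend: increase per time step
period = 12       # length of one seasonal cycle
amplitude = 10.0  # size of the seasonal swing

trend = [slope * t for t in range(n)]
seasonality = [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Additive model: each observation is the sum of the four components.
series = [level + trend[t] + seasonality[t] + noise[t] for t in range(n)]
```

A multiplicative model would multiply the components instead of adding them, which suits series whose seasonal swing grows with the level.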

Time series data can be of two different types:

Univariate Time Series: Refers to time series data that consists of single (scalar) observations recorded over regular intervals of time. For example, data collected from a sensor measuring the temperature of a room every second. Therefore, each second, you will only have a one-dimensional value (the temperature).
Multivariate Time Series: Multiple variables varying over time recorded at regular intervals of time. For example, in a tri-axial accelerometer, there are three accelerations, one for each axis (x,y,z), and they vary simultaneously over time.
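As a rough illustration of the difference in shape, the two kinds of series could be stored like this. The values are invented, and the variable names are only for this sketch.

```python
# Univariate: one scalar per time step (room temperature in °C, made-up values).
temperature = [21.4, 21.5, 21.5, 21.7, 21.6]

# Multivariate: one vector per time step (x, y, z acceleration, made-up values).
acceleration = [
    (0.01, -0.02, 9.81),
    (0.03, -0.01, 9.79),
    (0.02, 0.00, 9.80),
]

# Each univariate observation is a single number; each multivariate
# observation holds one number per variable.
values_per_step_univariate = 1
values_per_step_multivariate = len(acceleration[0])
```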

Time Series Forecasting Methods

An overview of several classical methods used for forecasting time series data is provided below.

Autoregression (AR)

An autoregressive model regresses a value from a time series on previous values from that same time series. In this model, the next value is represented as a linear combination of a fixed number of previous values (also called lag values), plus a noise term.

An autoregressive model is denoted by AR(p), where p is the order of the model, i.e. the number of lag values used. For example, AR(1) is a “first-order autoregressive process”: the outcome variable at time t depends directly only on the value one period earlier, i.e. the value at t-1.
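The following sketch simulates an AR(1) process and recovers its coefficient by least squares. It uses only the standard library, and the function names, the coefficient 0.7, and the sample size are assumptions made for this example rather than anything prescribed by the method.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate y[t] = phi * y[t-1] + e[t] with Gaussian noise e[t]."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y

def fit_ar1(y):
    """Least-squares estimate of phi when regressing y[t] on y[t-1] (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

y = simulate_ar1(phi=0.7, n=5000)
phi_hat = fit_ar1(y)          # should land close to the true value 0.7
forecast = phi_hat * y[-1]    # one-step-ahead forecast of the next value
```

An AR(p) model works the same way, except that the regression uses the p most recent values instead of just one.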

Moving Average (MA)

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast error terms (εt): the current value is modeled as the series mean plus a weighted sum of the most recent errors. It should not be confused with moving-average smoothing, which takes the average of past values. In this model, the current deviation from the mean depends on the random shocks of the last few periods. Because each shock drops out after q periods, an MA model captures short-lived effects; it is moving-average smoothing, not the MA model, that is commonly used to expose long-term trends.

A moving average model is denoted by MA(q), where q is the order of the model, i.e. the number of lagged error terms included.
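One way to see what the order q means is to simulate an MA(1) process and look at its sample autocorrelation: it should be clearly non-zero at lag 1 and close to zero beyond the order. This is a minimal sketch; the helper names and the coefficient 0.6 are assumptions for illustration.

```python
import random

def simulate_ma1(theta, n, mu=0.0, seed=0):
    """Simulate y[t] = mu + e[t] + theta * e[t-1] with Gaussian errors e[t]."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [mu + e[t] + theta * e[t - 1] for t in range(1, n + 1)]

def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    mean = sum(y) / len(y)
    dev = [v - mean for v in y]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, len(y)))
    den = sum(d * d for d in dev)
    return num / den

y = simulate_ma1(theta=0.6, n=5000)
r1 = autocorr(y, 1)  # theory: theta / (1 + theta**2), about 0.44
r3 = autocorr(y, 3)  # theory: 0, because lag 3 exceeds the order q = 1
```

The sharp cut-off in autocorrelation after lag q is what distinguishes an MA(q) process from an AR process, whose autocorrelation decays gradually.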

Autoregressive Moving Average (ARMA)

An ARMA model simply combines the AR(p) and MA(q) models described above. It describes a weakly stationary stochastic time series in terms of two polynomials, one for the autoregression and one for the moving average. It takes advantage of AR(p), which makes predictions using previous values of the dependent variable, and of MA(q), which makes predictions using the series mean and previous errors.
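To show how the two parts work together, here is a sketch of one-step-ahead forecasting with an ARMA(1,1) model whose parameters are already known. The function name and the start-up conventions (zero initial error, unconditional mean as the first forecast) are assumptions of this example; fitting the parameters from data is a separate step not shown here.

```python
def arma11_one_step(y, c, phi, theta):
    """One-step-ahead forecasts for y[t] = c + phi*y[t-1] + theta*e[t-1] + e[t].

    Returns (forecasts, next_forecast), where forecasts[t] predicts y[t]
    from the data up to t-1. Expects a non-empty series and |phi| < 1.
    """
    forecasts = []
    prev_y = None
    prev_err = 0.0  # assumption: no error before the first observation
    for obs in y:
        if prev_y is None:
            pred = c / (1 - phi)  # unconditional mean as the starting forecast
        else:
            # AR part uses the previous value, MA part uses the previous error.
            pred = c + phi * prev_y + theta * prev_err
        forecasts.append(pred)
        prev_err = obs - pred     # realized forecast error feeds the MA part
        prev_y = obs
    next_forecast = c + phi * prev_y + theta * prev_err
    return forecasts, next_forecast

forecasts, next_forecast = arma11_one_step([1.0, 2.0, 3.0], c=0.0, phi=0.5, theta=0.5)
```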

Autoregressive Integrated Moving Average (ARIMA)

ARIMA is a generalization of the simpler autoregressive moving average (ARMA) model described above; the only difference is the notion of integration.

Briefly, the key aspects of this model are:

Autoregression. A model that uses the dependent relationship between an observation and some past lagged observations.
Integrated. Taking the difference between raw observations (e.g. subtracting one observation from another at the previous time step) in order to make the time series stationary.
Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

An ARMA(p,q) model fitted to a series that has been differenced d times is called an ARIMA model of order (p,d,q).
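The three steps can be sketched for the simplest interesting case, ARIMA(1,1,0): difference the series once, fit an AR(1) model to the differences, forecast the differences, and then add them back up to return to the original scale. The function names are hypothetical, and the toy series is constructed so that its differences follow an exact AR(1) recursion.

```python
def fit_ar1_with_intercept(x):
    """Ordinary least-squares fit of x[t] = c + phi * x[t-1]."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi

def arima110_forecast(y, steps):
    """Forecast `steps` values ahead with a hand-rolled ARIMA(1,1,0)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # "I": difference once
    c, phi = fit_ar1_with_intercept(dy)               # "AR": fit on differences
    last_level, last_diff = y[-1], dy[-1]
    out = []
    for _ in range(steps):
        last_diff = c + phi * last_diff               # forecast the next difference
        last_level += last_diff                       # integrate back to levels
        out.append(last_level)
    return out

# Toy series whose differences (4, 3, 2.5, 2.25, 2.125) obey d[t] = 1 + 0.5 * d[t-1].
y = [0.0, 4.0, 7.0, 9.5, 11.75, 13.875]
prediction = arima110_forecast(y, steps=2)
```

In practice a library implementation would also estimate the MA terms and choose (p,d,q) from the data; this sketch only shows how differencing and the ARMA part fit together.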

Stationary series

A non-stationary series can often be made stationary by differencing. In general, a series that becomes stationary after being differenced d times is said to be integrated of order d, written I(d). Therefore, a series that is stationary without differencing is said to be I(0).
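A small sketch of repeated differencing is shown below. It uses a deterministic toy series (the squares of the time index) rather than a real stochastic one, purely to make the effect of each round of differencing easy to read; the function name is an assumption of this example.

```python
def difference(y, d=1):
    """Apply first differencing d times; each pass shortens the series by one."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y

quadratic = [t * t for t in range(8)]  # 0, 1, 4, 9, ... grows ever faster
once = difference(quadratic, d=1)      # 1, 3, 5, ... still trending upward
twice = difference(quadratic, d=2)     # constant: every value is 2
```

Here one round of differencing is not enough to remove the trend, but two rounds are, which mirrors the idea of needing d = 2.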

The “I” in the ARIMA model stands for integrated. It’s a measure of how many nonseasonal differences are needed to achieve stationarity, and it is what distinguishes ARIMA from ARMA.

Why do we need to assume a series to be stationary?

Many standard techniques assume stationarity and are invalid if the series is non-stationary.
A non-stationary series can show strong autocorrelation that reflects only its trend or changing variance rather than a stable dependence structure.
Regressions on non-stationary series may be spurious, indicating a relationship when in reality no such relationship exists.
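The spurious-relationship problem can be illustrated with a small simulation: pairs of independent random walks (a standard example of non-stationary series) are generated, and the correlation between them is compared before and after differencing. The number of trials, the series length, and the helper names are arbitrary choices for this sketch.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of Gaussian steps: a non-stationary series."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk

def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def difference(y):
    return [y[t] - y[t - 1] for t in range(1, len(y))]

rng = random.Random(0)
trials, n = 200, 200
level_corrs, diff_corrs = [], []
for _ in range(trials):
    a, b = random_walk(n, rng), random_walk(n, rng)  # unrelated by construction
    level_corrs.append(abs(correlation(a, b)))
    diff_corrs.append(abs(correlation(difference(a), difference(b))))

# Average absolute correlation across trials, in levels and in differences.
mean_level_corr = sum(level_corrs) / trials
mean_diff_corr = sum(diff_corrs) / trials
```

Although the two walks in each pair are unrelated, their levels tend to look noticeably correlated, while the differenced (stationary) versions do not.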
