# temporal

The `temporal` package provides a powerful and flexible implementation of the Transformer architecture, tailored specifically for time series forecasting. Here's a deep dive into its key capabilities.
## Core Modules

At the heart of every `temporal` model are three core modules that work together to process your time series data.
### Embeddings

`temporal` provides a variety of embedding options, including:

- `TimeSeriesValueEmbedding`: A simple linear projection.
- `TimeSeriesPatchEmbedding`: For patch-based models.
- Positional embeddings: `SinusoidalPositionalEmbedding`, `RotaryPositionalEmbedding`, and `LearnedAbsolutePositionalEmbedding`.

### Encoder and Decoder

`TimeSeriesTransformerEncoder` is a stack of encoder layers, each of which contains a self-attention mechanism and a feed-forward network. `TimeSeriesTransformerDecoder` is a stack of decoder layers, each of which contains a self-attention mechanism, a cross-attention mechanism (for attending to the encoder's output), and a feed-forward network.
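For intuition, the classic sinusoidal positional encoding that `SinusoidalPositionalEmbedding` is named after can be sketched in a few lines of NumPy. This is a generic illustration of the technique, not `temporal`'s actual implementation:

```python
import numpy as np

def sinusoidal_positional_embedding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal embedding in the style of "Attention Is All You Need".

    Even dimensions get sin, odd dimensions get cos, with geometrically
    spaced frequencies, so every position maps to a unique phase pattern.
    """
    positions = np.arange(seq_len)[:, None]             # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]            # (1, d_model / 2)
    angles = positions / (10000.0 ** (dims / d_model))  # (seq_len, d_model / 2)
    emb = np.zeros((seq_len, d_model))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

# Embed 96 time steps into a 64-dimensional model space.
pe = sinusoidal_positional_embedding(seq_len=96, d_model=64)
print(pe.shape)  # (96, 64)
```

Because the encoding is deterministic, it adds no parameters and extrapolates to sequence lengths not seen during training, which is one reason it remains a common default.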
## Flexible Architecture Configuration

`temporal` allows you to define a wide variety of Transformer architectures with ease; the architectural choices are specified in a single `TransformerArchitectureConfig`.

## Attention Mechanisms

`temporal` provides a rich set of attention mechanisms beyond the standard self-attention.
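As a reference point, the standard scaled dot-product self-attention that these mechanisms extend can be written compactly. The sketch below is minimal NumPy for a single head, not the package's code:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Standard attention: softmax(Q K^T / sqrt(d)) V for one head."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # (seq, seq) similarities
    weights = softmax(scores, axis=-1)            # each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
k = rng.normal(size=(8, 16))
v = rng.normal(size=(8, 16))
out = full_attention(q, k, v)
print(out.shape)  # (8, 16)
```

The variants below differ mainly in how the score matrix is computed, masked, or normalized, while keeping this overall shape.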
- `FullAttention`: The standard, full self-attention mechanism.
- `FlashAttention`: A highly efficient implementation that uses the `flash-attn` library.
- `LSEAttention` (Log-Sum-Exp Attention): A memory-efficient and numerically stable attention mechanism.
- `DifferentialAttention` (DiffWist): A novel attention mechanism featuring a learnable gating mechanism and grouped-query attention.
- `PatternedMultiHeadAttention`: Applies fixed, predefined patterns to the attention matrix, such as local, sliding, or dilated attention.
- `HybridAttention`: A powerful feature that lets you combine different attention mechanisms within a single layer.
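To make the fixed patterns concrete, a sliding-window mask of the kind a patterned attention layer applies can be built as follows. This is an illustrative sketch of the masking idea, not `temporal`'s API:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where position i may only attend to j with |i - j| <= window.

    A patterned attention layer applies such a mask to the score matrix
    before the softmax (disallowed entries are set to -inf), reducing the
    effective cost from O(n^2) toward O(n * window).
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

# Each position sees itself plus one neighbour on each side.
mask = sliding_window_mask(seq_len=6, window=1)
print(mask.astype(int))
```

Dilated patterns follow the same recipe with a stride condition (e.g. `(i - j) % dilation == 0`), so different heads can cover different ranges of the sequence.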
## Probabilistic Forecasting

`temporal` has extensive support for probabilistic forecasting, allowing you to model uncertainty in your predictions.
- `LinearOutputHead`: For simple point forecasts.
- `GaussianHead`: For predicting the mean and standard deviation of a Gaussian distribution.
- `QuantileRegressionOutputHead`: For directly predicting multiple quantiles.
- `DistPredHead`: Designed for CRPS loss, this head outputs an ensemble of values to approximate the predictive distribution.
- `MixtureOutputHead`: For Mixture Density Networks (MDNs).
- `TimeFlowHead`: A specialized head for the TimeFlow model, a diffusion-based approach.
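As one concrete piece of this toolkit, the quantile (pinball) loss that a head like `QuantileRegressionOutputHead` is typically trained with can be sketched in generic NumPy; this is the standard loss, not code from the package:

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, quantile: float) -> float:
    """Quantile (pinball) loss.

    Under-predictions are weighted by `quantile` and over-predictions by
    `1 - quantile`, so the optimal constant prediction is exactly that
    quantile of the target distribution.
    """
    err = y_true - y_pred
    return float(np.mean(np.maximum(quantile * err, (quantile - 1.0) * err)))

y = np.array([1.0, 2.0, 3.0])
# When targeting the 0.9 quantile, predicting too low hurts far more
# than predicting too high:
low = pinball_loss(y, y - 1.0, quantile=0.9)   # all errors +1
high = pinball_loss(y, y + 1.0, quantile=0.9)  # all errors -1
print(low, high)  # 0.9 0.1
```

Training one output per quantile with this asymmetric loss is what lets a single model emit a calibrated band of forecasts (e.g. the 0.1, 0.5, and 0.9 quantiles).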
## Extensibility

The `temporal` package is designed to be easily extensible.
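For a sense of what extending such a package usually looks like, here is the common subclass-a-head pattern. Every class and method name below is hypothetical, chosen only to illustrate the pattern rather than `temporal`'s actual interfaces:

```python
import numpy as np

class OutputHead:
    """Hypothetical base class: maps decoder features to forecasts."""

    def __call__(self, features: np.ndarray) -> np.ndarray:
        raise NotImplementedError

class ScaledLinearHead(OutputHead):
    """A toy custom head: a linear projection with a learnable output scale.

    The point is only the extension pattern: subclass the base, implement
    the forward mapping, and plug the new head into the model.
    """

    def __init__(self, d_model: int, horizon: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=d_model ** -0.5, size=(d_model, horizon))
        self.b = np.zeros(horizon)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        # features: (batch, d_model) -> forecasts: (batch, horizon)
        return features @ self.w + self.b

head = ScaledLinearHead(d_model=64, horizon=24)
forecast = head(np.zeros((4, 64)))
print(forecast.shape)  # (4, 24)
```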