mlpack 3.4.2
MultiheadAttention< InputDataType, OutputDataType, RegularizerType > Class Template Reference

Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.

#include <multihead_attention.hpp>

Public Member Functions

 MultiheadAttention ()
 Default constructor.

 MultiheadAttention (const size_t tgtSeqLen, const size_t srcSeqLen, const size_t embedDim, const size_t numHeads)
 Create the MultiheadAttention object using the specified parameters.

OutputDataType & AttentionMask ()
 Modify the two-dimensional Attention Mask.

OutputDataType const & AttentionMask () const
 Get the two-dimensional Attention Mask.

template<typename eT >
void Backward (const arma::Mat< eT > &, const arma::Mat< eT > &gy, arma::Mat< eT > &g)
 Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f.

OutputDataType & Delta ()
 Modify the delta.

OutputDataType const & Delta () const
 Get the delta.

size_t & EmbedDim ()
 Modify the embedding dimension.

size_t EmbedDim () const
 Get the embedding dimension.

template<typename eT >
void Forward (const arma::Mat< eT > &input, arma::Mat< eT > &output)
 Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.

OutputDataType & Gradient ()
 Modify the gradient.

OutputDataType const & Gradient () const
 Get the gradient.

template<typename eT >
void Gradient (const arma::Mat< eT > &input, const arma::Mat< eT > &error, arma::Mat< eT > &gradient)
 Calculate the gradient using the output delta and the input activation.

OutputDataType & KeyPaddingMask ()
 Modify the Key Padding Mask.

OutputDataType const & KeyPaddingMask () const
 Get the Key Padding Mask.

size_t & NumHeads ()
 Modify the number of attention heads.

size_t NumHeads () const
 Get the number of attention heads.

OutputDataType & OutputParameter ()
 Modify the output parameter.

OutputDataType const & OutputParameter () const
 Get the output parameter.

OutputDataType & Parameters ()
 Modify the parameters.

OutputDataType const & Parameters () const
 Get the parameters.

void Reset ()
 Reset the layer parameters.

template<typename Archive >
void serialize (Archive &ar, const unsigned int)
 Serialize the layer.

size_t & SrcSeqLen ()
 Modify the source sequence length.

size_t SrcSeqLen () const
 Get the source sequence length.

size_t & TgtSeqLen ()
 Modify the target sequence length.

size_t TgtSeqLen () const
 Get the target sequence length.


Detailed Description

template<typename InputDataType = arma::mat, typename OutputDataType = arma::mat, typename RegularizerType = NoRegularizer>
class mlpack::ann::MultiheadAttention< InputDataType, OutputDataType, RegularizerType >

Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.

With a single attention head, averaging inhibits this. [arxiv.org:1706.03762v5]
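
For orientation, the cited paper (arXiv:1706.03762) defines the operation as below. The notation is the paper's, not the mlpack source's; reading h as numHeads and the model dimension as embedDim is an assumed mapping.

```latex
\begin{align}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right) \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\end{align}
```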

The MultiheadAttention class takes the query, key and value in concatenated form: the three are concatenated into a single matrix, which is passed to the Forward function as input.

The query, key and value are matrices of shapes (embedDim * tgtSeqLen, batchSize), (embedDim * srcSeqLen, batchSize) and (embedDim * srcSeqLen, batchSize) respectively. The output is a matrix of shape (embedDim * tgtSeqLen, batchSize). The embeddings are stored consecutively.
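
The shape arithmetic above can be sketched in plain C++. This is a minimal illustration, not mlpack code: the helper names ConcatenatedRows and ConcatenateColumn are hypothetical, and the sketch assumes the three matrices are stacked along the row dimension in the order query, key, value (row-wise stacking follows from the shared batchSize column count; the order is an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of rows in the concatenated input, assuming query, key and value
// are stacked along the row dimension (hypothetical helper).
std::size_t ConcatenatedRows(std::size_t tgtSeqLen,
                             std::size_t srcSeqLen,
                             std::size_t embedDim)
{
  // query: embedDim * tgtSeqLen rows; key and value: embedDim * srcSeqLen each.
  return embedDim * tgtSeqLen + 2 * embedDim * srcSeqLen;
}

// Build one column (one batch item) of the concatenated input.
// The order query, key, value is an assumption of this sketch.
std::vector<double> ConcatenateColumn(const std::vector<double>& query,
                                      const std::vector<double>& key,
                                      const std::vector<double>& value)
{
  // Key and value both hold embedDim * srcSeqLen entries.
  assert(key.size() == value.size());

  std::vector<double> input;
  input.reserve(query.size() + key.size() + value.size());
  input.insert(input.end(), query.begin(), query.end());
  input.insert(input.end(), key.begin(), key.end());
  input.insert(input.end(), value.begin(), value.end());
  return input;
}
```

With Armadillo matrices, the same stacking could be written with arma::join_cols, again assuming this ordering.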

Template Parameters
InputDataType    Type of the input data (arma::colvec, arma::mat, arma::sp_mat or arma::cube).
OutputDataType   Type of the output data (arma::colvec, arma::mat, arma::sp_mat or arma::cube).
RegularizerType  Type of the regularizer to be used.

Definition at line 62 of file multihead_attention.hpp.

Constructor & Destructor Documentation

◆ MultiheadAttention() [1/2]

Default constructor.

◆ MultiheadAttention() [2/2]

MultiheadAttention ( const size_t  tgtSeqLen,
const size_t  srcSeqLen,
const size_t  embedDim,
const size_t  numHeads 
)

Create the MultiheadAttention object using the specified parameters.

Parameters
tgtSeqLen  Target sequence length.
srcSeqLen  Source sequence length.
embedDim   Total dimension of the model.
numHeads   Number of parallel attention heads.
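
In the cited paper, the model dimension is split evenly across the heads, so each head works on embedDim / numHeads features. The sketch below checks that relationship before constructing a layer. HeadDim is a hypothetical helper, and whether this version of mlpack enforces divisibility itself is not stated on this page.

```cpp
#include <cstddef>
#include <stdexcept>

// Per-head feature dimension, assuming embedDim is split evenly across the
// heads as in the cited paper (hypothetical helper, not part of mlpack).
std::size_t HeadDim(std::size_t embedDim, std::size_t numHeads)
{
  if (numHeads == 0 || embedDim % numHeads != 0)
    throw std::invalid_argument("numHeads must be nonzero and divide embedDim");
  return embedDim / numHeads;
}
```

For example, embedDim = 8 with numHeads = 2 gives a head dimension of 4, while embedDim = 10 with numHeads = 3 is rejected.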

Member Function Documentation

◆ AttentionMask() [1/2]

OutputDataType & AttentionMask ( )
inline

Modify the two-dimensional Attention Mask.

Definition at line 152 of file multihead_attention.hpp.

◆ AttentionMask() [2/2]

OutputDataType const & AttentionMask ( ) const
inline

Get the two-dimensional Attention Mask.

Definition at line 150 of file multihead_attention.hpp.
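
A common use of a two-dimensional attention mask is causal masking, where a target position may not attend to later source positions. The sketch below builds such a mask in plain C++. It rests on assumptions not stated on this page: that the mask has shape (tgtSeqLen, srcSeqLen) and that it is added to the attention scores before the softmax. MakeCausalMask is a hypothetical helper.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Build an additive causal mask: mask[i][j] is 0 where target position i may
// attend to source position j, and -infinity where it may not.
// Assumed shape: (tgtSeqLen, srcSeqLen). Hypothetical helper.
std::vector<std::vector<double>> MakeCausalMask(std::size_t tgtSeqLen,
                                                std::size_t srcSeqLen)
{
  const double blocked = -std::numeric_limits<double>::infinity();
  std::vector<std::vector<double>> mask(
      tgtSeqLen, std::vector<double>(srcSeqLen, 0.0));

  for (std::size_t i = 0; i < tgtSeqLen; ++i)
  {
    // Block every source position after the current target position.
    for (std::size_t j = i + 1; j < srcSeqLen; ++j)
      mask[i][j] = blocked;
  }
  return mask;
}
```

The values would then be copied into the matrix returned by AttentionMask(), provided the layout assumed here matches the layer's.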

◆ Backward()

void Backward ( const arma::Mat< eT > &  ,
const arma::Mat< eT > &  gy,
arma::Mat< eT > &  g 
)

Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f, using the results from the feed forward pass.

Parameters
gy  The backpropagated error.
g   The calculated gradient.

◆ Delta() [1/2]

OutputDataType & Delta ( )
inline

Modify the delta.

Definition at line 167 of file multihead_attention.hpp.

◆ Delta() [2/2]

OutputDataType const & Delta ( ) const
inline

Get the delta.

Definition at line 165 of file multihead_attention.hpp.

◆ EmbedDim() [1/2]

size_t & EmbedDim ( )
inline

Modify the embedding dimension.

Definition at line 142 of file multihead_attention.hpp.

◆ EmbedDim() [2/2]

size_t EmbedDim ( ) const
inline

Get the embedding dimension.

Definition at line 140 of file multihead_attention.hpp.

◆ Forward()

void Forward ( const arma::Mat< eT > &  input,
arma::Mat< eT > &  output 
)

Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.

Parameters
input   The input matrix (query, key and value in concatenated form).
output  Resulting output activation.
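
To make the forward computation concrete, here is a minimal single-head sketch of scaled dot-product attention from the cited paper, written in plain C++. It is not mlpack's implementation: it omits the learned projections, the split into multiple heads, masking and batching, and the row-major Matrix alias and the function name are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix: one inner vector per sequence position (sketch only).
using Matrix = std::vector<std::vector<double>>;

// q is (tgtSeqLen x d); k and v are (srcSeqLen x d).
// Returns softmax(q * k^T / sqrt(d)) * v, of shape (tgtSeqLen x d).
Matrix ScaledDotProductAttention(const Matrix& q,
                                 const Matrix& k,
                                 const Matrix& v)
{
  const std::size_t d = q.empty() ? 0 : q[0].size();
  Matrix out(q.size(), std::vector<double>(d, 0.0));
  if (d == 0 || k.empty())
    return out;

  const double scale = std::sqrt(static_cast<double>(d));
  for (std::size_t i = 0; i < q.size(); ++i)
  {
    // Scaled scores of target position i against every source position.
    std::vector<double> scores(k.size(), 0.0);
    for (std::size_t j = 0; j < k.size(); ++j)
    {
      for (std::size_t c = 0; c < d; ++c)
        scores[j] += q[i][c] * k[j][c];
      scores[j] /= scale;
    }

    // Numerically stable softmax over the source positions.
    const double maxScore = *std::max_element(scores.begin(), scores.end());
    double sum = 0.0;
    for (double& s : scores)
    {
      s = std::exp(s - maxScore);
      sum += s;
    }

    // Output row i is the attention-weighted sum of the value rows.
    for (std::size_t j = 0; j < k.size(); ++j)
      for (std::size_t c = 0; c < d; ++c)
        out[i][c] += (scores[j] / sum) * v[j][c];
  }
  return out;
}
```

When every score is equal, the softmax weights are uniform and each output row is the plain average of the value rows.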

◆ Gradient() [1/3]

OutputDataType & Gradient ( )
inline

Modify the gradient.

Definition at line 172 of file multihead_attention.hpp.

◆ Gradient() [2/3]

OutputDataType const & Gradient ( ) const
inline

Get the gradient.

Definition at line 170 of file multihead_attention.hpp.

◆ Gradient() [3/3]

void Gradient ( const arma::Mat< eT > &  input,
const arma::Mat< eT > &  error,
arma::Mat< eT > &  gradient 
)

Calculate the gradient using the output delta and the input activation.

Parameters
input     The input data used for evaluating specified function.
error     The calculated error.
gradient  The calculated gradient.

◆ KeyPaddingMask() [1/2]

OutputDataType & KeyPaddingMask ( )
inline

Modify the Key Padding Mask.

Definition at line 157 of file multihead_attention.hpp.

◆ KeyPaddingMask() [2/2]

OutputDataType const & KeyPaddingMask ( ) const
inline

Get the Key Padding Mask.

Definition at line 155 of file multihead_attention.hpp.
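
A key padding mask typically hides padded source positions so that no target position attends to them. The sketch below builds one from the number of valid source tokens. It assumes a mask of shape (1, srcSeqLen) whose entries are added to the attention scores; neither the shape nor the additive convention is stated on this page, and MakeKeyPaddingMask is a hypothetical helper.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Build an additive key padding mask: 0 for the first validLen source
// positions, -infinity for the padded remainder.
// Assumed shape: (1, srcSeqLen). Hypothetical helper.
std::vector<double> MakeKeyPaddingMask(std::size_t srcSeqLen,
                                       std::size_t validLen)
{
  const double blocked = -std::numeric_limits<double>::infinity();
  std::vector<double> mask(srcSeqLen, 0.0);

  // Positions at or beyond validLen are padding and are blocked.
  for (std::size_t j = validLen; j < srcSeqLen; ++j)
    mask[j] = blocked;
  return mask;
}
```

The resulting row would be copied into the matrix returned by KeyPaddingMask(), if the layer's layout matches the one assumed here.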

◆ NumHeads() [1/2]

size_t & NumHeads ( )
inline

Modify the number of attention heads.

Definition at line 147 of file multihead_attention.hpp.

◆ NumHeads() [2/2]

size_t NumHeads ( ) const
inline

Get the number of attention heads.

Definition at line 145 of file multihead_attention.hpp.

◆ OutputParameter() [1/2]

OutputDataType & OutputParameter ( )
inline

Modify the output parameter.

Definition at line 162 of file multihead_attention.hpp.

◆ OutputParameter() [2/2]

OutputDataType const & OutputParameter ( ) const
inline

Get the output parameter.

Definition at line 160 of file multihead_attention.hpp.

◆ Parameters() [1/2]

OutputDataType & Parameters ( )
inline

Modify the parameters.

Definition at line 177 of file multihead_attention.hpp.

◆ Parameters() [2/2]

OutputDataType const & Parameters ( ) const
inline

Get the parameters.

Definition at line 175 of file multihead_attention.hpp.

◆ Reset()

void Reset ( )

Reset the layer parameters.

◆ serialize()

void serialize ( Archive &  ar,
const unsigned int   
)

Serialize the layer.

◆ SrcSeqLen() [1/2]

size_t & SrcSeqLen ( )
inline

Modify the source sequence length.

Definition at line 137 of file multihead_attention.hpp.

◆ SrcSeqLen() [2/2]

size_t SrcSeqLen ( ) const
inline

Get the source sequence length.

Definition at line 135 of file multihead_attention.hpp.

◆ TgtSeqLen() [1/2]

size_t & TgtSeqLen ( )
inline

Modify the target sequence length.

Definition at line 132 of file multihead_attention.hpp.

◆ TgtSeqLen() [2/2]

size_t TgtSeqLen ( ) const
inline

Get the target sequence length.

Definition at line 130 of file multihead_attention.hpp.

