Neural Machine Translation - A beginning to Attention

#machinetranslation #attentionmodel #nlp #deeplearning #transformers

Himanshu Soni Oct 03 2020 · 3 min read

Prerequisites for understanding Neural Machine Translation

1.) Artificial Neural Network Basics

2.) RNN - LSTM or RNN - GRU

3.) Encoder Decoder

Problems with Encoder Decoder

1.) The encoder has to compress the whole source sentence into a single fixed-length vector. This makes long sentences hard to handle: information, and in particular context, is lost along the way, even when the encoder uses bi-directional LSTM cells.

2.) Performance drops as the input sentences get longer, and the problem is worse when the test sentences are longer than those in the training corpus.

3.) The context of all encoder RNN cells is consolidated, and only the output of the last encoder cell is fed to the decoder's first RNN cell. As a result, the model cannot clearly tell which source words are the most significant for predicting a given target word.

To overcome these issues, attention comes into the picture: for each target output word, we try to find out which input words of the given sentence are responsible for generating it.

To solve this, researchers proposed in 2014 to add a neural network between the encoder and the decoder. This network is responsible for finding the most significant words, or focused words, of the input sentence when predicting each target output word. This is the concept of attention.

Generalized Architecture

Figure 1. High-level architectures of the traditional Encoder-Decoder and the Encoder-Decoder with Attention

Generalized representation of how attention plays a role in Hindi-to-English translation:

Deep Dive into Encoder Decoder with Attention Neural Network

Figure 3.

In this figure, we can see 4 inputs {X1, X2, X3, X4} being fed to bi-directional LSTM cells, which generate 4 outputs {O1, O2, O3, O4}.
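To make the shape of these outputs concrete, here is a minimal sketch of a bi-directional encoder. It is only an illustration under loud assumptions: a toy tanh RNN cell with made-up scalar weights stands in for the LSTM cell, and the names `rnn_step`, `run_rnn` and `bidirectional_outputs` are hypothetical. The point it shows is that each output O_j joins a left-to-right state with a right-to-left state, so every O_j carries context from both sides of the sentence.

```python
import math

def rnn_step(x, h, w_x, w_h):
    """Toy recurrent cell (assumption: stands in for an LSTM cell)."""
    return [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]

def run_rnn(inputs, w_x, w_h):
    """Run the toy cell over a sequence and return the state at every step."""
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h)
        states.append(h)
    return states

def bidirectional_outputs(inputs):
    """Concatenate forward and backward states, one output per input."""
    forward = run_rnn(inputs, w_x=0.5, w_h=0.8)
    # Read the sentence right to left, then flip back to align with inputs.
    backward = run_rnn(inputs[::-1], w_x=0.3, w_h=0.6)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # list concatenation

# Four made-up 2-dimensional inputs X1..X4.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
O = bidirectional_outputs(X)
print(len(O), len(O[0]))  # four outputs, each twice the input size
```

A real encoder would use trained LSTM weight matrices instead of the two scalar weights, but the one-output-per-input structure is the same.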

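The state of the i-th decoder RNN cell, s_i, is computed as:

$$ s_i = f(s_{i-1}, y_{i-1}, c_i) $$

where,

  • f is the function computed by the decoder's RNN cell
  • s_{i-1} is the context (hidden state) of the previous decoder RNN cell
  • y_{i-1} is the output of the previous decoder RNN cell
  • c_i is the context vector, which is built from the encoder outputs with the help of the feedforward neural network.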
The outputs generated by the encoder cells are the input to the intermediate neural network, which assigns each of them a weight α. These α values are learned during training rather than fixed by hand. The context vector C is then computed as the weighted sum of the encoder outputs with their α values, which looks like a basic neural network equation:

$$ c_i = \sum_{j} \alpha_{ij} O_j $$

    where,

  • c_i is the context vector for the i-th decoder step (C in Figure 3)
  • α_{ij} is the learned attention weight that decoder step i gives to encoder output j
  • O_j is the output of the j-th encoder bi-directional LSTM cell
  • Concept to be cleared:

    In Figure 3, not every encoder output contributes to every context vector. For context vector C2, an encoder output that plays no significant role gets an α weight close to zero and is effectively nullified, so only the outputs that matter, {O1, O2, O3} in the figure, have attention. The same happens with context vector C3, where only {O1 and O2} are in focus.
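The weighted sum can be sketched in a few lines of Python. All numbers below are made up for illustration, and `context_vector` is a hypothetical helper name. The fourth weight is set to zero to show how an output with no attention drops out of the context vector entirely.

```python
def context_vector(alphas, encoder_outputs):
    """Weighted sum of encoder outputs: c = sum_j alpha_j * O_j."""
    dim = len(encoder_outputs[0])
    return [
        sum(a * o[d] for a, o in zip(alphas, encoder_outputs))
        for d in range(dim)
    ]

# Four made-up 2-dimensional encoder outputs O1..O4.
O = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
# Made-up attention weights for one decoder step; they sum to 1.
alphas = [0.5, 0.3, 0.2, 0.0]

c = context_vector(alphas, O)
print(c)  # roughly [0.7, 0.5]; O4 has weight 0 and contributes nothing
```

Changing the values of O4 leaves `c` unchanged, which is what "nullified" means here.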

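    The value of α is computed as a softmax over the e values: the exponent of each e divided by the summation of the exponents of every e value:

$$ \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})} $$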
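As a small sketch of this normalization, the function below turns a list of scores into attention weights. The scores are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result; it is an addition of this sketch, not something taken from the figure.

```python
import math

def softmax(scores):
    """Turn raw scores e into weights alpha that are positive and sum to 1."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up alignment scores for one decoder step over four encoder outputs.
scores = [2.0, 1.0, 0.5, -4.0]
alphas = softmax(scores)
print(alphas)
```

Note that a softmax weight never reaches exactly zero. A very low score, like the last one here, only pushes its weight close to zero.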

    The value of e is computed by the attention function, which is responsible for finding the most significant words for predicting the next target word:

$$ e_{ij} = a(s_{i-1}, h_j) $$

    where,

  • a is the attention function
  • s_{i-1} is the hidden state of the previous RNN cell of the decoder
  • h_j is the hidden state of the j-th RNN cell of the encoder, i.e. the encoder output written as O_j earlier
  • Concept to be cleared: the researchers parameterized the alignment model, the function that produces e and therefore α, as a feedforward neural network that is trained jointly with all the other components of the proposed system. The gradient of the cost function is backpropagated through it, so the same gradient trains the alignment model as well as the whole translation model. This is why the approach is described as jointly learning to align and translate.
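For a feel of what such a feedforward alignment network looks like, here is a minimal sketch of the additive scoring form used in the referenced paper, e_ij = v · tanh(W s_{i-1} + U h_j). Everything concrete in it is an assumption for illustration: the dimensions, the helper names, and above all the weights W, U and v, which are random here but would be learned by backpropagation in a real model.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def alignment_score(s_prev, h_j, W, U, v):
    """Additive score: e_ij = v . tanh(W s_prev + U h_j)."""
    projected = [a + b for a, b in zip(matvec(W, s_prev), matvec(U, h_j))]
    hidden = [math.tanh(p) for p in projected]
    return sum(v_k * h_k for v_k, h_k in zip(v, hidden))

def random_matrix(rows, cols, rng):
    """Random stand-in for trained weights (assumption, not learned)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dec_dim, enc_dim, att_dim = 3, 4, 5  # hypothetical sizes
W = random_matrix(att_dim, dec_dim, rng)
U = random_matrix(att_dim, enc_dim, rng)
v = [rng.uniform(-1.0, 1.0) for _ in range(att_dim)]

s_prev = [rng.uniform(-1.0, 1.0) for _ in range(dec_dim)]  # previous decoder state
H = random_matrix(4, enc_dim, rng)  # four encoder states h_1..h_4

# One score per encoder state; a softmax over these gives the alpha weights.
scores = [alignment_score(s_prev, h_j, W, U, v) for h_j in H]
print(scores)
```

Because the score is built only from differentiable operations, the gradient of the translation loss can flow back into W, U and v, which is what makes the joint training described above possible.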

    Thank you for reading!

    Feel Free to add your Feedback and Connect with me on LinkedIn

    Happy Deep Learning !

    References :

    Neural Machine Translation by Jointly Learning to Align and Translate

    Special Thanks to

    Sudhanshu Kumar Sir.

    Chief AI Engineer and CEO of
