In this post we will describe and demystify the relevant artifacts in the paper "Attention Is All You Need" (Vaswani, Ashish & Shazeer, Noam & Parmar, Niki & Uszkoreit, Jakob & Jones, Llion & Gomez, Aidan & Kaiser, Lukasz & Polosukhin, Illia, 2017). This paper was a great advance in the use of the attention mechanism, being the main improvement behind a model called the Transformer. The most famous models currently emerging in NLP tasks consist of dozens of Transformers or some of their variants, for example, GPT-2 or BERT.

We will describe the components of this model, analyze their operation, and build a simple model that we will apply to a small-scale NMT (Neural Machine Translation) problem. To read more about the problem we will address and about how the basic attention mechanism works, I recommend reading my previous post, "A Guide on the Encoder-Decoder Model and the Attention Mechanism".

In sequence-to-sequence problems such as neural machine translation, the initial proposals were based on the use of RNNs in an encoder-decoder architecture. These architectures have a major limitation when working with long sequences: their ability to retain information from the first elements is lost as new elements are incorporated into the sequence. In the encoder, the hidden state at every step is associated with a certain word in the input sentence, usually one of the most recent ones. Therefore, if the decoder only accesses the last hidden state of the encoder, it loses relevant information about the first elements of the sequence. To deal with this limitation, a new concept was introduced: the attention mechanism.
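To make the idea concrete, here is a minimal NumPy sketch (not the model we build later in the post) of the simplest form of attention: instead of keeping only the last encoder hidden state, the decoder scores every encoder state against its current state and takes a weighted sum. The function and variable names (`dot_product_attention`, `encoder_states`, `decoder_state`) are illustrative, not from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(decoder_state, encoder_states):
    """Weight every encoder hidden state by its relevance to the
    current decoder state, instead of using only the last one."""
    scores = encoder_states @ decoder_state   # one alignment score per source position
    weights = softmax(scores)                 # attention distribution over the source
    context = weights @ encoder_states        # weighted sum of encoder states
    return context, weights

# Toy example: 5 source positions, hidden size 8 (hypothetical values)
rng = np.random.default_rng(0)
enc = rng.normal(size=(5, 8))   # encoder hidden states, one per source token
dec = rng.normal(size=(8,))     # current decoder hidden state
context, weights = dot_product_attention(dec, enc)
print(weights.round(3), context.shape)
```

The Transformer generalizes exactly this operation (queries, keys, and values with a scaling factor), which is what the rest of this post walks through.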