Self-Attention
In the sentence "I ate an apple," how does the word "ate" determine which other words to focus on?
Self-Attention is a mechanism in which the words of a sentence are compared with one another to determine which ones each word should "attend to."
It calculates the relationships between words numerically, showing how much attention each word should give to the others.
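For reference, transformers compute these scores with scaled dot-product attention: each word is projected into a query (Q), key (K), and value (V) vector, and the weights come from a softmax over the scaled dot products of queries with keys:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

Here d_k is the dimensionality of the key vectors, and dividing by √d_k keeps the dot products in a range where the softmax behaves well. (The Q, K, and V projections are not covered in this lesson; the formula is shown only for reference.)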
For example, the attention weights that "ate" assigns to each word might come out as follows (the weights sum to 1 because they are produced by a softmax):

I → 0.10   ate → 0.05   an → 0.05   apple → 0.80

The word "ate" focuses on "apple" because identifying "what was eaten" is the most important relationship for the verb.
Through this process, transformers capture the relationships between words and, with them, the context of the whole sentence.
| Word | Attends To | Reason |
|---|---|---|
| I | none strongly (weakly "apple") | Subject with no single strong link |
| apple | ate | Object-verb relationship |
| ate | apple | Answers "what was eaten?" |
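To make this concrete, here is a minimal NumPy sketch of self-attention over the four words above. The embedding values are made-up toy numbers (no trained model is involved), chosen so that "ate" ends up attending mostly to "apple"; the computation itself is standard scaled dot-product attention, with the Q/K/V projections replaced by the identity to keep the sketch small.

```python
import numpy as np

words = ["I", "ate", "an", "apple"]

# Toy 4-dimensional embeddings, one row per word. These values are
# invented for illustration only; a real model learns its embeddings.
X = np.array([
    [1.0, 0.0, 1.0, 0.0],   # I
    [0.0, 2.0, 0.0, 2.0],   # ate
    [1.0, 1.0, 1.0, 1.0],   # an
    [0.0, 3.0, 0.0, 3.0],   # apple
])

# In a real transformer, Q, K, and V come from learned projections of X.
# The identity projection keeps this sketch minimal.
Q, K, V = X, X, X
d_k = K.shape[-1]

# One matrix product compares every word with every other word.
scores = Q @ K.T / np.sqrt(d_k)               # shape: (4, 4)

# Row-wise softmax turns scores into weights that each sum to 1.
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)

# Each output vector is a weighted mix of all the value vectors.
output = weights @ V

for word, row in zip(words, weights):
    print(f"{word:>5} attends to:", dict(zip(words, row.round(2))))
```

Running this prints one weight distribution per word; with these toy values the row for "ate" puts most of its weight on "apple," matching the table above.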
Traditional RNNs processed words one at a time, making it hard to capture relationships between words that are far apart in a sentence.
In contrast, Self-Attention examines all word pairs simultaneously, taking the context of the whole sentence into account.
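As a rough illustration of this difference (again with made-up toy vectors and a deliberately simplified, weight-free recurrence), compare how information flows in the two approaches:

```python
import numpy as np

X = np.random.rand(4, 8)      # 4 words with made-up 8-dimensional vectors

# RNN-style: strictly sequential. The first word can only reach the
# last word through every intermediate hidden state, so long-range
# relationships fade. (This recurrence is deliberately simplified
# and has no learned weights.)
h = np.zeros(8)
for x in X:                   # step t cannot start until step t-1 is done
    h = np.tanh(h + x)

# Self-attention-style: a single matrix product scores all word pairs
# at once, so the first and last words interact directly.
scores = X @ X.T / np.sqrt(X.shape[-1])   # shape: (4, 4)
print(scores[0, 3])           # direct first-word/last-word comparison
```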
In the next lesson, we will explore the Multi-Head Attention structure, which runs this Self-Attention mechanism multiple times in parallel.