Lecture

Self-Attention

In the sentence "I ate an apple," how does the word "ate" determine which other words to focus on?

Self-Attention is a mechanism in which the words in a sentence are compared with one another to determine which words each should "attend to."

It calculates the relationships between words numerically, showing how much attention each word should give to the others.

For example, the attention weights that "ate" assigns to each word might look like this.

Self-Attention Example
I → 0.05   ate → 0.10   an → 0.05   apple → 0.80

(The weights form a probability distribution over the words, so they sum to 1.)

The word "ate" attends most strongly to "apple," because identifying "what was eaten" is important to its meaning.
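A distribution like this is produced by applying a softmax to raw compatibility scores. Below is a minimal sketch; the raw scores are invented for illustration, and only the softmax step reflects how transformers actually normalize attention.

import numpy as np

# Hypothetical raw scores of "ate" against each word
# (these numbers are invented for illustration).
words = ["I", "ate", "an", "apple"]
scores = np.array([0.0, 0.7, 0.0, 2.78])

# Softmax turns raw scores into attention weights that sum to 1.
weights = np.exp(scores) / np.exp(scores).sum()

for word, w in zip(words, weights):
    print(f"{word}: {w:.2f}")
# "apple" gets by far the largest weight, matching the example above.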

Through this process, transformers can capture the relationships between words in a sentence and comprehend its overall context.

Word     Word to Attend To     Reason
I        none (or "apple")     Subject, but no strong link
apple    ate                   Object-verb relationship
ate      apple                 Indicates "what was eaten?"
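To make the whole mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. The embeddings and the projection matrices W_q, W_k, W_v would be learned in a real model; here they are random placeholders, so only the structure (queries, keys, values, scaled dot products, softmax, weighted sum) is meaningful.

import numpy as np

rng = np.random.default_rng(0)

n, d = 4, 8                     # 4 words ("I ate an apple"), embedding size 8
X = rng.normal(size=(n, d))     # placeholder word embeddings

# Learned projections in a real model; random placeholders here.
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product scores: how well each query matches each key.
scores = Q @ K.T / np.sqrt(d)            # shape (n, n)

# Softmax over each row gives one attention distribution per word.
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)

# Each word's output is a weighted sum of all the value vectors.
output = weights @ V                     # shape (n, d)

print(weights.round(2))   # row i = how much word i attends to each word

Each row of the weights matrix is exactly the kind of distribution shown for "ate" above, one row per word in the sentence.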

Traditional RNNs process words one at a time, which makes it hard to capture relationships between words that are far apart in a sentence.

In contrast, Self-Attention examines all word pairs simultaneously, taking the whole sentence context into account.
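The contrast is visible in the shape of the code itself: an RNN must loop over positions step by step, while self-attention scores every word pair in a single matrix product. A schematic sketch, again with random placeholder values:

import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 8
X = rng.normal(size=(n, d))    # placeholder word embeddings

# RNN-style: a sequential loop; step t only sees the running state.
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(n):                  # information travels one step at a time
    h = np.tanh(h @ W_h + X[t] @ W_x)

# Self-attention: one matrix product scores all n*n word pairs at once.
pair_scores = X @ X.T / np.sqrt(d)  # shape (n, n), no loop over positions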

In the next lesson, we will explore the Multi-Head Attention structure, which uses this Self-Attention mechanism in parallel multiple times.

Quiz

The self-attention mechanism determines how much each word in a sentence should focus on other words.

True
False
