# Advanced Natural Language Processing Techniques with NLTK

In this lesson, we will explore advanced features such as `POS tagging`, `named entity recognition`, and `syntax parsing` using NLTK.

<br />

## 1. Part-of-Speech Tagging

A `part of speech` (POS) refers to the grammatical role of a word in a sentence.

For example, in `"I am a student."`, `I` is a pronoun, `am` is a verb, `a` is an article, and `student` is a noun.

POS tagging involves analyzing each word in a sentence to determine its part of speech.

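```python title="POS Tagging Example"
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Tokenizer and tagger data; resource names can differ between NLTK versions
# (newer releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

text = "NLTK provides powerful NLP tools."
tokens = word_tokenize(text)  # Split the sentence into word tokens
tagged = pos_tag(tokens)      # Attach a POS tag to each token

print(tagged)
```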

The code prints a list of `(word, tag)` pairs. Tags such as `NNP` (proper noun), `VBZ` (verb, third-person singular present), and `JJ` (adjective) indicate each word's part of speech.
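Once a sentence is tagged, the `(word, tag)` pairs can be filtered by tag. The following is a minimal sketch, assuming a hand-written list in the same shape that `pos_tag` returns. The sample tags are illustrative rather than captured tagger output, and the helper `words_with_tag_prefix` is a hypothetical name introduced here.

```python
# Hand-written sample in the (word, tag) shape returned by pos_tag.
# Assumption: these tags are illustrative, not captured tagger output.
sample_tagged = [
    ("NLTK", "NNP"),
    ("provides", "VBZ"),
    ("powerful", "JJ"),
    ("NLP", "NNP"),
    ("tools", "NNS"),
    (".", "."),
]


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Prefix matching groups related tags: "NN" covers NN, NNS, NNP, and NNPS.
nouns = words_with_tag_prefix(sample_tagged, "NN")
verbs = words_with_tag_prefix(sample_tagged, "VB")

print(nouns)
print(verbs)
```

Matching on a prefix instead of an exact tag keeps singular, plural, and proper nouns together, which is often what a simple keyword extraction step needs.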

<br />

## 2. Named Entity Recognition (NER)

`Named Entity Recognition` (NER) is the process of identifying specific entities such as people, organizations, and locations in a text.

```python title="Named Entity Recognition Example"
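import nltk
from nltk import pos_tag
from nltk.chunk import ne_chunk
from nltk.tokenize import word_tokenize

# Chunker model and word list used by ne_chunk; this example also needs the
# tokenizer and tagger data downloaded in the POS tagging example
nltk.download('maxent_ne_chunker')
nltk.download('words')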

sentence = "I live in California."
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)
ner_tree = ne_chunk(tagged)

print(ner_tree)
```

The output appears as follows:

```
(S I/PRP live/VBP in/IN (GPE California/NNP) ./.)
```

Here, `GPE` indicates a geopolitical entity, and `NNP` signifies a proper noun.
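To make the bracketed notation concrete, the sketch below pulls entity labels and words out of the printed output shown above using only the standard library. It is an illustration of how to read the notation, not NLTK's own API: the helper `extract_entities` is a hypothetical name, and it assumes the sentence root is labeled `S` and that entity chunks are not nested. With a live tree object you would more typically iterate over `ner_tree.subtrees()` and check each subtree's `label()`.

```python
import re

# Printed form of the tree, copied from the output shown above.
printed_tree = "(S I/PRP live/VBP in/IN (GPE California/NNP) ./.)"


def extract_entities(tree_text):
    """Return (label, phrase) pairs for the innermost bracketed chunks."""
    entities = []
    # Match "(LABEL token/TAG ...)" groups that contain no nested brackets.
    for label, body in re.findall(r"\((\w+) ([^()]+)\)", tree_text):
        if label == "S":
            # The sentence root matches only when no entity was found.
            continue
        # Drop the "/TAG" suffix from each token to recover the words.
        words = [token.rsplit("/", 1)[0] for token in body.split()]
        entities.append((label, " ".join(words)))
    return entities


entities = extract_entities(printed_tree)
print(entities)
```

Each result pairs the entity label with the words inside that chunk, so multi-word entities come back as a single phrase.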

<br />

## 3. How About Other Languages?

NLTK is primarily an English-based natural language processing library, so its support for languages like Korean is limited.

For processing languages like Korean, it's common to use libraries such as `spaCy` or `KoNLPy` alongside NLTK.

```python title="KoNLPy Example"
from konlpy.tag import Okt

okt = Okt()
# Korean for "Python makes natural language processing easy."
text = "파이썬은 자연어 처리를 쉽게 만들어 줍니다."

print(okt.morphs(text))  # Morphological analysis
print(okt.nouns(text))   # Extracting nouns
print(okt.pos(text))     # POS tagging
```

This code allows you to extract morphemes, identify nouns, and tag parts of speech in a Korean sentence.

While NLTK is excellent for English natural language processing, using other libraries is advisable for handling languages like Korean.

<br />

## References

- <a href="https://www.nltk.org/" target="_blank">NLTK Official Documentation</a>

- <a href="https://konlpy.org/en/latest/" target="_blank">KoNLPy Official Documentation</a>
