# Learning Through Rewards - Reinforcement Learning

`Reinforcement learning` is a method in which AI learns by performing actions within an environment and receiving rewards as a result.

The basic concepts of reinforcement learning are:

1. *Agent*: The learning AI itself

2. *Environment*: The space where the AI operates

3. *Action*: A choice the AI can make at each step

4. *Reward*: A numeric score indicating how favorable the AI's action was

5. *State*: The current situation the AI is experiencing

For example, a game AI trained with reinforcement learning might receive rewards like these:

| State | Action | Reward |
|------------------|-------------|---------------------------------|
| Obstacle visible | Jump | +1 (Success) |
| No obstacle | Jump | -1 (Unnecessary) |
| Obstacle visible | Do not jump | -10 (Failure: hit the obstacle) |

The AI gradually discovers better strategies through trial and error.
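To make the five concepts concrete, here is a minimal Python sketch of the loop between agent and environment for the jumping game. The labels `"obstacle"`, `"clear"`, `"jump"`, and `"wait"` (not jumping) are illustrative names, the zero reward for waiting when no obstacle is present is an assumption, and the environment simply picks the next state at random.

```python
import random

# Reward table for the jumping game, keyed by (state, action).
REWARDS = {
    ("obstacle", "jump"): 1,    # success
    ("clear", "jump"): -1,      # unnecessary jump
    ("obstacle", "wait"): -10,  # did not jump and hit the obstacle
    ("clear", "wait"): 0,       # assumption: nothing happens
}
STATES = ["obstacle", "clear"]
ACTIONS = ["jump", "wait"]


def step(state, action, rng):
    """Environment: return the reward and a randomly chosen next state."""
    reward = REWARDS[(state, action)]
    next_state = rng.choice(STATES)
    return reward, next_state


def run_episode(policy, steps=100, seed=0):
    """Agent-environment loop: observe the state, act, collect the reward."""
    rng = random.Random(seed)
    state = rng.choice(STATES)
    total = 0
    for _ in range(steps):
        action = policy(state, rng)
        reward, state = step(state, action, rng)
        total += reward
    return total


def random_policy(state, rng):
    """An untrained agent: picks any action regardless of the state."""
    return rng.choice(ACTIONS)


def sensible_policy(state, rng):
    """The behavior we hope the agent discovers: jump only at obstacles."""
    return "jump" if state == "obstacle" else "wait"


print("random policy total reward:", run_episode(random_policy))
print("sensible policy total reward:", run_episode(sensible_policy))
```

The random policy collects a much lower total reward than the sensible one. Closing that gap through experience is what the learning methods in the following sections aim to do.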

 

## Main Types of Reinforcement Learning

Reinforcement learning is largely categorized into two approaches.

 

### 1. Policy-Based Learning

This is a method where the AI directly **learns what actions to take**.

The AI learns a *policy*: a rule that maps each state to the action to take, or to a probability for each action. When the policy is represented by a deep neural network, this approach can handle large or continuous action spaces, such as the joint angles of a robotic arm.

Typical applications of policy-based learning include:

- A robotic arm learning optimal movements

- A game AI learning play strategies

- An autonomous vehicle optimizing its driving route
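As a rough illustration of policy-based learning, the sketch below trains a softmax policy on the jumping game with a basic policy-gradient update (in the style of REINFORCE, without a baseline). The state and action labels, the zero reward for waiting with no obstacle, and the hyperparameters `episodes`, `lr`, and `seed` are all assumptions made for this example.

```python
import math
import random

REWARDS = {
    ("obstacle", "jump"): 1,
    ("clear", "jump"): -1,
    ("obstacle", "wait"): -10,
    ("clear", "wait"): 0,  # assumption: nothing happens
}
STATES = ["obstacle", "clear"]
ACTIONS = ["jump", "wait"]


def action_probs(prefs, state):
    """Softmax over the action preferences for one state."""
    exps = [math.exp(prefs[(state, a)]) for a in ACTIONS]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(episodes=2000, lr=0.1, seed=0):
    """Learn action preferences directly, without estimating action values."""
    rng = random.Random(seed)
    prefs = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(STATES)
        probs = action_probs(prefs, state)
        action = rng.choices(ACTIONS, weights=probs)[0]
        reward = REWARDS[(state, action)]
        # Gradient of log pi(action | state) with respect to each preference:
        # 1 - pi(b) for the chosen action, -pi(b) for the others.
        for b, p in zip(ACTIONS, probs):
            grad = (1.0 if b == action else 0.0) - p
            prefs[(state, b)] += lr * reward * grad
    return prefs


prefs = train_policy()
for s in STATES:
    print(s, dict(zip(ACTIONS, action_probs(prefs, s))))
```

After training, the policy should assign most of its probability to jumping when an obstacle is visible and to waiting otherwise. Real systems typically replace the preference table with a neural network.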

 

### 2. Value-Based Learning

This approach estimates the value of each action, meaning the reward it is expected to lead to, and **chooses the one that maximizes rewards**.

The AI learns to answer "How beneficial is this action in this state?" and prioritizes actions with higher expected rewards.

Typical applications of value-based learning include:

- A chess AI learning how to find the best moves

- A stock trading AI learning strategies to maximize profit

- A logistics optimization AI learning the best delivery routes
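One common value-based method is tabular Q-learning. The sketch below applies it to the jumping game. It is a simplified illustration under assumptions: the state and action labels, the zero reward for waiting with no obstacle, the random choice of next state, and the hyperparameters `alpha`, `gamma`, and `epsilon` are all chosen for this example.

```python
import random

REWARDS = {
    ("obstacle", "jump"): 1,
    ("clear", "jump"): -1,
    ("obstacle", "wait"): -10,
    ("clear", "wait"): 0,  # assumption: nothing happens
}
STATES = ["obstacle", "clear"]
ACTIONS = ["jump", "wait"]


def greedy_action(q, state):
    """Pick the action with the highest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train_q(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        # Explore occasionally, otherwise exploit the current estimates.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy_action(q, state)
        reward = REWARDS[(state, action)]
        next_state = rng.choice(STATES)
        # Move the estimate toward reward + discounted best future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q


q = train_q()
for s in STATES:
    print(s, "->", greedy_action(q, s))
```

Unlike the policy-based approach, the agent here never stores a policy explicitly. Its behavior follows from comparing the learned values in the table.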

 

## Limitations of Reinforcement Learning

Reinforcement learning is a powerful technique, but it comes with certain drawbacks.

### 1. It Takes a Long Time to Learn

Since the AI learns through trial and error, training can be slow and often requires a very large number of interactions with the environment.

### 2. Reward Setting Can Be Challenging

If the rewards are set incorrectly, the AI might reinforce undesirable behaviors. For example, if an autonomous vehicle AI is rewarded solely for speed, it might ignore traffic signals.
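The effect of a poorly chosen reward can be shown with a toy comparison. In the sketch below, the three actions at a red light, their speeds, and the penalty of 1000 are hypothetical values invented for illustration, not taken from any real driving system.

```python
# Hypothetical actions at a red light: (name, speed in km/h, runs_red_light)
ACTIONS = [
    ("stop", 0, False),
    ("slow_roll", 10, True),
    ("drive_through", 50, True),
]


def speed_only_reward(speed, runs_red_light):
    """Mis-specified reward: only speed counts."""
    return speed


def safe_reward(speed, runs_red_light, penalty=1000):
    """Speed still counts, but running a red light is heavily penalized."""
    return speed - (penalty if runs_red_light else 0)


def best_action(reward_fn):
    """Return the name of the action that maximizes the given reward."""
    return max(ACTIONS, key=lambda a: reward_fn(a[1], a[2]))[0]


print(best_action(speed_only_reward))  # drive_through
print(best_action(safe_reward))        # stop
```

An agent that maximizes the first reward does exactly what it was asked to do, which is why the reward has to encode everything the designer cares about.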

### 3. Complexity in Real-World Application

While reinforcement learning works well in simulated environments, real-world deployment must account for many more variables, physical constraints, and the cost of mistakes made during trial and error.

 

To help address these limitations, reinforcement learning is increasingly combined with deep learning, which lets the AI learn from richer inputs and generalize to situations it has not seen before.

In the next session, we'll tackle a simple quiz to review the material covered so far.
