Dataset Structure: Features and Labels
In supervised machine learning, a dataset is divided into two main parts:
- Features (X): input variables the model uses to make predictions (e.g., age, height, or number of purchases).
- Labels (y): the target variable the model is trying to predict (e.g., whether an email is spam or the price of a house).
In supervised learning, the model learns the relationship between features and labels to make accurate predictions.
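As a quick illustration, features and labels are typically stored as two separate arrays. The values below are made up for demonstration only, not taken from a real dataset:

import numpy as np

# Each row is one example; each column is one feature
# (e.g., [age, number of purchases]) - hypothetical values
X = np.array([[25, 3],
              [40, 10],
              [31, 1]])

# One label per example (e.g., 1 = made another purchase, 0 = did not)
y = np.array([1, 1, 0])

print(X.shape)  # (3, 2) -> 3 samples, 2 features
print(y.shape)  # (3,)   -> one label per sample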
Loading a Dataset in Scikit-learn
Scikit-learn includes several built-in datasets for experimentation.
One of the most commonly used is the Iris dataset, which contains measurements of iris flowers from three species.
from sklearn.datasets import load_iris

iris = load_iris()

# Features (X) - shape: (samples, features)
X = iris.data
print("Feature shape:", X.shape)
print("First row of features:", X[0])

# Labels (y) - shape: (samples,)
y = iris.target
print("Label shape:", y.shape)
print("First label:", y[0])
Inspecting Feature and Label Names
You can check the feature and target names to understand what each column and label represents:
print("Feature names:", iris.feature_names) print("Target names:", iris.target_names)
The following are some key points about features and labels:
- Features are the information your model uses to make predictions.
- Labels define the correct answers during training.
- X: input features, a 2D array of shape (n_samples, n_features).
- y: target labels, a 1D array of shape (n_samples,).
Organizing data correctly into X and y is essential for Scikit-learn functions like train_test_split() and .fit().
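The sketch below shows how the X and y arrays from the Iris example plug into that workflow. The choice of LogisticRegression and the split parameters are illustrative, not prescribed by this lesson:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Split features and labels into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a simple classifier on the training portion
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("Test accuracy:", model.score(X_test, y_test))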
Proper separation of features and labels is the first step in preparing data for training.