The Perceptron from Scratch and in Practice: Theory, Training, and Limits

This notebook is designed to build a thorough understanding of the perceptron, one of the foundational models of deep learning.

Learning objective

By the end, you should be able to answer clearly:

What is a perceptron and how is it defined mathematically?
What kinds of problems can it solve well?
What are its limitations?
How does it relate to modern models (MLPs and deep networks)?

We will also run several practical experiments, with metrics and visualizations.

Mathematical and computational foundations

A classic perceptron for binary classification computes:

$$ z = \mathbf{w}^\top \mathbf{x} + b $$

and applies a step activation:

$$ \hat{y} = \begin{cases} 1 & \text{if } z \ge 0 \\
0 & \text{if } z < 0 \end{cases} $$

The decision boundary is the set of points where $z = 0$, so it is linear: a straight line in 2D and a hyperplane in higher dimensions.
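
As a minimal sketch of the two formulas above, the snippet below computes the score $z$ and the step output for a hand-picked weight vector, and solves $z = 0$ for $x_2$ to get the boundary line in 2D. The weights, bias, and sample points are illustrative assumptions, not values learned anywhere in this notebook.

```python
# Illustrative values only (assumed, not learned in this notebook).
w = [1.0, -2.0]
b = 0.5


def score(x, w, b):
    """Linear score z = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def boundary_x2(x1, w, b):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (requires w2 != 0)."""
    return -(w[0] * x1 + b) / w[1]


points = [(1.0, 1.0), (2.0, 0.5)]
labels = [step(score(p, w, b)) for p in points]
print(labels)                    # [0, 1]
print(boundary_x2(1.0, w, b))    # 0.75
```

The point $(1, 0.75)$ lies exactly on the boundary, so its score is zero and the step rule assigns it to class 1.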

Update rule (the idea behind learning)

If the perceptron misclassifies a sample $(\mathbf{x}_i, y_i)$, it shifts the weights and bias toward the correct side; when the prediction is right, $y_i - \hat{y}_i = 0$ and nothing changes:

$$ \mathbf{w} \leftarrow \mathbf{w} + \eta\,(y_i - \hat{y}_i)\,\mathbf{x}_i, \qquad b \leftarrow b + \eta\,(y_i - \hat{y}_i) $$

where $\eta$ is the learning rate.
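
To make the rule concrete, here is one update worked in code. The starting point ($\mathbf{w} = \mathbf{0}$, $b = 0$), the sample, and $\eta = 0.1$ are assumed for illustration, and `perceptron_step` is a hypothetical helper name.

```python
def perceptron_step(w, b, x, y, lr):
    """Apply one perceptron update for sample (x, y); return new (w, b, y_hat)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    y_hat = 1 if z >= 0 else 0
    delta = lr * (y - y_hat)          # 0 when the prediction is correct
    w_new = [wj + delta * xj for wj, xj in zip(w, x)]
    return w_new, b + delta, y_hat


# Assumed toy sample: true label 0, but z = 0 at the start so the step predicts 1.
w, b = [0.0, 0.0], 0.0
x, y = [2.0, -1.0], 0

w, b, y_hat_before = perceptron_step(w, b, x, y, lr=0.1)
print(y_hat_before, w, b)   # 1 [-0.2, 0.1] -0.1

# A second call on the same sample now predicts correctly and leaves (w, b) unchanged.
w2, b2, y_hat_after = perceptron_step(w, b, x, y, lr=0.1)
print(y_hat_after, (w2, b2) == (w, b))   # 0 True
```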

Where does it work well?

Linearly separable problems.
Fast, easy-to-interpret baselines.

Where does it fail?

Problems that are not linearly separable (for example, XOR or data with curved boundaries).
Overcoming this limit requires hidden layers and nonlinear activations (MLP).
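
The XOR case can be checked directly. The sketch below trains the same update rule on the four XOR points and records the best training accuracy seen; because no line separates the two classes, any linear rule gets at most 3 of the 4 points right. The epoch count and learning rate are arbitrary choices.

```python
# XOR truth table: not linearly separable.
X_xor = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_xor = [0, 1, 1, 0]


def predict(x, w, b):
    return 1 if (w[0] * x[0] + w[1] * x[1] + b) >= 0 else 0


def accuracy(w, b):
    hits = sum(predict(x, w, b) == y for x, y in zip(X_xor, y_xor))
    return hits / len(X_xor)


w, b, lr = [0.0, 0.0], 0.0, 0.1
best_acc = 0.0
updates_last_epoch = 0

for epoch in range(100):
    updates_last_epoch = 0
    for x, y in zip(X_xor, y_xor):
        err = y - predict(x, w, b)
        if err != 0:
            w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
            b += lr * err
            updates_last_epoch += 1
    best_acc = max(best_acc, accuracy(w, b))

print(best_acc <= 0.75)          # True: never all four points at once
print(updates_last_epoch > 0)    # True: the rule keeps making mistakes
```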

What this notebook covers

A perceptron implementation from scratch.
An experiment on linearly separable data (expected success).
An experiment on nonlinear moons-style data (expected limitation).
A multiclass case on Iris (one linear unit per class, which the notebook labels One-vs-Rest).
A perceptron with linear output for simple regression (connection to linear regression).

Note: for classification we plot loss and accuracy curves on train/val after each training run. The loss used here is the error rate, 1 - accuracy, so the two curves mirror each other.

[1]

# Libraries and global configuration

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_blobs, make_moons, load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, ConfusionMatrixDisplay,
    mean_squared_error, mean_absolute_error, r2_score
)
from sklearn.linear_model import LinearRegression

sns.set_theme(style='whitegrid', context='notebook')
plt.rcParams['figure.figsize'] = (8, 5)

SEED = 42
np.random.seed(SEED)

1) Helper functions

We define utilities to train and evaluate a binary perceptron from scratch, and to plot curves.

[2]

def perceptron_predict_scores(X, w, b):
    """Return the linear score z = Xw + b."""
    return X @ w + b


def perceptron_predict_labels(X, w, b):
    """Binary 0/1 prediction with a step activation."""
    z = perceptron_predict_scores(X, w, b)
    return (z >= 0).astype(int)


def perceptron_loss(y_true, y_pred):
    """Simple loss defined as the error rate (1 - accuracy)."""
    return 1.0 - accuracy_score(y_true, y_pred)


def train_perceptron_binary(X_train, y_train, X_val, y_val, lr=0.05, epochs=60):
    """Train a binary perceptron with the classic rule and record the history."""
    n_features = X_train.shape[1]
    w = np.zeros(n_features)
    b = 0.0

    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}

    for epoch in range(epochs):
        # Visit samples one at a time, in a fixed order (online updates)
        for xi, yi in zip(X_train, y_train):
            y_hat = 1 if (np.dot(w, xi) + b) >= 0 else 0
            update = lr * (yi - y_hat)
            w += update * xi
            b += update

        # Metrics at the end of each epoch
        train_pred = perceptron_predict_labels(X_train, w, b)
        val_pred = perceptron_predict_labels(X_val, w, b)

        history['train_acc'].append(accuracy_score(y_train, train_pred))
        history['val_acc'].append(accuracy_score(y_val, val_pred))
        history['train_loss'].append(perceptron_loss(y_train, train_pred))
        history['val_loss'].append(perceptron_loss(y_val, val_pred))

    return w, b, history


def plot_curves(train_values, val_values, title, ylabel):
    """Plot train and validation curves against the epoch number."""
    epochs = np.arange(1, len(train_values) + 1)
    plt.figure(figsize=(8, 4.5))
    plt.plot(epochs, train_values, label='Train')
    plt.plot(epochs, val_values, label='Validation')
    plt.title(title)
    plt.xlabel('Epoch')
    plt.ylabel(ylabel)
    plt.legend()
    plt.tight_layout()
    plt.show()

2) Experiment A: linearly separable data

We start with an ideal case for the perceptron: two classes that can be separated by a straight line.

[3]

# Generate a linearly separable dataset
X_sep, y_sep = make_blobs(
    n_samples=1200,
    centers=2,
    n_features=2,
    cluster_std=1.1,
    random_state=SEED
)

df_sep = pd.DataFrame(X_sep, columns=['x1', 'x2'])
df_sep['target'] = y_sep

# Basic EDA
print('Shape:', df_sep.shape)
display(df_sep.head())
print('\nClass balance:')
print(df_sep['target'].value_counts(normalize=True).sort_index())

plt.figure(figsize=(7, 5))
sns.scatterplot(data=df_sep, x='x1', y='x2', hue='target', alpha=0.6, palette='Set1')
plt.title('Linearly separable dataset')
plt.tight_layout()
plt.show()

Shape: (1200, 3)

	x1	x2	target
0	2.556903	2.207793	1
1	4.778896	4.105992	1
2	-2.264850	8.487663	0
3	-4.393646	10.503546	0
4	-2.111398	8.304654	0

Class balance:
target
0    0.5
1    0.5
Name: proportion, dtype: float64

[4]

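# Train/val/test split and scaling.
# 0.1765 ≈ 0.15 / 0.85, so validation is ~15% of the full dataset (about 70/15/15 overall).
# The scaler is fit on the training split only, to avoid leaking val/test statistics.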
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X_sep, y_sep, test_size=0.15, random_state=SEED, stratify=y_sep
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.1765, random_state=SEED, stratify=y_train_full
)

scaler_sep = StandardScaler()
X_train_sc = scaler_sep.fit_transform(X_train)
X_val_sc = scaler_sep.transform(X_val)
X_test_sc = scaler_sep.transform(X_test)

print('Train:', X_train_sc.shape, 'Val:', X_val_sc.shape, 'Test:', X_test_sc.shape)

Train: (839, 2) Val: (181, 2) Test: (180, 2)

[5]

# Train the binary perceptron
w_sep, b_sep, hist_sep = train_perceptron_binary(
    X_train_sc, y_train, X_val_sc, y_val, lr=0.05, epochs=70
)

# Loss and accuracy curves
plot_curves(hist_sep['train_loss'], hist_sep['val_loss'], 'Perceptron (separable) - Loss', 'Loss = 1 - accuracy')
plot_curves(hist_sep['train_acc'], hist_sep['val_acc'], 'Perceptron (separable) - Accuracy', 'Accuracy')

[6]

# Evaluation on the test set
y_pred_sep = perceptron_predict_labels(X_test_sc, w_sep, b_sep)

metrics_sep = {
    'Accuracy': accuracy_score(y_test, y_pred_sep),
    'Precision': precision_score(y_test, y_pred_sep),
    'Recall': recall_score(y_test, y_pred_sep),
    'F1': f1_score(y_test, y_pred_sep)
}
print('Test metrics (separable):', metrics_sep)

# Confusion matrix
fig, ax = plt.subplots(figsize=(4.5, 4))
ConfusionMatrixDisplay(confusion_matrix(y_test, y_pred_sep)).plot(ax=ax, cmap='Blues', colorbar=False)
plt.title('Confusion Matrix - Separable case')
plt.tight_layout()
plt.show()

Test metrics (separable): {'Accuracy': 1.0, 'Precision': 1.0, 'Recall': 1.0, 'F1': 1.0}

[7]

# Visualize the decision boundary in 2D
x_min, x_max = X_test_sc[:, 0].min() - 1, X_test_sc[:, 0].max() + 1
y_min, y_max = X_test_sc[:, 1].min() - 1, X_test_sc[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 250), np.linspace(y_min, y_max, 250))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = perceptron_predict_labels(grid, w_sep, b_sep).reshape(xx.shape)

plt.figure(figsize=(7, 5))
plt.contourf(xx, yy, zz, alpha=0.25, cmap='coolwarm')
plt.scatter(X_test_sc[:, 0], X_test_sc[:, 1], c=y_test, s=20, alpha=0.8, cmap='coolwarm')
plt.title('Linear boundary learned by the perceptron')
plt.xlabel('x1 (scaled)')
plt.ylabel('x2 (scaled)')
plt.tight_layout()
plt.show()

3) Experiment B: nonlinear data (moons)

Now we use a problem with a curved boundary. A straight line can still classify many points correctly, but it cannot follow the interleaved half-moons, which illustrates a classic limitation of the single-layer perceptron.

[8]

# Generate a nonlinear dataset
X_moons, y_moons = make_moons(n_samples=1600, noise=0.25, random_state=SEED)

df_moons = pd.DataFrame(X_moons, columns=['x1', 'x2'])
df_moons['target'] = y_moons

print('Shape:', df_moons.shape)
display(df_moons.head())

plt.figure(figsize=(7, 5))
sns.scatterplot(data=df_moons, x='x1', y='x2', hue='target', alpha=0.6, palette='Set1')
plt.title('make_moons dataset (nonlinear)')
plt.tight_layout()
plt.show()

Shape: (1600, 3)

	x1	x2	target
0	-0.574865	0.711225	0
1	0.166344	0.901561	0
2	0.701863	0.890406	0
3	1.001693	0.330256	0
4	0.002722	0.109953	1

[9]

# Split and scaling (same split sizes as Experiment A)
Xm_train_full, Xm_test, ym_train_full, ym_test = train_test_split(
    X_moons, y_moons, test_size=0.15, random_state=SEED, stratify=y_moons
)
Xm_train, Xm_val, ym_train, ym_val = train_test_split(
    Xm_train_full, ym_train_full, test_size=0.1765, random_state=SEED, stratify=ym_train_full
)

scaler_m = StandardScaler()
Xm_train_sc = scaler_m.fit_transform(Xm_train)
Xm_val_sc = scaler_m.transform(Xm_val)
Xm_test_sc = scaler_m.transform(Xm_test)

# Training
w_m, b_m, hist_m = train_perceptron_binary(Xm_train_sc, ym_train, Xm_val_sc, ym_val, lr=0.04, epochs=90)

plot_curves(hist_m['train_loss'], hist_m['val_loss'], 'Perceptron (moons) - Loss', 'Loss = 1 - accuracy')
plot_curves(hist_m['train_acc'], hist_m['val_acc'], 'Perceptron (moons) - Accuracy', 'Accuracy')

[10]

# Evaluation of the nonlinear case
ym_pred = perceptron_predict_labels(Xm_test_sc, w_m, b_m)

metrics_m = {
    'Accuracy': accuracy_score(ym_test, ym_pred),
    'Precision': precision_score(ym_test, ym_pred),
    'Recall': recall_score(ym_test, ym_pred),
    'F1': f1_score(ym_test, ym_pred)
}
print('Test metrics (moons):', metrics_m)

fig, ax = plt.subplots(figsize=(4.5, 4))
ConfusionMatrixDisplay(confusion_matrix(ym_test, ym_pred)).plot(ax=ax, cmap='Oranges', colorbar=False)
plt.title('Confusion Matrix - Nonlinear case')
plt.tight_layout()
plt.show()

Test metrics (moons): {'Accuracy': 0.8291666666666667, 'Precision': 0.8264462809917356, 'Recall': 0.8333333333333334, 'F1': 0.8298755186721992}

[11]

# Boundary on moons: the perceptron is still linear
x_min, x_max = Xm_test_sc[:, 0].min() - 1, Xm_test_sc[:, 0].max() + 1
y_min, y_max = Xm_test_sc[:, 1].min() - 1, Xm_test_sc[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 250), np.linspace(y_min, y_max, 250))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = perceptron_predict_labels(grid, w_m, b_m).reshape(xx.shape)

plt.figure(figsize=(7, 5))
plt.contourf(xx, yy, zz, alpha=0.25, cmap='coolwarm')
plt.scatter(Xm_test_sc[:, 0], Xm_test_sc[:, 1], c=ym_test, s=20, alpha=0.85, cmap='coolwarm')
plt.title('Perceptron limitation on a nonlinear boundary')
plt.xlabel('x1 (scaled)')
plt.ylabel('x2 (scaled)')
plt.tight_layout()
plt.show()

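4) Experiment C: multiclass perceptron (Iris, One-vs-Rest)

Although the perceptron was designed for binary problems, it extends to multiclass by using one linear unit per class and predicting the class with the highest score. The notebook calls this One-vs-Rest (OvR); strictly, the update implemented in `train_perceptron_ovr` is the multiclass perceptron rule (on a mistake, reinforce the true class and penalize the predicted one) rather than a set of independent binary classifiers.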
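
For contrast, a literal One-vs-Rest scheme trains one independent binary perceptron per class (class $k$ against everything else) and predicts with the highest score. The sketch below does this on a tiny hand-made 3-class dataset; the data, the helper names `train_binary` and `predict_ovr`, and the hyperparameters are assumptions made for illustration. It is not the update used in `train_perceptron_ovr`, which couples the classes through the argmax.

```python
# Tiny assumed dataset: three well-separated 2D clusters.
X = [(0, 0), (1, 0), (0, 1),      # class 0
     (4, 0), (5, 1), (4, 1),      # class 1
     (0, 4), (1, 5), (1, 4)]      # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]


def score(x, w, b):
    return w[0] * x[0] + w[1] * x[1] + b


def train_binary(X, t, lr=1.0, max_epochs=1000):
    """Classic binary perceptron on 0/1 targets t; stops after an error-free epoch."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, ti in zip(X, t):
            err = ti - (1 if score(x, w, b) >= 0 else 0)
            if err != 0:
                w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
                b += lr * err
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


# One independent binary problem per class: "is it class k or not?"
classes = sorted(set(y))
models = [train_binary(X, [1 if yi == k else 0 for yi in y]) for k in classes]


def predict_ovr(x):
    scores = [score(x, w, b) for w, b in models]
    return classes[scores.index(max(scores))]


preds = [predict_ovr(x) for x in X]
print(preds == y)   # True
```

Each class here is linearly separable from the other two, so every binary perceptron converges; once that happens only the true class has a non-negative score, and the argmax recovers it.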

[12]

# Load Iris and run a brief EDA
iris = load_iris(as_frame=True)
df_iris = iris.frame.copy()
df_iris['target_name'] = df_iris['target'].map(dict(enumerate(iris.target_names)))

print('Shape:', df_iris.shape)
display(df_iris.head())

plt.figure(figsize=(8, 5))
sns.scatterplot(
    data=df_iris,
    x='petal length (cm)',
    y='petal width (cm)',
    hue='target_name',
    palette='Set2',
    s=60
)
plt.title('Iris: quick look at separability')
plt.tight_layout()
plt.show()

Shape: (150, 6)

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)	target	target_name
0	5.1	3.5	1.4	0.2	0	setosa
1	4.9	3.0	1.4	0.2	0	setosa
2	4.7	3.2	1.3	0.2	0	setosa
3	4.6	3.1	1.5	0.2	0	setosa
4	5.0	3.6	1.4	0.2	0	setosa

[13]

def train_perceptron_ovr(X_train, y_train, X_val, y_val, lr=0.03, epochs=80):
    """Train a multiclass perceptron: one linear unit per class, argmax decision.

    Assumes integer labels 0..n_classes-1, since labels index the rows of W.
    """
    classes = np.unique(y_train)
    n_classes = len(classes)
    n_features = X_train.shape[1]

    W = np.zeros((n_classes, n_features))
    b = np.zeros(n_classes)

    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}

    for epoch in range(epochs):
        for xi, yi in zip(X_train, y_train):
            # Score for each class and multiclass prediction
            scores = W @ xi + b
            pred = np.argmax(scores)

            # Multiclass perceptron update: reinforce the true class, penalize the predicted one
            if pred != yi:
                W[yi] += lr * xi
                b[yi] += lr
                W[pred] -= lr * xi
                b[pred] -= lr

        # Metrics per epoch
        train_pred = np.argmax(X_train @ W.T + b, axis=1)
        val_pred = np.argmax(X_val @ W.T + b, axis=1)

        train_acc = accuracy_score(y_train, train_pred)
        val_acc = accuracy_score(y_val, val_pred)

        history['train_acc'].append(train_acc)
        history['val_acc'].append(val_acc)
        history['train_loss'].append(1 - train_acc)  # classification error
        history['val_loss'].append(1 - val_acc)

    return W, b, history

[14]

# Split and scaling for Iris
X_i = iris.data.values
y_i = iris.target.values

Xi_train_full, Xi_test, yi_train_full, yi_test = train_test_split(
    X_i, y_i, test_size=0.2, random_state=SEED, stratify=y_i
)
Xi_train, Xi_val, yi_train, yi_val = train_test_split(
    Xi_train_full, yi_train_full, test_size=0.25, random_state=SEED, stratify=yi_train_full
)

scaler_i = StandardScaler()
Xi_train_sc = scaler_i.fit_transform(Xi_train)
Xi_val_sc = scaler_i.transform(Xi_val)
Xi_test_sc = scaler_i.transform(Xi_test)

W_i, b_i, hist_i = train_perceptron_ovr(Xi_train_sc, yi_train, Xi_val_sc, yi_val, lr=0.04, epochs=100)

plot_curves(hist_i['train_loss'], hist_i['val_loss'], 'Perceptron OvR (Iris) - Loss', 'Loss = 1 - accuracy')
plot_curves(hist_i['train_acc'], hist_i['val_acc'], 'Perceptron OvR (Iris) - Accuracy', 'Accuracy')

[15]

# Multiclass evaluation
yi_pred = np.argmax(Xi_test_sc @ W_i.T + b_i, axis=1)

metrics_i = {
    'Accuracy': accuracy_score(yi_test, yi_pred),
    'Precision_macro': precision_score(yi_test, yi_pred, average='macro'),
    'Recall_macro': recall_score(yi_test, yi_pred, average='macro'),
    'F1_macro': f1_score(yi_test, yi_pred, average='macro')
}
print('Test metrics (Iris OvR):', metrics_i)

fig, ax = plt.subplots(figsize=(4.8, 4.2))
ConfusionMatrixDisplay(confusion_matrix(yi_test, yi_pred), display_labels=iris.target_names).plot(
    ax=ax, cmap='Greens', colorbar=False
)
plt.title('Confusion Matrix - Iris OvR')
plt.tight_layout()
plt.show()

Test metrics (Iris OvR): {'Accuracy': 0.9, 'Precision_macro': 0.9023569023569024, 'Recall_macro': 0.9, 'F1_macro': 0.899749373433584}

5) Experiment D: perceptron with linear output for regression

With a linear (identity) activation and a squared-error loss, a single neuron is a linear regression fitted by gradient descent.

Here we plot train/val MSE curves and compare against scikit-learn's LinearRegression.
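
The connection can be checked on a toy problem where the answer is known. The sketch below fits $y = 3x + 1$ (noise-free, assumed data) with a per-sample gradient step on the squared error and compares the result with the closed-form least-squares solution for one feature. The learning rate and epoch count are illustrative choices.

```python
# Assumed noise-free data from y = 3x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3.0 * x + 1.0 for x in xs]

# Per-sample SGD on squared error, as in a single linear neuron.
w, b, lr = 0.0, 0.0, 0.05
for _ in range(200):
    for x, y in zip(xs, ys):
        error = (w * x + b) - y
        w -= lr * 2 * error * x
        b -= lr * 2 * error

# Closed-form least squares for one feature.
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
w_ls = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        / sum((x - x_mean) ** 2 for x in xs))
b_ls = y_mean - w_ls * x_mean

print(round(w, 6), round(b, 6))   # 3.0 1.0
print(w_ls, b_ls)                 # 3.0 1.0
```

On noisy data the two solutions need not match exactly: with a constant step size, per-sample SGD keeps fluctuating around the least-squares optimum instead of settling on it.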

[16]

# Generate a synthetic regression dataset
X_reg, y_reg = make_regression(
    n_samples=1200,
    n_features=1,
    n_informative=1,
    noise=18,
    random_state=SEED
)

df_reg = pd.DataFrame({'x': X_reg.ravel(), 'y': y_reg})
print('Shape:', df_reg.shape)
display(df_reg.head())

plt.figure(figsize=(7, 4.5))
sns.scatterplot(data=df_reg.sample(500, random_state=SEED), x='x', y='y', alpha=0.6)
plt.title('Synthetic regression: linear relationship with noise')
plt.tight_layout()
plt.show()

Shape: (1200, 2)

	x	y
0	-0.254977	-5.890773
1	0.447709	17.217733
2	0.711615	34.946824
3	0.153725	2.288131
4	-0.013497	5.600308

[17]

# Split and scaling
Xr_train_full, Xr_test, yr_train_full, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.15, random_state=SEED
)
Xr_train, Xr_val, yr_train, yr_val = train_test_split(
    Xr_train_full, yr_train_full, test_size=0.1765, random_state=SEED
)

scaler_r = StandardScaler()
Xr_train_sc = scaler_r.fit_transform(Xr_train)
Xr_val_sc = scaler_r.transform(Xr_val)
Xr_test_sc = scaler_r.transform(Xr_test)

[18]

def train_linear_neuron(X_train, y_train, X_val, y_val, lr=0.03, epochs=120):
    """Train a single-feature linear neuron with per-sample SGD, minimizing MSE."""
    w = 0.0
    b = 0.0
    history = {'train_loss': [], 'val_loss': []}

    for epoch in range(epochs):
        for xi, yi in zip(X_train.ravel(), y_train):
            y_hat = w * xi + b
            error = y_hat - yi

            # Squared-error gradients for a single sample
            grad_w = 2 * error * xi
            grad_b = 2 * error

            w -= lr * grad_w
            b -= lr * grad_b

        # MSE at the end of each epoch
        train_pred = w * X_train.ravel() + b
        val_pred = w * X_val.ravel() + b

        train_mse = mean_squared_error(y_train, train_pred)
        val_mse = mean_squared_error(y_val, val_pred)

        history['train_loss'].append(train_mse)
        history['val_loss'].append(val_mse)

    return w, b, history

w_r, b_r, hist_r = train_linear_neuron(Xr_train_sc, yr_train, Xr_val_sc, yr_val, lr=0.03, epochs=120)

plot_curves(hist_r['train_loss'], hist_r['val_loss'], 'Linear neuron (regression) - MSE', 'MSE')

[19]

# Test evaluation for the linear neuron
yr_pred_neuron = w_r * Xr_test_sc.ravel() + b_r

metrics_r_neuron = {
    'MSE': mean_squared_error(yr_test, yr_pred_neuron),
    'RMSE': np.sqrt(mean_squared_error(yr_test, yr_pred_neuron)),
    'MAE': mean_absolute_error(yr_test, yr_pred_neuron),
    'R2': r2_score(yr_test, yr_pred_neuron)
}
print('Test metrics (linear neuron):', metrics_r_neuron)

# Comparison with closed-form linear regression
lin = LinearRegression()
lin.fit(Xr_train_sc, yr_train)
yr_pred_lin = lin.predict(Xr_test_sc)

metrics_r_lin = {
    'MSE': mean_squared_error(yr_test, yr_pred_lin),
    'RMSE': np.sqrt(mean_squared_error(yr_test, yr_pred_lin)),
    'MAE': mean_absolute_error(yr_test, yr_pred_lin),
    'R2': r2_score(yr_test, yr_pred_lin)
}
print('Test metrics (LinearRegression):', metrics_r_lin)

Test metrics (linear neuron): {'MSE': 355.46976808788344, 'RMSE': np.float64(18.853905910656376), 'MAE': 14.775302856780987, 'R2': 0.007408699854160816}
Test metrics (LinearRegression): {'MSE': 348.89259804441065, 'RMSE': np.float64(18.678666923643416), 'MAE': 14.477689303224912, 'R2': 0.02577437353675882}

[20]

# Final visualization of the fit on the test set
x_plot = Xr_test_sc.ravel()
order = np.argsort(x_plot)

plt.figure(figsize=(7, 4.5))
plt.scatter(x_plot, yr_test, s=20, alpha=0.5, label='Test data')
plt.plot(x_plot[order], yr_pred_neuron[order], color='crimson', lw=2, label='Linear neuron (SGD)')
plt.plot(x_plot[order], yr_pred_lin[order], color='black', lw=2, ls='--', label='LinearRegression')
plt.title('Comparison: linear neuron vs linear regression')
plt.xlabel('x (scaled)')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()

6) Summary comparison of experiments

We gather the key metrics to compare the different perceptron use cases at a glance.

[21]

summary_cls = pd.DataFrame([
    {'Case': 'Separable classification', **metrics_sep},
    {'Case': 'Nonlinear classification (moons)', **metrics_m},
    {'Case': 'Multiclass classification (Iris OvR)', **metrics_i},
])

summary_reg = pd.DataFrame([
    {'Model': 'Linear neuron (SGD)', **metrics_r_neuron},
    {'Model': 'LinearRegression', **metrics_r_lin},
])

print('=== Classification summary ===')
display(summary_cls)
print('=== Regression summary ===')
display(summary_reg)

=== Classification summary ===

	Case	Accuracy	Precision	Recall	F1	Precision_macro	Recall_macro	F1_macro
0	Separable classification	1.000000	1.000000	1.000000	1.000000	NaN	NaN	NaN
1	Nonlinear classification (moons)	0.829167	0.826446	0.833333	0.829876	NaN	NaN	NaN
2	Multiclass classification (Iris OvR)	0.900000	NaN	NaN	NaN	0.902357	0.9	0.899749

=== Regression summary ===

	Model	MSE	RMSE	MAE	R2
0	Linear neuron (SGD)	355.469768	18.853906	14.775303	0.007409
1	LinearRegression	348.892598	18.678667	14.477689	0.025774

7) Quick tests (sanity checks)

Simple checks that the reported metrics are well-formed (classification metrics within [0, 1], regression metrics finite). They do not assess model quality.

[22]

# Classification: metrics within [0, 1]
for col in ['Accuracy', 'Precision', 'Recall', 'F1']:
    assert 0 <= metrics_sep[col] <= 1, f'{col} out of range in the separable case'
    assert 0 <= metrics_m[col] <= 1, f'{col} out of range on moons'

for col in ['Accuracy', 'Precision_macro', 'Recall_macro', 'F1_macro']:
    assert 0 <= metrics_i[col] <= 1, f'{col} out of range on Iris'

# Regression: finite errors and finite R2
for d in [metrics_r_neuron, metrics_r_lin]:
    assert np.isfinite(d['MSE']) and np.isfinite(d['RMSE']) and np.isfinite(d['MAE']) and np.isfinite(d['R2'])

print('✅ Sanity checks passed')

✅ Sanity checks passed

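Conclusions and next steps

Conclusions

The perceptron works very well on linearly separable problems: the separable experiment reaches 1.0 test accuracy.
On nonlinear boundaries (such as moons), its performance plateaus (about 0.83 test accuracy here) because it can only learn a linear boundary.
The multiclass extension with one linear unit per class is useful and instructive for simple problems like Iris (0.90 test accuracy).
With a linear output, a single neuron is a linear regression trained by SGD. In this run its test error is close to LinearRegression (MSE about 355 vs. 349), and both R2 values are near zero, so the linear fit explains very little of the test variance in this synthetic dataset. The link still reinforces the intuition that deep networks extend these basic building blocks.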

What you could try next

Add polynomial features before the perceptron to improve nonlinear cases.
Compare with MLPClassifier and observe the improvement on moons.
Try different learning rates and analyze stability, especially for the linear neuron; for the zero-initialized perceptron, the learning rate only rescales $\mathbf{w}$ and $b$ and does not change the predictions.
Implement regularization and study the bias-variance trade-off.
Repeat the experiments on real, higher-dimensional datasets.
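
The first suggestion can be illustrated on XOR without any library. In the sketch below, adding the product $x_1 x_2$ as a third input (a degree-2 polynomial feature) makes the four XOR points linearly separable, for example with $\mathbf{w} = (1, 1, -2)$ and $b = -0.5$, so the classic update rule reaches zero training errors. The feature map `add_product` and the hyperparameters are assumptions made for illustration.

```python
X_xor = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_xor = [0, 1, 1, 0]


def add_product(x):
    """Hypothetical feature map: (x1, x2) -> (x1, x2, x1*x2)."""
    return (x[0], x[1], x[0] * x[1])


def predict(x, w, b):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0


X_aug = [add_product(x) for x in X_xor]
w, b, lr = [0.0, 0.0, 0.0], 0.0, 1.0

for epoch in range(1000):
    mistakes = 0
    for x, y in zip(X_aug, y_xor):
        err = y - predict(x, w, b)
        if err != 0:
            w = [wj + lr * err * xj for wj, xj in zip(w, x)]
            b += lr * err
            mistakes += 1
    if mistakes == 0:      # an error-free pass means the data are fit exactly
        break

preds = [predict(x, w, b) for x in X_aug]
print(preds == y_xor)   # True
```

The model is still a single linear unit; only the input representation changed, which is the same idea a hidden layer learns automatically in an MLP.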

Final thought: a solid understanding of the perceptron gives you a firm base for understanding MLPs, backpropagation, and modern deep learning.