🏭 Use Case

MLP Comparison: NumPy vs TensorFlow vs PyTorch

Implementation of the same 3-hidden-layer MLP in NumPy, Keras, and PyTorch on MNIST: comparing accuracy, timings, and developer experience.

🐍 Python 📓 Jupyter Notebook

A complete comparison of a 3-hidden-layer MLP on MNIST: NumPy vs TensorFlow (Keras) vs PyTorch

In this notebook we build and compare the same MLP architecture (3 hidden layers) implemented in three different ecosystems:

  1. NumPy (manual implementation, from scratch).
  2. TensorFlow/Keras (high-level API on top of TensorFlow).
  3. PyTorch (explicit model and training loop).

Learning objective

The goal is not just to train three models, but to understand:

  • Which parts of the pipeline are identical across all frameworks.
  • Which parts each tool automates.
  • How the code, the performance, and the developer experience change.

We will also compare:

  • classification metrics (accuracy, macro F1, etc.),
  • learning curves (loss/accuracy on train/val),
  • training times,
  • inference times.

Mathematical/computational foundations (summary)

An MLP applies compositions of affine maps and nonlinearities:

$$ \mathbf{h}^{(1)} = \phi(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}), \quad \mathbf{h}^{(2)} = \phi(\mathbf{W}^{(2)}\mathbf{h}^{(1)} + \mathbf{b}^{(2)}), \quad \mathbf{h}^{(3)} = \phi(\mathbf{W}^{(3)}\mathbf{h}^{(2)} + \mathbf{b}^{(3)}) $$

The output layer for multiclass classification produces logits:

$$ \mathbf{z} = \mathbf{W}^{(4)}\mathbf{h}^{(3)} + \mathbf{b}^{(4)} $$

and probabilities via softmax:

$$ \hat{p}_k = \frac{e^{z_k}}{\sum_j e^{z_j}} $$

We optimize the cross-entropy loss:

$$ \mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik}\log(\hat{p}_{ik}) $$

via gradient descent (or variants such as Adam).
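
As a quick numeric check of these formulas, here is a minimal sketch (with arbitrary logits, chosen purely for illustration) that computes softmax probabilities and the cross-entropy of a single example:

import numpy as np

z = np.array([2.0, 1.0, 0.1])                        # arbitrary logits, 3 classes
p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # numerically stable softmax
print(p, p.sum())                                    # probabilities, sum to 1.0

y = 0                                                # suppose the true class is 0
print(-np.log(p[y]))                                 # cross-entropy for this example, ≈ 0.417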


Dataset and architecture

  • Dataset: MNIST (handwritten digits, 10 classes, 28x28 images).
  • Input: 784-dimensional feature vector (flattened image).
  • Architecture shared by all 3 implementations:
    • Hidden layer 1: 256 neurons + ReLU
    • Hidden layer 2: 128 neurons + ReLU
    • Hidden layer 3: 64 neurons + ReLU
    • Output: 10 neurons (logits)

Throughout the notebook we keep hyperparameters as similar as possible so the comparison is fair.
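
Since all three models share the 784 → 256 → 128 → 64 → 10 layout, they should also have exactly the same number of trainable parameters. A quick framework-independent sketch of that count:

layer_dims = [784, 256, 128, 64, 10]
n_params = sum(d_in * d_out + d_out                  # weights + biases per layer
               for d_in, d_out in zip(layer_dims, layer_dims[1:]))
print(n_params)                                      # 242762 parameters in each model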

[1]
# General libraries and configuration
import os
# Force CPU execution (avoids XLA/Triton autotuner errors on GPU)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import time
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, ConfusionMatrixDisplay
)
from sklearn.model_selection import train_test_split

sns.set_theme(style='whitegrid', context='notebook')
plt.rcParams['figure.figsize'] = (8, 5)

SEED = 42
np.random.seed(SEED)

1) Loading MNIST and basic EDA

We start with a quick exploration to get familiar with the problem.

[2]
# Load via Keras datasets (used as the common source for all 3 approaches)
from tensorflow.keras.datasets import mnist

(X_train_full_img, y_train_full), (X_test_img, y_test) = mnist.load_data()

print('Train full images:', X_train_full_img.shape)
print('Test images:', X_test_img.shape)
print('Pixel range:', X_train_full_img.min(), 'to', X_train_full_img.max())
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1773738657.673513 3172795 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1773738657.700386 3172795 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Train full images: (60000, 28, 28)
Test images: (10000, 28, 28)
Pixel range: 0 to 255
[3]
# Train/val split (the test set is held out for the final comparison)
X_train_img, X_val_img, y_train, y_val = train_test_split(
    X_train_full_img, y_train_full, test_size=0.1, random_state=SEED, stratify=y_train_full
)

print('Train:', X_train_img.shape, 'Val:', X_val_img.shape, 'Test:', X_test_img.shape)
Train: (54000, 28, 28) Val: (6000, 28, 28) Test: (10000, 28, 28)
[4]
# Visualize one example per class
fig, axes = plt.subplots(2, 5, figsize=(11, 5))
classes_shown = list(range(10))

for ax, cls in zip(axes.ravel(), classes_shown):
    idx = np.where(y_train == cls)[0][0]
    ax.imshow(X_train_img[idx], cmap='gray')
    ax.set_title(f'Class {cls}')
    ax.axis('off')

plt.suptitle('MNIST examples')
plt.tight_layout()
plt.show()
Output
[5]
# Class distribution
counts = pd.Series(y_train).value_counts().sort_index()

plt.figure(figsize=(8, 4))
sns.barplot(x=counts.index, y=counts.values, palette='viridis')
plt.title('Class distribution in train')
plt.xlabel('Digit')
plt.ylabel('Number of samples')
plt.tight_layout()
plt.show()
Output
[6]
# Common preprocessing: flatten and normalize to [0, 1]
def preprocess_flat(x):
    return x.reshape(len(x), -1).astype(np.float32) / 255.0

X_train = preprocess_flat(X_train_img)
X_val = preprocess_flat(X_val_img)
X_test = preprocess_flat(X_test_img)

num_classes = 10

print('X_train:', X_train.shape, 'X_val:', X_val.shape, 'X_test:', X_test.shape)
X_train: (54000, 784) X_val: (6000, 784) X_test: (10000, 784)

2) Shared metrics and utilities

[7]
def classification_metrics(y_true, y_pred):
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision_macro': precision_score(y_true, y_pred, average='macro', zero_division=0),
        'Recall_macro': recall_score(y_true, y_pred, average='macro', zero_division=0),
        'F1_macro': f1_score(y_true, y_pred, average='macro', zero_division=0),
    }


def plot_curves(train_values, val_values, title, ylabel):
    epochs = np.arange(1, len(train_values) + 1)
    plt.figure(figsize=(8, 4.5))
    plt.plot(epochs, train_values, label='Train')
    plt.plot(epochs, val_values, label='Validation')
    plt.title(title)
    plt.xlabel('Epoch')
    plt.ylabel(ylabel)
    plt.legend()
    plt.tight_layout()
    plt.show()

3) MLP in NumPy (from scratch)

Here we implement the forward pass, backward pass, and parameter updates by hand to understand the internal mechanics.

[8]
# NumPy math utilities

def relu(x):
    return np.maximum(0, x)


def relu_grad(x):
    return (x > 0).astype(np.float32)


def softmax(logits):
    # Numerical stability: subtract the row-wise max before exponentiating
    z = logits - np.max(logits, axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)


def one_hot(y, n_classes=10):
    out = np.zeros((len(y), n_classes), dtype=np.float32)
    out[np.arange(len(y)), y] = 1.0
    return out


def cross_entropy(y_true_oh, y_proba, eps=1e-9):
    return -np.mean(np.sum(y_true_oh * np.log(y_proba + eps), axis=1))
[9]
class MLPNumPy:
    def __init__(self, in_dim=784, h1=256, h2=128, h3=64, out_dim=10, seed=42):
        rng = np.random.default_rng(seed)

        # He initialization for ReLU
        self.W1 = rng.normal(0, np.sqrt(2/in_dim), size=(in_dim, h1)).astype(np.float32)
        self.b1 = np.zeros((1, h1), dtype=np.float32)

        self.W2 = rng.normal(0, np.sqrt(2/h1), size=(h1, h2)).astype(np.float32)
        self.b2 = np.zeros((1, h2), dtype=np.float32)

        self.W3 = rng.normal(0, np.sqrt(2/h2), size=(h2, h3)).astype(np.float32)
        self.b3 = np.zeros((1, h3), dtype=np.float32)

        self.W4 = rng.normal(0, np.sqrt(2/h3), size=(h3, out_dim)).astype(np.float32)
        self.b4 = np.zeros((1, out_dim), dtype=np.float32)

    def forward(self, X):
        # Cache intermediate values for the backward pass
        self.z1 = X @ self.W1 + self.b1
        self.a1 = relu(self.z1)

        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = relu(self.z2)

        self.z3 = self.a2 @ self.W3 + self.b3
        self.a3 = relu(self.z3)

        self.z4 = self.a3 @ self.W4 + self.b4
        self.p = softmax(self.z4)
        return self.p

    def backward(self, X, y_oh):
        m = X.shape[0]

        dz4 = (self.p - y_oh) / m
        dW4 = self.a3.T @ dz4
        db4 = np.sum(dz4, axis=0, keepdims=True)

        da3 = dz4 @ self.W4.T
        dz3 = da3 * relu_grad(self.z3)
        dW3 = self.a2.T @ dz3
        db3 = np.sum(dz3, axis=0, keepdims=True)

        da2 = dz3 @ self.W3.T
        dz2 = da2 * relu_grad(self.z2)
        dW2 = self.a1.T @ dz2
        db2 = np.sum(dz2, axis=0, keepdims=True)

        da1 = dz2 @ self.W2.T
        dz1 = da1 * relu_grad(self.z1)
        dW1 = X.T @ dz1
        db1 = np.sum(dz1, axis=0, keepdims=True)

        grads = (dW1, db1, dW2, db2, dW3, db3, dW4, db4)
        return grads

    def step(self, grads, lr=1e-3):
        dW1, db1, dW2, db2, dW3, db3, dW4, db4 = grads
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W3 -= lr * dW3
        self.b3 -= lr * db3
        self.W4 -= lr * dW4
        self.b4 -= lr * db4
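
Before training, it is worth checking that backward really computes the gradient of the mean cross-entropy. A minimal finite-difference sketch (check_gradient is a hypothetical helper introduced here for illustration; with float32 weights, expect agreement only to a few decimal places):

def check_gradient(model, n=8, eps=1e-3):
    # Compare the analytic gradient of one W4 entry against a central difference
    rng = np.random.default_rng(0)
    X = rng.random((n, 784)).astype(np.float32)
    y_oh = one_hot(rng.integers(0, 10, size=n), 10)

    model.forward(X)
    dW4 = model.backward(X, y_oh)[6]     # analytic gradient of W4

    i, j = 0, 0                          # probe a single weight
    orig = model.W4[i, j]
    model.W4[i, j] = orig + eps
    loss_plus = cross_entropy(y_oh, model.forward(X))
    model.W4[i, j] = orig - eps
    loss_minus = cross_entropy(y_oh, model.forward(X))
    model.W4[i, j] = orig                # restore the weight

    numeric = (loss_plus - loss_minus) / (2 * eps)
    print(f'analytic={dW4[i, j]:.6f}  numeric={numeric:.6f}')

check_gradient(MLPNumPy(seed=SEED))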
[10]
def train_numpy_mlp(model, X_train, y_train, X_val, y_val, epochs=10, batch_size=128, lr=1e-3):
    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}

    y_train_oh = one_hot(y_train, num_classes)
    y_val_oh = one_hot(y_val, num_classes)

    n = X_train.shape[0]
    t0 = time.perf_counter()

    for epoch in range(epochs):
        # Shuffle at the start of each epoch
        idx = np.random.permutation(n)
        X_sh = X_train[idx]
        y_sh = y_train[idx]
        y_sh_oh = y_train_oh[idx]

        # Mini-batch SGD
        for start in range(0, n, batch_size):
            end = start + batch_size
            xb = X_sh[start:end]
            yb_oh = y_sh_oh[start:end]

            _ = model.forward(xb)
            grads = model.backward(xb, yb_oh)
            model.step(grads, lr=lr)

        # Per-epoch metrics
        p_train = model.forward(X_train)
        p_val = model.forward(X_val)

        train_loss = cross_entropy(y_train_oh, p_train)
        val_loss = cross_entropy(y_val_oh, p_val)

        train_pred = np.argmax(p_train, axis=1)
        val_pred = np.argmax(p_val, axis=1)

        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['train_acc'].append(accuracy_score(y_train, train_pred))
        history['val_acc'].append(accuracy_score(y_val, val_pred))

    train_time = time.perf_counter() - t0
    return model, history, train_time


numpy_model = MLPNumPy(seed=SEED)
numpy_model, numpy_hist, numpy_train_time = train_numpy_mlp(
    numpy_model, X_train, y_train, X_val, y_val, epochs=10, batch_size=128, lr=1e-3
)

plot_curves(numpy_hist['train_loss'], numpy_hist['val_loss'], 'NumPy MLP - Loss', 'Cross-Entropy')
plot_curves(numpy_hist['train_acc'], numpy_hist['val_acc'], 'NumPy MLP - Accuracy', 'Accuracy')
Output
Output
[11]
# NumPy test evaluation + inference time
inf_t0 = time.perf_counter()
numpy_test_proba = numpy_model.forward(X_test)
numpy_test_pred = np.argmax(numpy_test_proba, axis=1)
numpy_infer_time = time.perf_counter() - inf_t0

numpy_metrics = classification_metrics(y_test, numpy_test_pred)
print('NumPy test metrics:', numpy_metrics)
print(f'NumPy training time: {numpy_train_time:.3f} s')
print(f'NumPy inference time (full test set): {numpy_infer_time:.4f} s')

fig, ax = plt.subplots(figsize=(5,4))
ConfusionMatrixDisplay(confusion_matrix(y_test, numpy_test_pred)).plot(ax=ax, cmap='Blues', colorbar=False)
plt.title('NumPy - Confusion matrix')
plt.tight_layout()
plt.show()
NumPy test metrics: {'Accuracy': 0.8762, 'Precision_macro': 0.8744494706374877, 'Recall_macro': 0.8737371221120137, 'F1_macro': 0.873370849122888}
NumPy training time: 4.242 s
NumPy inference time (full test set): 0.0167 s
Output

4) MLP in TensorFlow (Keras)

Same architecture, but leveraging the high-level API and the Adam optimizer.
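
A quick way to see what from_logits=True means: Keras fuses the softmax into the loss, so SparseCategoricalCrossentropy on raw logits should match the manual softmax + log from section 3. A minimal sketch with arbitrary logits:

import numpy as np
import tensorflow as tf

logits = np.array([[2.0, 1.0, 0.1]], dtype=np.float32)  # one example, 3 classes
y_true = np.array([0])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
keras_loss = loss_fn(y_true, logits).numpy()

p = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
manual_loss = -np.log(p[0, y_true[0]])

print(keras_loss, manual_loss)                           # both ≈ 0.417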

[12]
import tensorflow as tf
from tensorflow import keras

# Basic reproducibility
tf.random.set_seed(SEED)

keras_model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10)  # logits
])

keras_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

t0 = time.perf_counter()
keras_hist_obj = keras_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=128,
    verbose=0
)
keras_train_time = time.perf_counter() - t0

keras_hist = {
    'train_loss': keras_hist_obj.history['loss'],
    'val_loss': keras_hist_obj.history['val_loss'],
    'train_acc': keras_hist_obj.history['accuracy'],
    'val_acc': keras_hist_obj.history['val_accuracy'],
}

plot_curves(keras_hist['train_loss'], keras_hist['val_loss'], 'Keras MLP - Loss', 'Cross-Entropy')
plot_curves(keras_hist['train_acc'], keras_hist['val_acc'], 'Keras MLP - Accuracy', 'Accuracy')
E0000 00:00:1773738663.564717 3172795 cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
I0000 00:00:1773738663.564737 3172795 cuda_diagnostics.cc:160] env: CUDA_VISIBLE_DEVICES="-1"
I0000 00:00:1773738663.564744 3172795 cuda_diagnostics.cc:163] CUDA_VISIBLE_DEVICES is set to -1 - this hides all GPUs from CUDA
I0000 00:00:1773738663.564754 3172795 cuda_diagnostics.cc:171] verbose logging is disabled. Rerun with verbose logging (usually --v=1 or --vmodule=cuda_diagnostics=1) to get more diagnostic output from this module
I0000 00:00:1773738663.564755 3172795 cuda_diagnostics.cc:176] retrieving CUDA diagnostic information for host: tnp01-4090
I0000 00:00:1773738663.564757 3172795 cuda_diagnostics.cc:183] hostname: tnp01-4090
I0000 00:00:1773738663.564870 3172795 cuda_diagnostics.cc:190] libcuda reported version is: 580.126.9
I0000 00:00:1773738663.564877 3172795 cuda_diagnostics.cc:194] kernel reported version is: 580.126.9
I0000 00:00:1773738663.564878 3172795 cuda_diagnostics.cc:284] kernel version seems to match DSO: 580.126.9
Output
Output
[13]
# Keras evaluation + inference
inf_t0 = time.perf_counter()
keras_logits = keras_model.predict(X_test, verbose=0)
keras_test_pred = np.argmax(keras_logits, axis=1)
keras_infer_time = time.perf_counter() - inf_t0

keras_metrics = classification_metrics(y_test, keras_test_pred)
print('Keras test metrics:', keras_metrics)
print(f'Keras training time: {keras_train_time:.3f} s')
print(f'Keras inference time (full test set): {keras_infer_time:.4f} s')

fig, ax = plt.subplots(figsize=(5,4))
ConfusionMatrixDisplay(confusion_matrix(y_test, keras_test_pred)).plot(ax=ax, cmap='Greens', colorbar=False)
plt.title('Keras - Confusion matrix')
plt.tight_layout()
plt.show()
Keras test metrics: {'Accuracy': 0.9773, 'Precision_macro': 0.977310036503291, 'Recall_macro': 0.9770520022309702, 'F1_macro': 0.9771264780798304}
Keras training time: 7.964 s
Keras inference time (full test set): 0.1854 s
Output

5) MLP in PyTorch

In PyTorch we write out the architecture, the DataLoader, and the training loop explicitly. Note that nn.CrossEntropyLoss expects raw logits: it combines log-softmax and negative log-likelihood, matching the manual softmax + cross-entropy from section 3.

[14]
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(SEED)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device:', device)

X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.long)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.long)

train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=128, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val_t, y_val_t), batch_size=256, shuffle=False)
Device: cpu
[15]
class MLPTorch(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 10)
        )

    def forward(self, x):
        return self.net(x)


def train_torch(model, train_loader, val_loader, epochs=10, lr=1e-3):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    hist = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}

    t0 = time.perf_counter()

    for epoch in range(epochs):
        # Train
        model.train()
        train_losses = []
        train_correct, train_total = 0, 0

        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)

            optimizer.zero_grad()
            logits = model(xb)
            loss = criterion(logits, yb)
            loss.backward()
            optimizer.step()

            train_losses.append(loss.item())
            pred = torch.argmax(logits, dim=1)
            train_correct += (pred == yb).sum().item()
            train_total += yb.size(0)

        # Val
        model.eval()
        val_losses = []
        val_correct, val_total = 0, 0

        with torch.no_grad():
            for xb, yb in val_loader:
                xb, yb = xb.to(device), yb.to(device)
                logits = model(xb)
                loss = criterion(logits, yb)
                val_losses.append(loss.item())
                pred = torch.argmax(logits, dim=1)
                val_correct += (pred == yb).sum().item()
                val_total += yb.size(0)

        hist['train_loss'].append(float(np.mean(train_losses)))
        hist['val_loss'].append(float(np.mean(val_losses)))
        hist['train_acc'].append(train_correct / train_total)
        hist['val_acc'].append(val_correct / val_total)

    train_time = time.perf_counter() - t0
    return model, hist, train_time


torch_model = MLPTorch()
torch_model, torch_hist, torch_train_time = train_torch(torch_model, train_loader, val_loader, epochs=10, lr=1e-3)

plot_curves(torch_hist['train_loss'], torch_hist['val_loss'], 'PyTorch MLP - Loss', 'Cross-Entropy')
plot_curves(torch_hist['train_acc'], torch_hist['val_acc'], 'PyTorch MLP - Accuracy', 'Accuracy')
Output
Output
[16]
# PyTorch evaluation + inference
torch_model.eval()
inf_t0 = time.perf_counter()
with torch.no_grad():
    logits_test = torch_model(X_test_t.to(device)).cpu().numpy()

torch_test_pred = np.argmax(logits_test, axis=1)
torch_infer_time = time.perf_counter() - inf_t0

torch_metrics = classification_metrics(y_test, torch_test_pred)
print('PyTorch test metrics:', torch_metrics)
print(f'PyTorch training time: {torch_train_time:.3f} s')
print(f'PyTorch inference time (full test set): {torch_infer_time:.4f} s')

fig, ax = plt.subplots(figsize=(5,4))
ConfusionMatrixDisplay(confusion_matrix(y_test, torch_test_pred)).plot(ax=ax, cmap='Oranges', colorbar=False)
plt.title('PyTorch - Confusion matrix')
plt.tight_layout()
plt.show()
PyTorch test metrics: {'Accuracy': 0.9766, 'Precision_macro': 0.9765387654782114, 'Recall_macro': 0.9762661942254758, 'F1_macro': 0.9763238897317983}
PyTorch training time: 6.118 s
PyTorch inference time (full test set): 0.0066 s
Output

6) Final comparison of results and timings

[17]
results = pd.DataFrame([
    {'Framework': 'NumPy', **numpy_metrics, 'Train_time_s': numpy_train_time, 'Inference_time_s': numpy_infer_time},
    {'Framework': 'TensorFlow/Keras', **keras_metrics, 'Train_time_s': keras_train_time, 'Inference_time_s': keras_infer_time},
    {'Framework': 'PyTorch', **torch_metrics, 'Train_time_s': torch_train_time, 'Inference_time_s': torch_infer_time},
])

results
Framework Accuracy Precision_macro Recall_macro F1_macro Train_time_s Inference_time_s
0 NumPy 0.8762 0.874449 0.873737 0.873371 4.241594 0.016665
1 TensorFlow/Keras 0.9773 0.977310 0.977052 0.977126 7.964333 0.185445
2 PyTorch 0.9766 0.976539 0.976266 0.976324 6.117832 0.006642
[18]
fig, axes = plt.subplots(1, 3, figsize=(17, 4.5))

sns.barplot(data=results, x='Accuracy', y='Framework', ax=axes[0], palette='Blues')
axes[0].set_title('Test accuracy (higher is better)')

sns.barplot(data=results, x='F1_macro', y='Framework', ax=axes[1], palette='Greens')
axes[1].set_title('Macro F1 (higher is better)')

sns.barplot(data=results, x='Train_time_s', y='Framework', ax=axes[2], palette='Reds_r')
axes[2].set_title('Training time (s, lower is better)')

plt.tight_layout()
plt.show()
Output
[19]
fig, ax = plt.subplots(figsize=(7, 4))
sns.barplot(data=results, x='Framework', y='Inference_time_s', palette='Purples', ax=ax)
ax.set_title('Inference time (full test set)')
ax.set_ylabel('Seconds')
ax.set_xlabel('')
plt.tight_layout()
plt.show()
Output

7) Quick tests (sanity checks)

Simple checks to make sure the comparison produced valid results.

[20]
# Metrics in range and positive timings
for col in ['Accuracy', 'Precision_macro', 'Recall_macro', 'F1_macro']:
    assert np.isfinite(results[col]).all(), f'{col} contains non-finite values'
    assert ((results[col] >= 0) & (results[col] <= 1)).all(), f'{col} outside [0, 1]'

assert (results['Train_time_s'] > 0).all(), 'Train_time_s must be positive'
assert (results['Inference_time_s'] > 0).all(), 'Inference_time_s must be positive'

# Minimal consistency of the learning curves
assert len(numpy_hist['train_loss']) == 10
assert len(keras_hist['train_loss']) == 10
assert len(torch_hist['train_loss']) == 10

print('✅ Sanity checks passed')
✅ Sanity checks passed

Conclusions and next steps

Key takeaways

  1. The same 3-hidden-layer MLP can be implemented in every framework, but at different levels of abstraction.
  2. NumPy builds deep understanding of forward/backward, but requires more code and numerical care; its lower accuracy here mainly reflects the plain mini-batch SGD optimizer versus Adam in the other two, not NumPy itself.
  3. Keras greatly simplifies training and is ideal for rapid prototyping.
  4. PyTorch offers a balance between control and productivity, especially useful in research.
  5. Timing and metric comparisons must be run in the same environment/hardware, because CPU/GPU and library versions matter a great deal.

Ideas to extend

  • Try regularization (Dropout, L2) and compare overfitting (see the sketch after this list).
  • Add BatchNorm and study training stability.
  • Run on a GPU and measure the timing difference.
  • Try Fashion-MNIST or EMNIST to evaluate generalization.
  • Repeat the experiment varying the MLP size and learning rate.
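
As a starting point for the regularization idea, here is a sketch of the same PyTorch MLP with Dropout and L2 weight decay (the p=0.3 and weight_decay=1e-4 values are illustrative starting points, not tuned hyperparameters):

import torch
import torch.nn as nn

# Same architecture as MLPTorch, with Dropout after the first two hidden layers
regularized_net = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10)
)

# weight_decay applies an L2 penalty to all parameters inside the Adam update
optimizer = torch.optim.Adam(regularized_net.parameters(), lr=1e-3, weight_decay=1e-4)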

Final thought: mastering the same architecture across different frameworks helps you separate what is essential about the model from the details of its implementation.