MLP Comparison: NumPy vs TensorFlow vs PyTorch
Implementation of the same 3-hidden-layer MLP in NumPy, Keras, and PyTorch on MNIST: a comparison of accuracy, timings, and developer experience.
Full comparison of a 3-hidden-layer MLP on MNIST: NumPy vs TensorFlow (Keras) vs PyTorch
In this notebook we build and compare the same type of MLP network (3 hidden layers) implemented in three different ecosystems:
- NumPy (manual implementation, from scratch).
- TensorFlow/Keras (high-level API on top of TensorFlow).
- PyTorch (explicit model and training loop).
Learning objective
The goal is not just to train three models, but to understand:
- Which parts of the pipeline are identical across frameworks.
- Which parts each tool automates.
- How the code, the performance, and the developer experience change.
In addition, we compare:
- classification metrics (accuracy, macro F1, etc.),
- learning curves (loss/accuracy on train/val),
- training times,
- inference times.
Mathematical/computational background (summary)
An MLP applies compositions of affine maps and nonlinearities:
$$ \mathbf{h}^{(1)} = \phi(\mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}), \quad \mathbf{h}^{(2)} = \phi(\mathbf{W}^{(2)}\mathbf{h}^{(1)} + \mathbf{b}^{(2)}), \quad \mathbf{h}^{(3)} = \phi(\mathbf{W}^{(3)}\mathbf{h}^{(2)} + \mathbf{b}^{(3)}) $$
The output layer for multiclass classification produces logits:
$$ \mathbf{z} = \mathbf{W}^{(4)}\mathbf{h}^{(3)} + \mathbf{b}^{(4)} $$
and class probabilities via softmax:
$$ \hat{p}_k = \frac{e^{z_k}}{\sum_j e^{z_j}} $$
We optimize the cross-entropy loss:
$$ \mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik}\log(\hat{p}_{ik}) $$
via gradient descent (or variants such as Adam).
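A convenient consequence of pairing softmax with cross-entropy is that the gradient with respect to the logits collapses to $\hat{\mathbf{p}} - \mathbf{y}$, which is exactly what the manual backward pass in section 3 relies on. A small standalone check against finite differences (helper names here are local to this sketch):

```python
import numpy as np

def softmax_vec(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ce_loss(z, y):
    # cross-entropy of softmax(z) against true class index y
    return -np.log(softmax_vec(z)[y])

rng = np.random.default_rng(0)
z = rng.normal(size=5)
y = 2

# Analytic gradient: softmax probabilities minus the one-hot target
p = softmax_vec(z)
grad_analytic = p.copy()
grad_analytic[y] -= 1.0

# Central finite differences, coordinate by coordinate
eps = 1e-6
grad_fd = np.zeros_like(z)
for k in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[k] += eps
    zm[k] -= eps
    grad_fd[k] = (ce_loss(zp, y) - ce_loss(zm, y)) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_fd)))  # tiny, well below 1e-5
```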
Dataset and architecture
- Dataset: MNIST (handwritten digits, 10 classes, 28x28 images).
- Input: vector of 784 features (flattened image).
- Architecture shared by all 3 implementations:
- Hidden layer 1: 256 neurons + ReLU
- Hidden layer 2: 128 neurons + ReLU
- Hidden layer 3: 64 neurons + ReLU
- Output: 10 neurons (logits)
Throughout the notebook we keep hyperparameters as similar as possible so the comparison is fair.
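For reference, the model size implied by these layer dimensions can be computed directly (a quick sketch counting weights plus biases for each affine layer):

```python
# (fan_in, fan_out) pairs for the 784-256-128-64-10 architecture
layers = [(784, 256), (256, 128), (128, 64), (64, 10)]
total = sum(n_in * n_out + n_out for n_in, n_out in layers)  # weights + biases
print(total)  # 242762 trainable parameters, identical in all three frameworks
```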
# General libraries and configuration
import os
# Force CPU execution (avoids XLA/Triton autotuner errors on GPU)
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import time
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, ConfusionMatrixDisplay
)
from sklearn.model_selection import train_test_split
sns.set_theme(style='whitegrid', context='notebook')
plt.rcParams['figure.figsize'] = (8, 5)
SEED = 42
np.random.seed(SEED)
1) Loading MNIST and basic EDA
We start with a quick exploration to get familiar with the problem.
# Load via Keras datasets (used as the common data source for all 3 approaches)
from tensorflow.keras.datasets import mnist
(X_train_full_img, y_train_full), (X_test_img, y_test) = mnist.load_data()
print('Train full images:', X_train_full_img.shape)
print('Test images:', X_test_img.shape)
print('Pixel range:', X_train_full_img.min(), 'to', X_train_full_img.max())
Train full images: (60000, 28, 28) Test images: (10000, 28, 28) Pixel range: 0 to 255
# Train/val split (test is kept aside for the final comparison)
X_train_img, X_val_img, y_train, y_val = train_test_split(
    X_train_full_img, y_train_full, test_size=0.1, random_state=SEED, stratify=y_train_full
)
print('Train:', X_train_img.shape, 'Val:', X_val_img.shape, 'Test:', X_test_img.shape)
Train: (54000, 28, 28) Val: (6000, 28, 28) Test: (10000, 28, 28)
# One example image per class
fig, axes = plt.subplots(2, 5, figsize=(11, 5))
classes_shown = list(range(10))
for ax, cls in zip(axes.ravel(), classes_shown):
    idx = np.where(y_train == cls)[0][0]
    ax.imshow(X_train_img[idx], cmap='gray')
    ax.set_title(f'Class {cls}')
    ax.axis('off')
plt.suptitle('MNIST examples')
plt.tight_layout()
plt.show()
# Class distribution
counts = pd.Series(y_train).value_counts().sort_index()
plt.figure(figsize=(8, 4))
sns.barplot(x=counts.index, y=counts.values, palette='viridis')
plt.title('Class distribution in train')
plt.xlabel('Digit')
plt.ylabel('Number of samples')
plt.tight_layout()
plt.show()
# Common preprocessing: flatten and normalize to [0, 1]
def preprocess_flat(x):
    return x.reshape(len(x), -1).astype(np.float32) / 255.0
X_train = preprocess_flat(X_train_img)
X_val = preprocess_flat(X_val_img)
X_test = preprocess_flat(X_test_img)
num_classes = 10
print('X_train:', X_train.shape, 'X_val:', X_val.shape, 'X_test:', X_test.shape)
X_train: (54000, 784) X_val: (6000, 784) X_test: (10000, 784)
2) Common metrics and utilities
def classification_metrics(y_true, y_pred):
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision_macro': precision_score(y_true, y_pred, average='macro', zero_division=0),
        'Recall_macro': recall_score(y_true, y_pred, average='macro', zero_division=0),
        'F1_macro': f1_score(y_true, y_pred, average='macro', zero_division=0),
    }
def plot_curves(train_values, val_values, title, ylabel):
    epochs = np.arange(1, len(train_values) + 1)
    plt.figure(figsize=(8, 4.5))
    plt.plot(epochs, train_values, label='Train')
    plt.plot(epochs, val_values, label='Validation')
    plt.title(title)
    plt.xlabel('Epoch')
    plt.ylabel(ylabel)
    plt.legend()
    plt.tight_layout()
    plt.show()
3) MLP in NumPy (from scratch)
Here we implement the forward pass, the backward pass, and the parameter update by hand, to understand the internal mechanics.
# NumPy math utilities
def relu(x):
    return np.maximum(0, x)
def relu_grad(x):
    return (x > 0).astype(np.float32)
def softmax(logits):
    # Numerical stability: subtract the row-wise max before exponentiating
    z = logits - np.max(logits, axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
def one_hot(y, n_classes=10):
    out = np.zeros((len(y), n_classes), dtype=np.float32)
    out[np.arange(len(y)), y] = 1.0
    return out
def cross_entropy(y_true_oh, y_proba, eps=1e-9):
    return -np.mean(np.sum(y_true_oh * np.log(y_proba + eps), axis=1))
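The max-subtraction inside `softmax` above is essential rather than cosmetic: `np.exp` overflows to `inf` for arguments above roughly 709 in float64, and the resulting `inf / inf` divisions produce NaN probabilities. A minimal standalone illustration:

```python
import numpy as np

def softmax_naive(logits):
    e = np.exp(logits)  # overflows for large logits
    return e / e.sum(axis=1, keepdims=True)

def softmax_stable(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # largest exponent is now 0
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

z = np.array([[1000.0, 1001.0, 1002.0]])
with np.errstate(over='ignore', invalid='ignore'):
    naive = softmax_naive(z)  # inf / inf -> all NaN
stable = softmax_stable(z)    # a valid probability distribution

print(naive)
print(stable, stable.sum())
```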
class MLPNumPy:
    def __init__(self, in_dim=784, h1=256, h2=128, h3=64, out_dim=10, seed=42):
        rng = np.random.default_rng(seed)
        # He initialization for ReLU
        self.W1 = rng.normal(0, np.sqrt(2/in_dim), size=(in_dim, h1)).astype(np.float32)
        self.b1 = np.zeros((1, h1), dtype=np.float32)
        self.W2 = rng.normal(0, np.sqrt(2/h1), size=(h1, h2)).astype(np.float32)
        self.b2 = np.zeros((1, h2), dtype=np.float32)
        self.W3 = rng.normal(0, np.sqrt(2/h2), size=(h2, h3)).astype(np.float32)
        self.b3 = np.zeros((1, h3), dtype=np.float32)
        self.W4 = rng.normal(0, np.sqrt(2/h3), size=(h3, out_dim)).astype(np.float32)
        self.b4 = np.zeros((1, out_dim), dtype=np.float32)
    def forward(self, X):
        # Cache intermediates for the backward pass
        self.z1 = X @ self.W1 + self.b1
        self.a1 = relu(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = relu(self.z2)
        self.z3 = self.a2 @ self.W3 + self.b3
        self.a3 = relu(self.z3)
        self.z4 = self.a3 @ self.W4 + self.b4
        self.p = softmax(self.z4)
        return self.p
    def backward(self, X, y_oh):
        m = X.shape[0]
        dz4 = (self.p - y_oh) / m
        dW4 = self.a3.T @ dz4
        db4 = np.sum(dz4, axis=0, keepdims=True)
        da3 = dz4 @ self.W4.T
        dz3 = da3 * relu_grad(self.z3)
        dW3 = self.a2.T @ dz3
        db3 = np.sum(dz3, axis=0, keepdims=True)
        da2 = dz3 @ self.W3.T
        dz2 = da2 * relu_grad(self.z2)
        dW2 = self.a1.T @ dz2
        db2 = np.sum(dz2, axis=0, keepdims=True)
        da1 = dz2 @ self.W2.T
        dz1 = da1 * relu_grad(self.z1)
        dW1 = X.T @ dz1
        db1 = np.sum(dz1, axis=0, keepdims=True)
        grads = (dW1, db1, dW2, db2, dW3, db3, dW4, db4)
        return grads
    def step(self, grads, lr=1e-3):
        dW1, db1, dW2, db2, dW3, db3, dW4, db4 = grads
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        self.W3 -= lr * dW3
        self.b3 -= lr * db3
        self.W4 -= lr * dW4
        self.b4 -= lr * db4
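The constructor above draws weights with standard deviation sqrt(2/fan_in) (He initialization). A standalone sketch of why that scale suits ReLU networks: it keeps the root-mean-square activation roughly constant through stacked ReLU layers, so signals neither explode nor vanish with depth (the layer sizes mirror this notebook's architecture; the printed values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(1000, 784)).astype(np.float32)  # unit-RMS input batch

for fan_in, fan_out in [(784, 256), (256, 128), (128, 64)]:
    W = rng.normal(0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)).astype(np.float32)
    h = np.maximum(0, h @ W)  # affine layer followed by ReLU
    rms = float(np.sqrt((h ** 2).mean()))
    print(f'{fan_out:3d} units -> activation RMS {rms:.3f}')  # stays near 1.0
```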
def train_numpy_mlp(model, X_train, y_train, X_val, y_val, epochs=10, batch_size=128, lr=1e-3):
    history = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
    y_train_oh = one_hot(y_train, num_classes)
    y_val_oh = one_hot(y_val, num_classes)
    n = X_train.shape[0]
    t0 = time.perf_counter()
    for epoch in range(epochs):
        # Shuffle at the start of each epoch
        idx = np.random.permutation(n)
        X_sh = X_train[idx]
        y_sh = y_train[idx]
        y_sh_oh = y_train_oh[idx]
        # Mini-batch SGD
        for start in range(0, n, batch_size):
            end = start + batch_size
            xb = X_sh[start:end]
            yb_oh = y_sh_oh[start:end]
            _ = model.forward(xb)
            grads = model.backward(xb, yb_oh)
            model.step(grads, lr=lr)
        # Per-epoch metrics
        p_train = model.forward(X_train)
        p_val = model.forward(X_val)
        train_loss = cross_entropy(y_train_oh, p_train)
        val_loss = cross_entropy(y_val_oh, p_val)
        train_pred = np.argmax(p_train, axis=1)
        val_pred = np.argmax(p_val, axis=1)
        history['train_loss'].append(train_loss)
        history['val_loss'].append(val_loss)
        history['train_acc'].append(accuracy_score(y_train, train_pred))
        history['val_acc'].append(accuracy_score(y_val, val_pred))
    train_time = time.perf_counter() - t0
    return model, history, train_time
numpy_model = MLPNumPy(seed=SEED)
numpy_model, numpy_hist, numpy_train_time = train_numpy_mlp(
    numpy_model, X_train, y_train, X_val, y_val, epochs=10, batch_size=128, lr=1e-3
)
plot_curves(numpy_hist['train_loss'], numpy_hist['val_loss'], 'NumPy MLP - Loss', 'Cross-Entropy')
plot_curves(numpy_hist['train_acc'], numpy_hist['val_acc'], 'NumPy MLP - Accuracy', 'Accuracy')
# NumPy evaluation on test + inference time
inf_t0 = time.perf_counter()
numpy_test_proba = numpy_model.forward(X_test)
numpy_test_pred = np.argmax(numpy_test_proba, axis=1)
numpy_infer_time = time.perf_counter() - inf_t0
numpy_metrics = classification_metrics(y_test, numpy_test_pred)
print('NumPy test metrics:', numpy_metrics)
print(f'NumPy training time: {numpy_train_time:.3f} s')
print(f'NumPy inference time (full test set): {numpy_infer_time:.4f} s')
fig, ax = plt.subplots(figsize=(5, 4))
ConfusionMatrixDisplay(confusion_matrix(y_test, numpy_test_pred)).plot(ax=ax, cmap='Blues', colorbar=False)
plt.title('NumPy - Confusion matrix')
plt.tight_layout()
plt.show()
NumPy test metrics: {'Accuracy': 0.8762, 'Precision_macro': 0.8744494706374877, 'Recall_macro': 0.8737371221120137, 'F1_macro': 0.873370849122888}
NumPy training time: 4.242 s
NumPy inference time (full test set): 0.0167 s
4) MLP in TensorFlow (Keras)
Same architecture, but leveraging the high-level API and the Adam optimizer.
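Keras hides the optimizer update behind `Adam(learning_rate=1e-3)`. For intuition, a single Adam step on one parameter array can be sketched in plain NumPy (a simplified textbook version with the usual default hyperparameters, not code extracted from Keras):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates plus bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # 1st moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -0.2, 0.3])
w, m, v = adam_step(w, grad, m, v, t=1)
print(w)  # first step moves each weight by ~lr opposite the gradient sign
```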
import tensorflow as tf
from tensorflow import keras
# Basic reproducibility
tf.random.set_seed(SEED)
keras_model = keras.Sequential([
    keras.layers.Input(shape=(784,)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10)  # logits
])
keras_model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
t0 = time.perf_counter()
keras_hist_obj = keras_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=10,
    batch_size=128,
    verbose=0
)
keras_train_time = time.perf_counter() - t0
keras_hist = {
    'train_loss': keras_hist_obj.history['loss'],
    'val_loss': keras_hist_obj.history['val_loss'],
    'train_acc': keras_hist_obj.history['accuracy'],
    'val_acc': keras_hist_obj.history['val_accuracy'],
}
plot_curves(keras_hist['train_loss'], keras_hist['val_loss'], 'Keras MLP - Loss', 'Cross-Entropy')
plot_curves(keras_hist['train_acc'], keras_hist['val_acc'], 'Keras MLP - Accuracy', 'Accuracy')
# Keras evaluation + inference
inf_t0 = time.perf_counter()
keras_logits = keras_model.predict(X_test, verbose=0)
keras_test_pred = np.argmax(keras_logits, axis=1)
keras_infer_time = time.perf_counter() - inf_t0
keras_metrics = classification_metrics(y_test, keras_test_pred)
print('Keras test metrics:', keras_metrics)
print(f'Keras training time: {keras_train_time:.3f} s')
print(f'Keras inference time (full test set): {keras_infer_time:.4f} s')
fig, ax = plt.subplots(figsize=(5, 4))
ConfusionMatrixDisplay(confusion_matrix(y_test, keras_test_pred)).plot(ax=ax, cmap='Greens', colorbar=False)
plt.title('Keras - Confusion matrix')
plt.tight_layout()
plt.show()
Keras test metrics: {'Accuracy': 0.9773, 'Precision_macro': 0.977310036503291, 'Recall_macro': 0.9770520022309702, 'F1_macro': 0.9771264780798304}
Keras training time: 7.964 s
Keras inference time (full test set): 0.1854 s
5) MLP in PyTorch
In PyTorch we make the architecture, the DataLoader, and the training loop explicit.
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
torch.manual_seed(SEED)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device:', device)
X_train_t = torch.tensor(X_train, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.long)
X_val_t = torch.tensor(X_val, dtype=torch.float32)
y_val_t = torch.tensor(y_val, dtype=torch.long)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_test_t = torch.tensor(y_test, dtype=torch.long)
train_loader = DataLoader(TensorDataset(X_train_t, y_train_t), batch_size=128, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val_t, y_val_t), batch_size=256, shuffle=False)
Device: cpu
class MLPTorch(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 10)
        )
    def forward(self, x):
        return self.net(x)
def train_torch(model, train_loader, val_loader, epochs=10, lr=1e-3):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    hist = {'train_loss': [], 'val_loss': [], 'train_acc': [], 'val_acc': []}
    t0 = time.perf_counter()
    for epoch in range(epochs):
        # Train
        model.train()
        train_losses = []
        train_correct, train_total = 0, 0
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            logits = model(xb)
            loss = criterion(logits, yb)
            loss.backward()
            optimizer.step()
            train_losses.append(loss.item())
            pred = torch.argmax(logits, dim=1)
            train_correct += (pred == yb).sum().item()
            train_total += yb.size(0)
        # Val
        model.eval()
        val_losses = []
        val_correct, val_total = 0, 0
        with torch.no_grad():
            for xb, yb in val_loader:
                xb, yb = xb.to(device), yb.to(device)
                logits = model(xb)
                loss = criterion(logits, yb)
                val_losses.append(loss.item())
                pred = torch.argmax(logits, dim=1)
                val_correct += (pred == yb).sum().item()
                val_total += yb.size(0)
        hist['train_loss'].append(float(np.mean(train_losses)))
        hist['val_loss'].append(float(np.mean(val_losses)))
        hist['train_acc'].append(train_correct / train_total)
        hist['val_acc'].append(val_correct / val_total)
    train_time = time.perf_counter() - t0
    return model, hist, train_time
torch_model = MLPTorch()
torch_model, torch_hist, torch_train_time = train_torch(torch_model, train_loader, val_loader, epochs=10, lr=1e-3)
plot_curves(torch_hist['train_loss'], torch_hist['val_loss'], 'PyTorch MLP - Loss', 'Cross-Entropy')
plot_curves(torch_hist['train_acc'], torch_hist['val_acc'], 'PyTorch MLP - Accuracy', 'Accuracy')
# PyTorch evaluation + inference
torch_model.eval()
inf_t0 = time.perf_counter()
with torch.no_grad():
    logits_test = torch_model(X_test_t.to(device)).cpu().numpy()
torch_test_pred = np.argmax(logits_test, axis=1)
torch_infer_time = time.perf_counter() - inf_t0
torch_metrics = classification_metrics(y_test, torch_test_pred)
print('PyTorch test metrics:', torch_metrics)
print(f'PyTorch training time: {torch_train_time:.3f} s')
print(f'PyTorch inference time (full test set): {torch_infer_time:.4f} s')
fig, ax = plt.subplots(figsize=(5, 4))
ConfusionMatrixDisplay(confusion_matrix(y_test, torch_test_pred)).plot(ax=ax, cmap='Oranges', colorbar=False)
plt.title('PyTorch - Confusion matrix')
plt.tight_layout()
plt.show()
PyTorch test metrics: {'Accuracy': 0.9766, 'Precision_macro': 0.9765387654782114, 'Recall_macro': 0.9762661942254758, 'F1_macro': 0.9763238897317983}
PyTorch training time: 6.118 s
PyTorch inference time (full test set): 0.0066 s
6) Final comparison of results and timings
results = pd.DataFrame([
    {'Framework': 'NumPy', **numpy_metrics, 'Train_time_s': numpy_train_time, 'Inference_time_s': numpy_infer_time},
    {'Framework': 'TensorFlow/Keras', **keras_metrics, 'Train_time_s': keras_train_time, 'Inference_time_s': keras_infer_time},
    {'Framework': 'PyTorch', **torch_metrics, 'Train_time_s': torch_train_time, 'Inference_time_s': torch_infer_time},
])
results
| | Framework | Accuracy | Precision_macro | Recall_macro | F1_macro | Train_time_s | Inference_time_s |
|---|---|---|---|---|---|---|---|
| 0 | NumPy | 0.8762 | 0.874449 | 0.873737 | 0.873371 | 4.241594 | 0.016665 |
| 1 | TensorFlow/Keras | 0.9773 | 0.977310 | 0.977052 | 0.977126 | 7.964333 | 0.185445 |
| 2 | PyTorch | 0.9766 | 0.976539 | 0.976266 | 0.976324 | 6.117832 | 0.006642 |
fig, axes = plt.subplots(1, 3, figsize=(17, 4.5))
sns.barplot(data=results, x='Accuracy', y='Framework', ax=axes[0], palette='Blues')
axes[0].set_title('Test accuracy (higher is better)')
sns.barplot(data=results, x='F1_macro', y='Framework', ax=axes[1], palette='Greens')
axes[1].set_title('Macro F1 (higher is better)')
sns.barplot(data=results, x='Train_time_s', y='Framework', ax=axes[2], palette='Reds_r')
axes[2].set_title('Training time (s, lower is better)')
plt.tight_layout()
plt.show()
fig, ax = plt.subplots(figsize=(7, 4))
sns.barplot(data=results, x='Framework', y='Inference_time_s', palette='Purples', ax=ax)
ax.set_title('Inference time (full test set)')
ax.set_ylabel('Seconds')
ax.set_xlabel('')
plt.tight_layout()
plt.show()
7) Quick tests (sanity checks)
Simple checks to make sure the comparison produced valid results.
# Metrics in range and positive timings
for col in ['Accuracy', 'Precision_macro', 'Recall_macro', 'F1_macro']:
    assert np.isfinite(results[col]).all(), f'{col} contains non-finite values'
    assert ((results[col] >= 0) & (results[col] <= 1)).all(), f'{col} outside [0, 1]'
assert (results['Train_time_s'] > 0).all(), 'Train_time_s must be positive'
assert (results['Inference_time_s'] > 0).all(), 'Inference_time_s must be positive'
# Minimal consistency of the learning curves
assert len(numpy_hist['train_loss']) == 10
assert len(keras_hist['train_loss']) == 10
assert len(torch_hist['train_loss']) == 10
print('✅ Sanity checks passed')
✅ Sanity checks passed
Conclusions and next steps
Key takeaways
- The same 3-hidden-layer MLP can be implemented in all three frameworks, but at different levels of abstraction.
- NumPy gives a deep understanding of forward/backward, though it requires more code and numerical care.
- The NumPy model's accuracy gap (≈0.88 vs ≈0.98) comes mainly from the optimizer: the manual implementation uses plain SGD at lr=1e-3, while Keras and PyTorch use Adam.
- Keras greatly simplifies training and is ideal for rapid prototyping.
- PyTorch strikes a balance between control and productivity, especially useful in research.
- Timing and metric comparisons must be run on the same environment/hardware, since CPU/GPU and library versions matter a lot.
Ideas for extensions
- Try regularization (Dropout, L2) and compare overfitting.
- Add BatchNorm and study training stability.
- Run on GPU and measure the timing difference.
- Try Fashion-MNIST or EMNIST to assess generalization.
- Repeat the experiment varying the MLP size and learning rate.
Final thought: mastering the same architecture in different frameworks helps you separate what is essential to the model from the details of the implementation.