
GRU TensorFlow/Keras: Sentiment classification (IMDB)

Binary classification of IMDB reviews with a bidirectional GRU in TensorFlow/Keras: EDA, architecture, ROC-AUC, and error analysis.


GRU in TensorFlow/Keras: sentiment classification on IMDb

In this notebook we build a complete, end-to-end example to learn how to use a Gated Recurrent Unit (GRU) with tensorflow/keras on a real NLP task: binary classification of movie reviews (positive/negative) with the IMDb dataset.

Learning objectives

Understand why GRU is an efficient alternative to a vanilla RNN and to LSTM.
Prepare sequential text data for a recurrent network.
Design a reasonable GRU architecture for classification.
Train the model and diagnose it with learning curves.
Evaluate with metrics and visualizations (confusion matrix, ROC, real examples).
Briefly compare the parameter cost of GRU against an equivalent LSTM.

Connection to the theory of the GRU submodule

In the theory we saw that a GRU introduces two gates:

Update gate \(z_t\): controls how much of the previous state \(h_{t-1}\) is kept and how much is replaced by the candidate state.
Reset gate \(r_t\): controls how much of the past is used when computing the candidate state.

The typical GRU equations are:

\[ z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \]
\[ r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \]
\[ \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \]
\[ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \]

Practical interpretation:

If \(z_t\) is small, the model retains its previous memory.
If \(z_t\) is large, the model updates more strongly with the new information.
\(r_t\) lets the model "selectively forget" irrelevant past when building \(\tilde{h}_t\).

This mitigates the vanishing gradient problem compared with a simple RNN, with less complexity than an LSTM.
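
The four equations above can be sketched as a single time step in NumPy. This is a minimal illustration, not the way Keras lays out its weights internally: the toy sizes (`input_dim=4`, `units=3`), the random initialization and the `params` dictionary are assumptions made only for this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step following the equations above (illustrative layout)."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])             # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])             # reset gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                                 # new hidden state

rng = np.random.default_rng(0)
input_dim, units = 4, 3  # hypothetical toy sizes
params = {}
for gate in ("z", "r", "h"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(units, input_dim))
    params[f"U_{gate}"] = rng.normal(scale=0.5, size=(units, units))
    params[f"b_{gate}"] = np.zeros(units)

# Run the cell over a toy sequence of 5 time steps, starting from h_0 = 0.
h = np.zeros(units)
for x_t in rng.normal(size=(5, input_dim)):
    h = gru_step(x_t, h, params)

print(h.shape)  # (3,)
```

Because \(h_t\) is a convex combination of \(h_{t-1}\) and a `tanh` output, every component of the state stays inside (-1, 1) when starting from zeros.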

Dataset and model we will use

Dataset: keras.datasets.imdb (reviews already tokenized by word frequency).
Problem: binary classification (0 = negative, 1 = positive).
Main architecture:
- Embedding to turn indices into dense vectors.
- Bidirectional(GRU) + GRU to capture sequential context.
- Dense layers for the final decision.
Main metric: accuracy (complemented by precision, recall and AUC, plus F1 computed outside Keras).

Note: by using tokenized text with padding/truncation, we keep a clear and reproducible pipeline for teaching the mechanics of GRU.

[1]

# ==============================
# 1) Imports and configuration
# ==============================

import os
# Force CPU execution (avoids XLA/Triton autotuner errors on GPU).
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import random
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    roc_curve,
    auc,
    precision_recall_fscore_support,
)

import tensorflow as tf
from tensorflow import keras

# Seeds for reproducibility (as far as the backend allows).
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

print('TensorFlow version:', tf.__version__)


TensorFlow version: 2.21.0


1) Loading the IMDb dataset

We limit the vocabulary to the most frequent words (classic example: num_words=10000). We then pad the sequences so they all share the same length and can be trained in batches.
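
To make the effect of the vocabulary limit concrete, here is a small pure-Python sketch of the idea: any token index at or above the limit is replaced by the out-of-vocabulary index (2 is the default `oov_char` of the IMDb loader). The helper `cap_vocabulary` and the toy review are hypothetical, and the function only imitates what the loader does for us.

```python
OOV_INDEX = 2  # default out-of-vocabulary index used by the IMDb loader

def cap_vocabulary(sequence, num_words, oov_index=OOV_INDEX):
    """Replace every token index >= num_words with the OOV index."""
    return [tok if tok < num_words else oov_index for tok in sequence]

toy_review = [1, 14, 22, 4468, 15000, 9999, 10000]  # hypothetical token indices
capped = cap_vocabulary(toy_review, num_words=10_000)
print(capped)  # [1, 14, 22, 4468, 2, 9999, 2]
```

This is why decoded reviews show `<UNK>` wherever a rare word was used.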

[2]

# ==============================
# 2) Loading IMDb
# ==============================

TOP_WORDS = 10_000
MAX_LEN = 200

# Load the dataset, already tokenized (integers).
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=TOP_WORDS)

print('Number of train samples:', len(x_train))
print('Number of test samples :', len(x_test))
print('Example sequence (first 20 tokens):', x_train[0][:20])
print('Example label:', y_train[0])

Number of train samples: 25000
Number of test samples : 25000
Example sequence (first 20 tokens): [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25]
Example label: 1

2) Mini EDA (Exploratory Data Analysis)

Even though this is a small teaching dataset, we run a short, useful EDA:

Class balance.
Distribution of review lengths.
Approximate text reconstruction for human inspection.

This step helps justify decisions such as MAX_LEN, padding='post' or the choice of main metric.

[3]

# ==============================
# 3) EDA - Class balance
# ==============================

unique, counts = np.unique(y_train, return_counts=True)
class_dist = dict(zip(unique, counts))
print('Class distribution (train):', class_dist)

plt.figure(figsize=(5, 3))
plt.bar(['Negative (0)', 'Positive (1)'], [class_dist.get(0, 0), class_dist.get(1, 0)])
plt.title('Class balance in train')
plt.ylabel('Number of reviews')
plt.show()

Class distribution (train): {0: 12500, 1: 12500}

[4]

# ==============================
# 4) EDA - Sequence lengths
# ==============================

train_lengths = [len(seq) for seq in x_train]

print('Mean length:', np.mean(train_lengths).round(2))
print('Median     :', np.median(train_lengths))
print('P90        :', np.percentile(train_lengths, 90).round(2))
print('Max        :', np.max(train_lengths))

plt.figure(figsize=(7, 4))
plt.hist(train_lengths, bins=50, color='#E17055', edgecolor='black', alpha=0.8)
plt.axvline(MAX_LEN, color='blue', linestyle='--', label=f'MAX_LEN={MAX_LEN}')
plt.title('Distribution of review length (tokens)')
plt.xlabel('Length')
plt.ylabel('Frequency')
plt.legend()
plt.show()

Mean length: 238.71
Median     : 178.0
P90        : 467.0
Max        : 2494
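
With a median of 178 tokens, more than half of the reviews fit inside MAX_LEN=200, while a P90 of 467 means that at least roughly one review in ten gets truncated. A quick way to quantify that trade-off for any cutoff is sketched below. The helper `truncation_report` and the toy lengths are assumptions for illustration; in the notebook one would pass `train_lengths` instead.

```python
import numpy as np

def truncation_report(lengths, max_len):
    """Return (share of sequences that fit, share of tokens kept) for a cutoff."""
    lengths = np.asarray(lengths)
    fits = float(np.mean(lengths <= max_len))
    kept_tokens = float(np.minimum(lengths, max_len).sum() / lengths.sum())
    return fits, kept_tokens

toy_lengths = [50, 120, 178, 240, 467, 900]  # hypothetical review lengths
fits, kept = truncation_report(toy_lengths, max_len=200)
print(f"reviews that fit: {fits:.2%}, tokens kept: {kept:.2%}")
```

Running the same function for several cutoffs (for example 100, 300, 500) gives a cheap first estimate of how much text each choice discards before training anything.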

[5]

# ==============================
# 5) EDA - Decoding a review
# ==============================

# IMDb ships a word -> index mapping. We invert it to inspect the text.
word_index = keras.datasets.imdb.get_word_index()
index_to_word = {idx + 3: word for word, idx in word_index.items()}
index_to_word[0] = '<PAD>'
index_to_word[1] = '<START>'
index_to_word[2] = '<UNK>'
index_to_word[3] = '<UNUSED>'

# Partially reconstruct an example review.
example_decoded = ' '.join(index_to_word.get(token, '?') for token in x_train[0][:80])
print('Reconstructed review (excerpt):')
print(example_decoded)
print('True label:', y_train[0], '(1=positive, 0=negative)')

Reconstructed review (excerpt):
<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as
True label: 1 (1=positive, 0=negative)

3) Final preprocessing for the GRU network

We apply padding/truncation to turn variable-length lists into a matrix of shape (batch, time_steps). This makes it straightforward to use Embedding + GRU in Keras.
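
What padding='post' and truncating='post' do can be sketched in a few lines of NumPy. The function `pad_post` below is a hypothetical stand-in written only for illustration, not the Keras utility itself: short sequences get zeros appended, long ones keep their first `maxlen` tokens.

```python
import numpy as np

def pad_post(sequences, maxlen, pad_value=0):
    """Pad at the end and truncate at the end, returning a (batch, maxlen) matrix."""
    out = np.full((len(sequences), maxlen), pad_value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        kept = seq[:maxlen]            # 'post' truncation keeps the first tokens
        out[row, :len(kept)] = kept    # 'post' padding leaves zeros at the end
    return out

toy = [[1, 14, 22], [1, 5, 6, 7, 8, 9, 10]]  # hypothetical token sequences
print(pad_post(toy, maxlen=5))
# [[ 1 14 22  0  0]
#  [ 1  5  6  7  8]]
```

One design point worth keeping in mind: with post-padding and no masking, a recurrent layer still reads the trailing zeros after the real text. The Embedding layer's `mask_zero=True` option is one way to make downstream layers skip them.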

[6]

# ==============================
# 6) Padding and dtypes
# ==============================

x_train_pad = keras.preprocessing.sequence.pad_sequences(
    x_train,
    maxlen=MAX_LEN,
    padding='post',
    truncating='post'
)

x_test_pad = keras.preprocessing.sequence.pad_sequences(
    x_test,
    maxlen=MAX_LEN,
    padding='post',
    truncating='post'
)

y_train = np.array(y_train, dtype=np.int32)
y_test = np.array(y_test, dtype=np.int32)

print('Shape x_train_pad:', x_train_pad.shape)
print('Shape x_test_pad :', x_test_pad.shape)

Shape x_train_pad: (25000, 200)
Shape x_test_pad : (25000, 200)

4) Defining the GRU architecture

Proposed design (balanced for teaching):

Embedding(TOP_WORDS, EMBED_DIM)
Bidirectional(GRU(..., return_sequences=True))
A final GRU(...) to condense the sequence.
Dense + Dropout
Dense(1, sigmoid) for binary classification.

We also enable relevant training metrics (accuracy, Precision, Recall, AUC).

[7]

# ==============================
# 7) GRU model in Keras
# ==============================

EMBED_DIM = 128
GRU_UNITS_1 = 64
GRU_UNITS_2 = 32

model_gru = keras.Sequential([
    # input_length is deprecated in recent Keras versions, so it is omitted here.
    keras.layers.Embedding(input_dim=TOP_WORDS, output_dim=EMBED_DIM),
    keras.layers.Bidirectional(
        keras.layers.GRU(GRU_UNITS_1, return_sequences=True, dropout=0.2)
    ),
    keras.layers.GRU(GRU_UNITS_2, dropout=0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])

model_gru.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='binary_crossentropy',
    metrics=[
        'accuracy',
        keras.metrics.Precision(name='precision'),
        keras.metrics.Recall(name='recall'),
        keras.metrics.AUC(name='auc')
    ]
)

model_gru.summary()


Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ embedding (Embedding)           │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional (Bidirectional)   │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ gru_1 (GRU)                     │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ ?                      │   0 (unbuilt) │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ ?                      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ ?                      │   0 (unbuilt) │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 0 (0.00 B)

 Trainable params: 0 (0.00 B)

 Non-trainable params: 0 (0.00 B)

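(Optional, for teaching) Quick parameter comparison: GRU vs LSTM

With the same layer sizes, GRU usually has fewer parameters than LSTM (roughly 25% fewer in the recurrent part). In this model the Embedding layer holds most of the weights (10,000 x 128 = 1,280,000), so the reduction measured over the whole network is much smaller.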

[8]

# ==============================
# 8) Parameter comparison with an equivalent LSTM
# ==============================

model_lstm_ref = keras.Sequential([
    keras.layers.Embedding(input_dim=TOP_WORDS, output_dim=EMBED_DIM),
    keras.layers.Bidirectional(
        keras.layers.LSTM(GRU_UNITS_1, return_sequences=True, dropout=0.2)
    ),
    keras.layers.LSTM(GRU_UNITS_2, dropout=0.2),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation='sigmoid')
])

# Build both models so that their weights exist and can be counted.
model_gru.build(input_shape=(None, MAX_LEN))
model_lstm_ref.build(input_shape=(None, MAX_LEN))

gru_params = model_gru.count_params()
lstm_params = model_lstm_ref.count_params()

print(f'GRU parameters : {gru_params:,}')
print(f'LSTM parameters: {lstm_params:,}')
print(f'Approx. relative reduction with GRU: {(1 - gru_params/lstm_params)*100:.2f}%')

GRU parameters : 1,371,137
LSTM parameters: 1,400,513
Approx. relative reduction with GRU: 2.10%
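
The totals above can be reproduced by hand, which also shows where the "about 25%" figure lives. The sketch below assumes the Keras defaults (a GRU with `reset_after=True`, hence two bias vectors per gate) and uses hypothetical helper names. It counts the recurrent layers separately from the weights that both models share (Embedding and Dense).

```python
def gru_param_count(input_dim, units):
    # 3 gates: input kernel + recurrent kernel + two bias vectors (reset_after=True).
    return 3 * (units * (input_dim + units) + 2 * units)

def lstm_param_count(input_dim, units):
    # 4 gates: input kernel + recurrent kernel + one bias vector.
    return 4 * (units * (input_dim + units) + units)

TOP_WORDS, EMBED_DIM, UNITS_1, UNITS_2 = 10_000, 128, 64, 32

# The bidirectional layer has two copies; its output (2 * UNITS_1) feeds the second layer.
gru_rec = 2 * gru_param_count(EMBED_DIM, UNITS_1) + gru_param_count(2 * UNITS_1, UNITS_2)
lstm_rec = 2 * lstm_param_count(EMBED_DIM, UNITS_1) + lstm_param_count(2 * UNITS_1, UNITS_2)

# Weights shared by both models: Embedding, Dense(32) and Dense(1).
shared = TOP_WORDS * EMBED_DIM + (UNITS_2 * 32 + 32) + (32 + 1)

print(gru_rec, lstm_rec)                     # 90048 119424
print(f"{1 - gru_rec / lstm_rec:.2%}")       # 24.60%
print(shared + gru_rec, shared + lstm_rec)   # 1371137 1400513
```

The recurrent part shrinks by about a quarter, but the 1,280,000 embedding weights dominate both models, which is why the overall reduction is only 2.10%.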

5) Training

We use:

validation_split to monitor generalization.
EarlyStopping to stop when validation stalls.
ReduceLROnPlateau to adjust the learning rate automatically.
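
How the two callbacks interact can be sketched with a simplified pure-Python simulation. This is an approximation for intuition, not the Keras implementation: it shares a single "best value" between both callbacks and ignores details such as `min_delta` and cooldown. The `val_loss` list holds the per-epoch validation losses logged by this notebook's training run, and the patience values mirror the ones used in the training cell.

```python
val_loss = [0.6835, 0.6813, 0.3213, 0.3549, 0.4297]  # per-epoch values from this run

def simulate(val_loss, lr=1e-3, es_patience=2, lr_patience=1, factor=0.5, min_lr=1e-5):
    """Simplified EarlyStopping + ReduceLROnPlateau logic on a list of val losses."""
    best, best_epoch = float("inf"), 0
    es_wait = lr_wait = 0
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best:                      # improvement: reset both counters
            best, best_epoch = loss, epoch
            es_wait = lr_wait = 0
            continue
        es_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:           # plateau: shrink the learning rate
            lr = max(lr * factor, min_lr)
            lr_wait = 0
        if es_wait >= es_patience:           # no improvement for too long: stop
            return {"stopped_at": epoch, "best_epoch": best_epoch, "final_lr": lr}
    return {"stopped_at": None, "best_epoch": best_epoch, "final_lr": lr}

print(simulate(val_loss))
# {'stopped_at': 5, 'best_epoch': 3, 'final_lr': 0.00025}
```

The simulation agrees with the run: the best validation loss is at epoch 3, the learning rate is halved after epochs 4 and 5, and training stops at epoch 5. With restore_best_weights=True the model keeps the epoch-3 weights.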

[9]

# ==============================
# 9) Callbacks and training
# ==============================

BATCH_SIZE = 128
EPOCHS = 8

callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=1, min_lr=1e-5, verbose=1
    )
]

history = model_gru.fit(
    x_train_pad,
    y_train,
    validation_split=0.2,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    callbacks=callbacks,
    verbose=1
)

Epoch 1/8
157/157 ━━━━━━━━━━━━━━━━━━━━ 19s 109ms/step - accuracy: 0.5182 - auc: 0.5249 - loss: 0.6920 - precision: 0.5202 - recall: 0.5067 - val_accuracy: 0.5420 - val_auc: 0.5893 - val_loss: 0.6835 - val_precision: 0.6446 - val_recall: 0.1616 - learning_rate: 0.0010
Epoch 2/8
157/157 ━━━━━━━━━━━━━━━━━━━━ 16s 103ms/step - accuracy: 0.5544 - auc: 0.5905 - loss: 0.6775 - precision: 0.5642 - recall: 0.4905 - val_accuracy: 0.5722 - val_auc: 0.6398 - val_loss: 0.6813 - val_precision: 0.6571 - val_recall: 0.2795 - learning_rate: 0.0010
Epoch 3/8
157/157 ━━━━━━━━━━━━━━━━━━━━ 16s 102ms/step - accuracy: 0.8036 - auc: 0.8789 - loss: 0.4388 - precision: 0.8044 - recall: 0.8038 - val_accuracy: 0.8632 - val_auc: 0.9370 - val_loss: 0.3213 - val_precision: 0.8612 - val_recall: 0.8619 - learning_rate: 0.0010
Epoch 4/8
Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
157/157 ━━━━━━━━━━━━━━━━━━━━ 16s 103ms/step - accuracy: 0.9096 - auc: 0.9641 - loss: 0.2391 - precision: 0.9096 - recall: 0.9101 - val_accuracy: 0.8650 - val_auc: 0.9414 - val_loss: 0.3549 - val_precision: 0.8264 - val_recall: 0.9198 - learning_rate: 0.0010
Epoch 5/8
Epoch 5: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
157/157 ━━━━━━━━━━━━━━━━━━━━ 17s 106ms/step - accuracy: 0.9517 - auc: 0.9851 - loss: 0.1435 - precision: 0.9508 - recall: 0.9529 - val_accuracy: 0.8540 - val_auc: 0.9367 - val_loss: 0.4297 - val_precision: 0.9009 - val_recall: 0.7914 - learning_rate: 5.0000e-04

6) Training curves (loss/accuracy)

As in any serious workflow, we review the train and val curves to detect overfitting, underfitting and convergence.

[10]

# ==============================
# 10) Training curves
# ==============================

hist = history.history
epochs_r = range(1, len(hist['loss']) + 1)

plt.figure(figsize=(13, 4))

plt.subplot(1, 2, 1)
plt.plot(epochs_r, hist['loss'], marker='o', label='Train loss')
plt.plot(epochs_r, hist['val_loss'], marker='o', label='Val loss')
plt.title('Loss vs Epoch')
plt.xlabel('Epoch')
plt.ylabel('Binary crossentropy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(epochs_r, hist['accuracy'], marker='o', label='Train acc')
plt.plot(epochs_r, hist['val_accuracy'], marker='o', label='Val acc')
plt.title('Accuracy vs Epoch')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()

7) Evaluation on test

We evaluate on the test set to obtain an honest estimate of final performance.

[11]

# ==============================
# 11) Final evaluation on test
# ==============================

# In recent Keras versions model.metrics_names only exposes 'loss' and
# 'compile_metrics', so we ask evaluate() for a dict keyed by metric name.
test_metrics = model_gru.evaluate(x_test_pad, y_test, verbose=0, return_dict=True)

print('Test metrics:')
# Precision, recall and AUC are reported in detail in the next section.
for name in ('loss', 'accuracy'):
    print(f'  {name:>10s}: {test_metrics[name]:.4f}')

Test metrics:
        loss: 0.3384
    accuracy: 0.8515

8) Detailed metrics and visualizations

Besides accuracy, we show the confusion matrix, the classification report, the ROC curve and the F1-score.

[12]

# ==============================
# 12) Predictions and detailed metrics
# ==============================

y_prob = model_gru.predict(x_test_pad, verbose=0).ravel()
y_pred = (y_prob >= 0.5).astype(int)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion matrix (test)')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

print(classification_report(y_test, y_pred, digits=4))

precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average='binary')
print(f'Precision: {precision:.4f}')
print(f'Recall   : {recall:.4f}')
print(f'F1-score : {f1:.4f}')

              precision    recall  f1-score   support

           0     0.8441    0.8622    0.8531     12500
           1     0.8592    0.8407    0.8499     12500

    accuracy                         0.8515     25000
   macro avg     0.8516    0.8515    0.8515     25000
weighted avg     0.8516    0.8515    0.8515     25000

Precision: 0.8592
Recall   : 0.8407
F1-score : 0.8499
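
For reference, the three numbers above follow directly from the counts in the confusion matrix. The sketch below recomputes them with NumPy on a toy example. The labels and predictions are made up, and `binary_metrics` is a hypothetical helper, not a replacement for the scikit-learn functions used in the notebook.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall and F1 for the positive class, from raw counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true_toy = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical labels
y_pred_toy = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical predictions
print(binary_metrics(y_true_toy, y_pred_toy))  # (0.75, 0.75, 0.75)
```

As a sanity check on the real results, plugging the rounded precision (0.8592) and recall (0.8407) into the F1 formula gives about 0.850, in line with the reported 0.8499.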

[13]

# ==============================
# 13) ROC curve
# ==============================

fpr, tpr, _ = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, label=f'ROC (AUC={roc_auc:.4f})')
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance')
plt.title('ROC curve - GRU model')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
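
A useful way to read the AUC is as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one. The sketch below computes it that way on a toy example. `pairwise_auc` is a hypothetical helper that compares every positive/negative pair, so it is only meant for small arrays (on the full test set, `roc_curve` plus `auc` is the practical route).

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive score minus every negative score
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Toy data: three of the four positive/negative pairs are ordered correctly.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because it depends only on the ranking of the scores, the AUC does not change when the 0.5 decision threshold is moved.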

9) Qualitative inspection of predictions

Looking at concrete examples helps us understand what kind of errors the model makes and how much confidence it assigns.

[14]

# ==============================
# 14) Prediction examples
# ==============================

def decode_review(token_ids, index_to_word_map, max_tokens=60):
    # Decode a sequence of indices into readable (truncated) text.
    return ' '.join(index_to_word_map.get(t, '?') for t in token_ids[:max_tokens])

sample_idx = np.random.choice(len(x_test), size=5, replace=False)

for i, idx in enumerate(sample_idx, start=1):
    prob = y_prob[idx]
    pred = int(prob >= 0.5)
    real = int(y_test[idx])
    text_preview = decode_review(x_test[idx], index_to_word)

    print(f'--- Example {i} ---')
    print(f'True={real} | Pred={pred} | Positive prob={prob:.4f}')
    print('Text (excerpt):', text_preview)
    print()

--- Example 1 ---
True=1 | Pred=1 | Positive prob=0.9695
Text (excerpt): <START> to tell you the truth i do not speak <UNK> and i did not understand the film my good <UNK> friend wow what a long name explained every thing to me what a great movie after watching this movie i felt i should have watched many more movies from <UNK> <UNK> film industry the war scenes were amazing camera

--- Example 2 ---
True=1 | Pred=1 | Positive prob=0.9430
Text (excerpt): <START> the royal <UNK> has <UNK> been one of my favourite events and i've been a wrestling fan for a good few years now the other shows may have better matches but i've always found the actual <UNK> match to be full of excitement br br i'm not going to reveal the winners of any match as i don't see

--- Example 3 ---
True=1 | Pred=1 | Positive prob=0.8714
Text (excerpt): <START> i really wanted to be able to give this film a 10 i've long thought it was my favorite of the four modern live action batman films to date and maybe it still will be i have yet to watch the <UNK> films again i'm also starting to become concerned about whether i'm somehow <UNK> being you see i

--- Example 4 ---
True=1 | Pred=0 | Positive prob=0.0948
Text (excerpt): <START> nothing dull about this movie which is held together by fully realized characters with some depth to them even the <UNK> <UNK> have body language <UNK> performance is brilliant all will want and need a henry <UNK> as he must have been <UNK> is maybe <UNK> and <UNK> than anne <UNK> but she plays the part as written a

--- Example 5 ---
True=0 | Pred=0 | Positive prob=0.0182
Text (excerpt): <START> this is a movie about a black man buying a <UNK> company and turning the company into a african <UNK> over the top <UNK> they even portray the owner as not only being in control of the <UNK> but also controlling part of the air <UNK> at the airport one day this guy wins 100 million dollars a the
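
Random samples mostly show correct predictions, so for error analysis it can help to look specifically at the mistakes the model was most confident about (Example 4, a positive review scored at 0.0948, is one such case). The sketch below ranks misclassified items by their distance from the decision threshold. The helper `most_confident_errors` and the toy arrays are assumptions for illustration; in the notebook one would pass `y_test` and `y_prob` and decode the returned indices with `decode_review`.

```python
import numpy as np

def most_confident_errors(y_true, y_prob, k=3, threshold=0.5):
    """Indices of misclassified samples, most confident mistakes first."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)
    wrong = np.flatnonzero(y_pred != y_true)                  # misclassified positions
    order = np.argsort(-np.abs(y_prob[wrong] - threshold))    # farthest from threshold first
    return wrong[order][:k]

y_true_toy = [1, 0, 1, 0, 1]                 # hypothetical labels
y_prob_toy = [0.09, 0.02, 0.97, 0.70, 0.45]  # hypothetical predicted probabilities
print(most_confident_errors(y_true_toy, y_prob_toy))  # [0 3 4]
```

Reading the top few of these reviews often reveals recurring patterns, such as negations ("nothing dull about this movie") or key words lost to `<UNK>`.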

Conclusions

In this notebook we built a complete GRU workflow for NLP:

Sequence preprocessing with padding/truncation.
A modern recurrent architecture with Embedding + (Bi)GRU.
Training with callbacks for stability.
Quantitative and qualitative evaluation.

Key takeaways:

GRU offers a good trade-off between capacity and computational cost.
Train/val curves are essential for diagnosing model behavior.
Accuracy alone is not enough: precision, recall, F1 and AUC should be reviewed too.

Suggestions for further exploration

Try different lengths (MAX_LEN=100, 300, 500) and compare.
Change the number of GRU units and the dropout rate.
Replace GRU with LSTM in the same architecture and measure time/accuracy.
Add pretrained embeddings (GloVe/FastText).
Tune the decision threshold (not always 0.5) according to business goals.
Try hybrid Conv1D + GRU models to capture local patterns plus long-range context.

Final message: GRU is an excellent first choice for many sequential problems when you want solid performance with relatively efficient training.