Skip to content

Latest commit

 

History

History
425 lines (322 loc) · 13.3 KB

File metadata and controls

425 lines (322 loc) · 13.3 KB

Explaining the Embedding Layer in Python

Slide 1: Understanding Embedding Layers

An embedding layer is a crucial component in many deep learning models, particularly in natural language processing tasks. It transforms discrete input data, such as words or categorical variables, into dense vectors of fixed size. These vectors capture semantic relationships and help neural networks process input more effectively.

import torch
import torch.nn as nn

# Creating an embedding layer
vocab_size = 10000
embedding_dim = 100
embedding_layer = nn.Embedding(vocab_size, embedding_dim)

# Input: tensor of word indices
input_indices = torch.tensor([1, 5, 3, 2])

# Get embeddings
embeddings = embedding_layer(input_indices)
print(embeddings.shape)  # Output: torch.Size([4, 100])

Slide 2: Initialization of Embedding Weights

When an embedding layer is created, its weights are typically initialized randomly. These weights are then updated during the training process to learn meaningful representations of the input data.

import torch.nn as nn
import matplotlib.pyplot as plt

# Create an embedding layer
vocab_size = 1000
embedding_dim = 50
embedding = nn.Embedding(vocab_size, embedding_dim)

# Visualize initial weights
plt.figure(figsize=(10, 5))
plt.imshow(embedding.weight.data[:100].numpy(), cmap='viridis')
plt.colorbar()
plt.title("Initial Embedding Weights")
plt.xlabel("Embedding Dimension")
plt.ylabel("Word Index")
plt.show()

Slide 3: Word-to-Vector Conversion

The embedding layer maps each word (represented by its index) to a dense vector. This process transforms sparse, one-hot encoded inputs into dense, continuous vectors that capture semantic meaning.

import torch
import torch.nn as nn

# Create a simple vocabulary
vocab = {"apple": 0, "banana": 1, "cherry": 2}
embedding_dim = 5

# Create an embedding layer
embedding = nn.Embedding(len(vocab), embedding_dim)

# Convert words to vectors
word = "banana"
word_idx = torch.tensor([vocab[word]])
word_vector = embedding(word_idx)

print(f"Vector for '{word}': {word_vector.squeeze().tolist()}")

Slide 4: Handling Out-of-Vocabulary Words

Embedding layers often include a special token for out-of-vocabulary (OOV) words. This allows the model to handle unseen words during inference.

import torch
import torch.nn as nn

vocab = {"<OOV>": 0, "cat": 1, "dog": 2, "fish": 3}
embedding_dim = 4

embedding = nn.Embedding(len(vocab), embedding_dim)

def get_word_vector(word):
    idx = vocab.get(word, 0)  # Use OOV index if word not in vocab
    return embedding(torch.tensor([idx])).squeeze()

print("Known word:", get_word_vector("cat").tolist())
print("OOV word:", get_word_vector("elephant").tolist())

Slide 5: Embedding Layer in a Neural Network

Embedding layers are often used as the first layer in neural networks for NLP tasks. They convert input words or tokens into dense vectors that subsequent layers can process.

import torch
import torch.nn as nn

class SimpleTextClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc = nn.Linear(embedding_dim, num_classes)
    
    def forward(self, x):
        embedded = self.embedding(x)
        pooled = torch.mean(embedded, dim=1)
        return self.fc(pooled)

# Example usage
vocab_size, embedding_dim, num_classes = 10000, 100, 5
model = SimpleTextClassifier(vocab_size, embedding_dim, num_classes)
input_ids = torch.randint(0, vocab_size, (32, 20))  # Batch of 32 sentences, each with 20 words
output = model(input_ids)
print("Output shape:", output.shape)

Slide 6: Pretrained Embeddings

Instead of learning embeddings from scratch, we can use pretrained embeddings like Word2Vec or GloVe. These embeddings capture rich semantic information learned from large corpora.

import torch
import torch.nn as nn
import numpy as np

# Load pretrained embeddings (simplified example)
pretrained_embeddings = np.random.randn(10000, 100)  # Simulate loaded embeddings

# Create embedding layer with pretrained weights
vocab_size, embedding_dim = pretrained_embeddings.shape
embedding = nn.Embedding.from_pretrained(torch.FloatTensor(pretrained_embeddings))

# Use the embedding layer
word_idx = torch.tensor([42])  # Example word index
word_vector = embedding(word_idx)
print("Pretrained vector:", word_vector.squeeze().tolist()[:5])  # Show first 5 dimensions

Slide 7: Visualizing Word Embeddings

After training, we can visualize word embeddings to understand the relationships between words in the embedding space. Techniques like t-SNE or PCA help reduce the high-dimensional embeddings for visualization.

import torch
import torch.nn as nn
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Create sample embeddings
vocab = {"cat": 0, "dog": 1, "fish": 2, "bird": 3, "lion": 4}
embedding = nn.Embedding(len(vocab), 50)

# Get embeddings as numpy array
embeddings_np = embedding.weight.detach().numpy()

# Reduce dimensionality with t-SNE
tsne = TSNE(n_components=2, random_state=42)
reduced_embeddings = tsne.fit_transform(embeddings_np)

# Plot
plt.figure(figsize=(10, 8))
for word, idx in vocab.items():
    x, y = reduced_embeddings[idx]
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.title("2D Visualization of Word Embeddings")
plt.show()

Slide 8: Embedding Lookup Speed

Embedding layers perform fast lookups, making them efficient for processing large vocabularies. Let's compare the speed of embedding lookup to a naive dictionary approach.

import torch
import torch.nn as nn
import time

vocab_size, embedding_dim = 100000, 300
embedding = nn.Embedding(vocab_size, embedding_dim)

# Embedding lookup
start_time = time.time()
indices = torch.randint(0, vocab_size, (10000,))
_ = embedding(indices)
embed_time = time.time() - start_time

# Dictionary lookup (naive approach)
embed_dict = {i: torch.randn(embedding_dim) for i in range(vocab_size)}
start_time = time.time()
_ = [embed_dict[i.item()] for i in indices]
dict_time = time.time() - start_time

print(f"Embedding lookup time: {embed_time:.4f}s")
print(f"Dictionary lookup time: {dict_time:.4f}s")

Slide 9: Handling Variable-Length Sequences

In practice, we often deal with sequences of different lengths. Embedding layers can handle this by using padding and masking.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Create embedding layer
vocab_size, embedding_dim = 1000, 50
embedding = nn.Embedding(vocab_size, embedding_dim)

# Sample sequences of different lengths
sequences = [
    torch.randint(0, vocab_size, (5,)),
    torch.randint(0, vocab_size, (3,)),
    torch.randint(0, vocab_size, (7,))
]

# Pad sequences
padded_sequences = nn.utils.rnn.pad_sequence(sequences, batch_first=True)

# Create mask
mask = (padded_sequences != 0).float()

# Embed padded sequences
embedded = embedding(padded_sequences)

# Apply mask
masked_embedded = embedded * mask.unsqueeze(-1)

print("Padded shape:", padded_sequences.shape)
print("Embedded shape:", embedded.shape)
print("Masked embedded shape:", masked_embedded.shape)

Slide 10: Real-Life Example: Sentiment Analysis

Let's use an embedding layer in a simple sentiment analysis model for movie reviews.

import torch
import torch.nn as nn
import torch.optim as optim

# Simplified dataset
reviews = ["great movie", "terrible film", "loved it"]
sentiments = [1, 0, 1]  # 1 for positive, 0 for negative

# Create vocabulary
vocab = set(" ".join(reviews).split())
word_to_idx = {word: i for i, word in enumerate(vocab)}

# Model definition
class SentimentClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc = nn.Linear(embedding_dim, 1)
    
    def forward(self, x):
        embedded = self.embedding(x)
        pooled = torch.mean(embedded, dim=1)
        return torch.sigmoid(self.fc(pooled))

# Prepare data
X = [[word_to_idx[word] for word in review.split()] for review in reviews]
X = nn.utils.rnn.pad_sequence([torch.tensor(x) for x in X], batch_first=True)
y = torch.tensor(sentiments, dtype=torch.float32)

# Train model
model = SentimentClassifier(len(vocab), 10)
optimizer = optim.Adam(model.parameters())
criterion = nn.BCELoss()

for epoch in range(100):
    optimizer.zero_grad()
    output = model(X).squeeze()
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

print(f"Final loss: {loss.item():.4f}")

Slide 11: Real-Life Example: Text Generation

In this example, we'll use an embedding layer in a simple character-level language model for text generation.

import torch
import torch.nn as nn
import torch.optim as optim

# Sample text
text = "Hello, world! This is a simple example of text generation."
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

# Prepare data
sequence_length = 10
X = torch.tensor([[char_to_idx[c] for c in text[i:i+sequence_length]] for i in range(len(text)-sequence_length)])
y = torch.tensor([char_to_idx[text[i+sequence_length]] for i in range(len(text)-sequence_length)])

# Model definition
class CharLM(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)
    
    def forward(self, x):
        embedded = self.embedding(x)
        output, _ = self.lstm(embedded)
        return self.fc(output[:, -1, :])

# Train model
model = CharLM(len(chars), 10, 20)
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

for epoch in range(100):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

print(f"Final loss: {loss.item():.4f}")

# Generate text
start_sequence = "Hello, "
generated = start_sequence
for _ in range(20):
    x = torch.tensor([[char_to_idx[c] for c in generated[-sequence_length:]]])
    output = model(x)
    next_char_idx = torch.argmax(output, dim=1).item()
    generated += idx_to_char[next_char_idx]

print("Generated text:", generated)

Slide 12: Embedding Layer Memory Usage

Embedding layers can consume significant memory, especially for large vocabularies. Let's examine the memory usage and discuss strategies to reduce it.

import torch
import torch.nn as nn
import sys

def get_size(obj, seen=None):
    """Recursively calculate size of objects"""
    size = sys.getsizeof(obj)
    if seen is None:
        seen = set()
    obj_id = id(obj)
    if obj_id in seen:
        return 0
    seen.add(obj_id)
    if isinstance(obj, dict):
        size += sum([get_size(v, seen) for v in obj.values()])
        size += sum([get_size(k, seen) for k in obj.keys()])
    elif hasattr(obj, '__dict__'):
        size += get_size(obj.__dict__, seen)
    return size

# Create embedding layers with different sizes
vocab_sizes = [1000, 10000, 100000]
embedding_dim = 300

for vocab_size in vocab_sizes:
    embedding = nn.Embedding(vocab_size, embedding_dim)
    size_mb = get_size(embedding) / (1024 * 1024)
    print(f"Vocab size: {vocab_size}, Embedding size: {size_mb:.2f} MB")

# Memory reduction technique: use smaller data type
embedding_fp16 = nn.Embedding(100000, 300).half()
size_mb_fp16 = get_size(embedding_fp16) / (1024 * 1024)
print(f"FP16 Embedding size: {size_mb_fp16:.2f} MB")

Slide 13: Embedding Layer in Transfer Learning

Embedding layers play a crucial role in transfer learning for NLP tasks. We can use pretrained embeddings and fine-tune them for specific tasks.

import torch
import torch.nn as nn

# Simulated pretrained embeddings
pretrained_embeddings = torch.randn(10000, 300)

class TransferLearningModel(nn.Module):
    def __init__(self, pretrained_embeddings, num_classes, freeze_embeddings=True):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=freeze_embeddings)
        self.lstm = nn.LSTM(300, 128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)
    
    def forward(self, x):
        embedded = self.embedding(x)
        _, (hidden, _) = self.lstm(embedded)
        return self.fc(hidden.squeeze(0))

# Create model
model = TransferLearningModel(pretrained_embeddings, num_classes=5)

# Example usage
input_ids = torch.randint(0, 10000, (32, 20))  # Batch of 32 sentences, each with 20 words
output = model(input_ids)
print("Output shape:", output.shape)

# Fine-tuning: unfreeze embeddings
model.embedding.weight.requires_grad = True

Slide 14: Additional Resources

For those interested in diving deeper into embedding layers and their applications in natural language processing, here are some valuable resources:

  1. "Word2Vec Tutorial - The Skip-Gram Model" by Chris McCormick An excellent introduction to the concepts behind word embeddings.
  2. "GloVe: Global Vectors for Word Representation" by Pennington et al. ArXiv link: https://arxiv.org/abs/1405.3531 This paper introduces the GloVe algorithm for learning word representations.
  3. "Efficient Estimation of Word Representations in Vector Space" by Mikolov et al. ArXiv link: https://arxiv.org/abs/