Skip to content

Latest commit

 

History

History
432 lines (341 loc) · 16.7 KB

File metadata and controls

432 lines (341 loc) · 16.7 KB

Building a Python Language Translator

Slide 1: Introduction to Language Translation in Python

Language translation is a complex task that involves converting text from one language to another while preserving meaning and context. In Python, we can create a basic language translator using various approaches. This slideshow will guide you through the process of building a simple translator from scratch, focusing on key concepts and techniques.

# A simple example of what we'll achieve by the end of this slideshow
def simple_translator(text, source_lang, target_lang):
    # Placeholder for translation logic
    translated_text = f"Translated '{text}' from {source_lang} to {target_lang}"
    return translated_text

print(simple_translator("Hello", "English", "Spanish"))

Slide 2: Setting Up the Project Structure

To create our translator, we'll start by setting up a basic project structure. This includes creating a main Python file and organizing our code into functions. We'll also create a simple dictionary to store our translations.

# main.py

# Dictionary to store our translations
translations = {
    'en': {
        'hello': {'es': 'hola', 'fr': 'bonjour'},
        'goodbye': {'es': 'adiós', 'fr': 'au revoir'}
    }
}

def translate_word(word, source_lang, target_lang):
    if source_lang in translations and word in translations[source_lang]:
        return translations[source_lang][word].get(target_lang, word)
    return word

# Test the function
print(translate_word('hello', 'en', 'es'))  # Output: hola
print(translate_word('goodbye', 'en', 'fr'))  # Output: au revoir

Slide 3: Implementing Basic Word Translation

Now that we have our project structure, let's implement a basic word translation function. This function will take a word, source language, and target language as input, and return the translated word if it exists in our dictionary.

def translate_word(word, source_lang, target_lang):
    word = word.lower()  # Convert to lowercase for consistency
    if source_lang in translations and word in translations[source_lang]:
        return translations[source_lang][word].get(target_lang, word)
    return word

# Test the function with different cases
print(translate_word('Hello', 'en', 'es'))  # Output: hola
print(translate_word('GOODBYE', 'en', 'fr'))  # Output: au revoir
print(translate_word('cat', 'en', 'es'))  # Output: cat (not in dictionary)

Slide 4: Expanding the Translation Dictionary

To make our translator more useful, we need to expand our translation dictionary. Let's add more words and languages to our translations dictionary.

translations = {
    'en': {
        'hello': {'es': 'hola', 'fr': 'bonjour', 'de': 'hallo'},
        'goodbye': {'es': 'adiós', 'fr': 'au revoir', 'de': 'auf wiedersehen'},
        'cat': {'es': 'gato', 'fr': 'chat', 'de': 'katze'},
        'dog': {'es': 'perro', 'fr': 'chien', 'de': 'hund'}
    },
    'es': {
        'hola': {'en': 'hello', 'fr': 'bonjour', 'de': 'hallo'},
        'adiós': {'en': 'goodbye', 'fr': 'au revoir', 'de': 'auf wiedersehen'}
    }
}

# Test with new translations
print(translate_word('cat', 'en', 'de'))  # Output: katze
print(translate_word('hola', 'es', 'en'))  # Output: hello

Slide 5: Implementing Sentence Translation

Now that we can translate individual words, let's create a function to translate entire sentences. This function will split the sentence into words, translate each word, and then join them back together.

def translate_sentence(sentence, source_lang, target_lang):
    words = sentence.split()
    translated_words = [translate_word(word, source_lang, target_lang) for word in words]
    return ' '.join(translated_words)

# Test the sentence translation
print(translate_sentence("Hello goodbye", 'en', 'es'))  # Output: hola adiós
print(translate_sentence("The cat and dog", 'en', 'fr'))  # Output: The chat and chien

Slide 6: Handling Punctuation and Capitalization

Our current implementation doesn't handle punctuation or preserve the original capitalization. Let's improve our translate_sentence function to address these issues.

import string

def translate_sentence(sentence, source_lang, target_lang):
    # Separate punctuation from words
    translator = str.maketrans('', '', string.punctuation)
    words = sentence.translate(translator).split()
    
    # Translate words and preserve original capitalization
    translated_words = []
    for word in words:
        translated = translate_word(word.lower(), source_lang, target_lang)
        if word.istitle():
            translated = translated.capitalize()
        translated_words.append(translated)
    
    # Rejoin words and add back punctuation
    translated_sentence = ' '.join(translated_words)
    for i, char in enumerate(sentence):
        if char in string.punctuation:
            translated_sentence = translated_sentence[:i] + char + translated_sentence[i:]
    
    return translated_sentence

# Test the improved sentence translation
print(translate_sentence("Hello, goodbye!", 'en', 'es'))  # Output: Hola, adiós!
print(translate_sentence("The Cat and dog.", 'en', 'fr'))  # Output: The Chat and chien.

Slide 7: Adding Language Detection

To make our translator more user-friendly, let's add a simple language detection feature. We'll create a function that attempts to identify the source language based on the words in the input text.

def detect_language(text):
    words = text.lower().split()
    lang_scores = {'en': 0, 'es': 0, 'fr': 0, 'de': 0}
    
    for word in words:
        for lang in translations:
            if word in translations[lang]:
                lang_scores[lang] += 1
    
    detected_lang = max(lang_scores, key=lang_scores.get)
    return detected_lang if lang_scores[detected_lang] > 0 else 'unknown'

# Test language detection
print(detect_language("Hello goodbye"))  # Output: en
print(detect_language("Hola adiós"))  # Output: es
print(detect_language("Bonjour chat"))  # Output: fr

Slide 8: Implementing a User Interface

Let's create a simple command-line interface for our translator. This will allow users to interact with our translator more easily.

def translator_interface():
    print("Welcome to the Python Translator!")
    while True:
        text = input("Enter text to translate (or 'q' to quit): ")
        if text.lower() == 'q':
            break
        
        source_lang = detect_language(text)
        if source_lang == 'unknown':
            source_lang = input("Unable to detect language. Please enter source language (en/es/fr/de): ")
        
        target_lang = input("Enter target language (en/es/fr/de): ")
        
        translated = translate_sentence(text, source_lang, target_lang)
        print(f"Translated text: {translated}\n")

# Run the interface
translator_interface()

Slide 9: Handling Unknown Words

Our current implementation simply returns the original word if a translation is not found. Let's improve this by adding a feature to mark unknown words and suggest possible translations based on similarity.

def levenshtein_distance(s1, s2):
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1)
    if len(s2) == 0:
        return len(s1)
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]

def find_similar_word(word, lang):
    if lang not in translations:
        return None
    best_match = None
    min_distance = float('inf')
    for known_word in translations[lang]:
        distance = levenshtein_distance(word, known_word)
        if distance < min_distance:
            min_distance = distance
            best_match = known_word
    return best_match if min_distance <= 2 else None

def translate_word_improved(word, source_lang, target_lang):
    word_lower = word.lower()
    if source_lang in translations and word_lower in translations[source_lang]:
        return translations[source_lang][word_lower].get(target_lang, word)
    similar_word = find_similar_word(word_lower, source_lang)
    if similar_word:
        suggested = translations[source_lang][similar_word].get(target_lang, similar_word)
        return f"{word}({suggested}?)"
    return f"{word}(?)"

# Test the improved word translation
print(translate_word_improved('cat', 'en', 'es'))  # Output: gato
print(translate_word_improved('kat', 'en', 'es'))  # Output: kat(gato?)
print(translate_word_improved('dog', 'en', 'fr'))  # Output: chien
print(translate_word_improved('doog', 'en', 'fr'))  # Output: doog(chien?)
print(translate_word_improved('table', 'en', 'es'))  # Output: table(?)

Slide 10: Implementing Text-to-Speech (TTS) for Translated Text

To enhance our translator, let's add a text-to-speech feature that can pronounce the translated text. We'll use the built-in os module to interact with the system's text-to-speech capabilities.

import os

def speak_text(text, lang):
    # Map language codes to voice names (may vary depending on your system)
    voice_map = {
        'en': 'Alex',  # English
        'es': 'Juan',  # Spanish
        'fr': 'Thomas',  # French
        'de': 'Anna'  # German
    }
    
    voice = voice_map.get(lang, 'Alex')  # Default to English voice if not found
    
    # Use macOS 'say' command (adjust for other operating systems)
    os.system(f'say -v {voice} "{text}"')

# Modify the translator_interface function to include TTS
def translator_interface_with_tts():
    print("Welcome to the Python Translator with Text-to-Speech!")
    while True:
        text = input("Enter text to translate (or 'q' to quit): ")
        if text.lower() == 'q':
            break
        
        source_lang = detect_language(text)
        if source_lang == 'unknown':
            source_lang = input("Unable to detect language. Please enter source language (en/es/fr/de): ")
        
        target_lang = input("Enter target language (en/es/fr/de): ")
        
        translated = translate_sentence(text, source_lang, target_lang)
        print(f"Translated text: {translated}")
        
        speak_text(translated, target_lang)
        print()

# Run the interface with TTS
translator_interface_with_tts()

Slide 11: Real-Life Example: Travel Phrase Translator

Let's create a practical example of our translator by implementing a travel phrase translator. This will demonstrate how our translator can be used in real-world scenarios.

travel_phrases = {
    'en': {
        'where is': {'es': 'dónde está', 'fr': 'où est', 'de': 'wo ist'},
        'how much': {'es': 'cuánto cuesta', 'fr': 'combien coûte', 'de': 'wie viel kostet'},
        'bathroom': {'es': 'baño', 'fr': 'toilettes', 'de': 'toilette'},
        'restaurant': {'es': 'restaurante', 'fr': 'restaurant', 'de': 'restaurant'},
        'hotel': {'es': 'hotel', 'fr': 'hôtel', 'de': 'hotel'}
    }
}

def travel_translator(phrase, target_lang):
    words = phrase.lower().split()
    translated_words = []
    for word in words:
        if word in travel_phrases['en']:
            translated_words.append(travel_phrases['en'][word].get(target_lang, word))
        else:
            translated_words.append(translate_word(word, 'en', target_lang))
    return ' '.join(translated_words)

# Example usage
print(travel_translator("Where is the bathroom?", 'es'))  # Output: dónde está the baño?
print(travel_translator("How much is the hotel?", 'fr'))  # Output: combien coûte is the hôtel?
print(travel_translator("Where is a good restaurant?", 'de'))  # Output: wo ist a good restaurant?

Slide 12: Real-Life Example: Recipe Translator

Another practical application of our translator is a recipe translator. This example shows how we can adapt our translator for specific domains like cooking.

cooking_terms = {
    'en': {
        'bake': {'es': 'hornear', 'fr': 'cuire', 'de': 'backen'},
        'chop': {'es': 'picar', 'fr': 'hacher', 'de': 'hacken'},
        'boil': {'es': 'hervir', 'fr': 'bouillir', 'de': 'kochen'},
        'cup': {'es': 'taza', 'fr': 'tasse', 'de': 'tasse'},
        'tablespoon': {'es': 'cucharada', 'fr': 'cuillère à soupe', 'de': 'esslöffel'}
    }
}

def recipe_translator(recipe, target_lang):
    words = recipe.lower().split()
    translated_words = []
    for word in words:
        if word in cooking_terms['en']:
            translated_words.append(cooking_terms['en'][word].get(target_lang, word))
        else:
            translated_words.append(translate_word(word, 'en', target_lang))
    return ' '.join(translated_words)

# Example usage
recipe = "Chop the onions and boil them in 2 cups of water. Bake for 30 minutes."
print(recipe_translator(recipe, 'es'))
print(recipe_translator(recipe, 'fr'))

Slide 13: Handling Idiomatic Expressions

Idiomatic expressions pose a challenge for word-by-word translation. Let's implement a simple system to handle common idioms in our translator.

idioms = {
    'en': {
        'it\'s raining cats and dogs': {
            'es': 'está lloviendo a cántaros',
            'fr': 'il pleut des cordes',
            'de': 'es regnet in Strömen'
        },
        'break a leg': {
            'es': 'mucha mierda',
            'fr': 'merde',
            'de': 'hals- und beinbruch'
        }
    }
}

def translate_with_idioms(text, source_lang, target_lang):
    for idiom, translations in idioms.get(source_lang, {}).items():
        if idiom in text.lower():
            return text.lower().replace(idiom, translations.get(target_lang, idiom))
    return translate_sentence(text, source_lang, target_lang)

# Test the idiom translation
print(translate_with_idioms("It's raining cats and dogs", 'en', 'es'))
print(translate_with_idioms("Break a leg!", 'en', 'fr'))
print(translate_with_idioms("Hello, how are you?", 'en', 'de'))

Slide 14: Implementing Language-Specific Rules

Different languages have different grammatical rules. Let's implement a simple system to apply language-specific rules after translation.

def apply_language_rules(text, target_lang):
    if target_lang == 'es':
        # Spanish: Add inverted question mark at the beginning of questions
        if text.endswith('?'):
            text = '¿' + text
    elif target_lang == 'de':
        # German: Capitalize all nouns (simplified rule)
        words = text.split()
        for i, word in enumerate(words):
            if i > 0 and word not in ['der', 'die', 'das', 'ein', 'eine']:
                words[i] = word.capitalize()
        text = ' '.join(words)
    return text

def improved_translate(text, source_lang, target_lang):
    translated = translate_with_idioms(text, source_lang, target_lang)
    return apply_language_rules(translated, target_lang)

# Test the improved translation
print(improved_translate("Where is the bathroom?", 'en', 'es'))
print(improved_translate("The cat is on the table", 'en', 'de'))

Slide 15: Handling Context-Dependent Translations

Some words have multiple meanings depending on context. Let's implement a simple context-aware translation system.

context_translations = {
    'en': {
        'bank': {
            'financial': {'es': 'banco', 'fr': 'banque', 'de': 'Bank'},
            'river': {'es': 'orilla', 'fr': 'rive', 'de': 'Ufer'}
        }
    }
}

def translate_with_context(word, context, source_lang, target_lang):
    if word in context_translations.get(source_lang, {}):
        context_dict = context_translations[source_lang][word]
        for key, translations in context_dict.items():
            if key in context.lower():
                return translations.get(target_lang, word)
    return translate_word(word, source_lang, target_lang)

# Test context-aware translation
print(translate_with_context('bank', 'I need to go to the bank to withdraw money', 'en', 'es'))
print(translate_with_context('bank', 'We had a picnic by the river bank', 'en', 'fr'))

Slide 16: Additional Resources

For those interested in diving deeper into natural language processing and machine translation, here are some valuable resources:

  1. "Neural Machine Translation by Jointly Learning to Align and Translate" by Bahdanau et al. (2014) ArXiv: https://arxiv.org/abs/1409.0473
  2. "Attention Is All You Need" by Vaswani et al. (2017) ArXiv: https://arxiv.org/abs/1706.03762
  3. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" by Devlin et al. (2018) ArXiv: https://arxiv.org/abs/1810.04805

These papers provide insights into advanced techniques used in modern machine translation systems, including neural networks and transformer architectures.