- Objective
- How It Works
- Pipeline Workflow
- Tech Stack
- Setup and Installation
- API Documentation
- Key Features
- Future Enhancements
- Contributions
## Objective

Linguaflow provides seamless, high-performance multilingual translation using FastAPI and Hugging Face's MarianMT models. It enables developers to translate English text into multiple target languages, including French, Portuguese, and Spanish, with scalability and accuracy at its core.
## How It Works

The translation pipeline is straightforward yet powerful:

```
| Client Request | --> | FastAPI Endpoint (/translate) | --> | MarianMT Tokenizer | -->
| MarianMT Model (Generate Translation) | --> | Decode Translations | --> | Response to Client |
```
## Pipeline Workflow

Below is a step-by-step workflow of the Linguaflow system:

1. **Client Request**
   - A user sends a JSON payload to the `/translate` endpoint.
   - Example payload: `{"src_text": [">>fra<< This is a test sentence."]}`
2. **Input Validation**
   - FastAPI validates the request payload using Pydantic's `BaseModel`.
   - Ensures the input follows the required schema.
3. **Tokenization**
   - The `MarianTokenizer` from Hugging Face processes the input text.
   - Converts the text into token IDs compatible with the MarianMT model.
   - Adds any necessary padding for batch processing.
4. **Translation**
   - The MarianMT model generates the translation in the specified target language.
   - Uses the Transformer encoder-decoder architecture for efficient and accurate results.
5. **Decoding**
   - The tokenized output from the model is decoded back into human-readable text.
   - Special tokens are removed to produce clean translations.
6. **Response**
   - The translated text is encapsulated in a JSON response.
   - Example response: `{"translated_text": ["Ceci est une phrase de test."]}`
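The tokenize → generate → decode steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual module: the checkpoint name is an assumption (`Helsinki-NLP/opus-mt-en-ROMANCE` is a public MarianMT model that understands the `>>fra<<` / `>>por<<` / `>>spa<<` prefixes), and the real app may load a different one.

```python
from functools import lru_cache
from typing import List

from transformers import MarianMTModel, MarianTokenizer

# Assumed checkpoint; supports >>fra<< / >>por<< / >>spa<< target prefixes.
MODEL_NAME = "Helsinki-NLP/opus-mt-en-ROMANCE"

@lru_cache(maxsize=1)
def load_model():
    # Loaded lazily and cached so the first call pays the download cost once
    return MarianTokenizer.from_pretrained(MODEL_NAME), MarianMTModel.from_pretrained(MODEL_NAME)

def translate(src_text: List[str]) -> List[str]:
    tokenizer, model = load_model()
    # 1. Tokenize: text -> padded token IDs
    batch = tokenizer(src_text, return_tensors="pt", padding=True)
    # 2. Translate: the encoder-decoder generates target-language token IDs
    generated = model.generate(**batch)
    # 3. Decode: token IDs -> clean text, special tokens stripped
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate([">>fra<< This is a test sentence."])` would return a one-element list containing the French translation.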
## Tech Stack

| Component | Technology | Description |
|---|---|---|
| Backend Framework | FastAPI | High-performance framework for building APIs with Python. |
| Translation Model | MarianMT (Hugging Face) | Pre-trained transformer models optimized for multilingual translation. |
| Data Validation | Pydantic | Ensures request payloads adhere to the expected schema. |
| Server | Uvicorn | Lightning-fast ASGI server for running FastAPI apps. |
| Libraries | Transformers, PyTorch | Core libraries for loading and fine-tuning transformer models. |
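As an illustration of the validation layer, a Pydantic model matching the payload shown in the workflow might look like the following sketch (the class name `TranslationRequest` is assumed, not taken from the project's source):

```python
from typing import List

from pydantic import BaseModel, ValidationError

class TranslationRequest(BaseModel):
    src_text: List[str]  # e.g. [">>fra<< This is a test sentence."]

# A well-formed payload parses cleanly
request = TranslationRequest(src_text=[">>fra<< This is a test sentence."])

# A malformed payload is rejected before it ever reaches the model;
# FastAPI turns this ValidationError into an automatic 422 response.
try:
    TranslationRequest(src_text=123)
    rejected = False
except ValidationError:
    rejected = True
```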
## Setup and Installation

Prerequisites:

- Python Version: 3.8 or higher.
- Package Manager: pip (or Conda for virtual environments).

Clone the repository:

```shell
git clone <repository-url>
cd linguaflow
```

Create and activate a virtual environment:

```shell
# On Windows
python -m venv venv
.\venv\Scripts\activate

# On macOS/Linux
python3 -m venv venv
source venv/bin/activate
```

Install dependencies and start the server:

```shell
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Navigate to http://127.0.0.1:8000 to access the root endpoint. Test the `/translate` endpoint using tools like Postman, cURL, or Swagger UI (automatically available via FastAPI at `/docs`).
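For example, once the server is running locally, the `/translate` endpoint can be exercised from the command line with cURL:

```shell
curl -X POST "http://127.0.0.1:8000/translate" \
  -H "Content-Type: application/json" \
  -d '{"src_text": [">>fra<< This is a test sentence."]}'
```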
## API Documentation

**Root Endpoint**

- URL: `/`
- Method: `GET`
- Description: Confirms that the API is running.
- Sample Response: `{ "message": "Welcome to the MarianMT Translation API!" }`
**Translation Endpoint**

- URL: `/translate`
- Method: `POST`
- Description: Translates English text into a target language.
- Request Payload: `{ "src_text": [">>fra<< This is a test sentence to translate to French."] }`
- Language Prefixes:
  - `>>fra<<`: French
  - `>>por<<`: Portuguese
  - `>>spa<<`: Spanish
- Response: `{ "translated_text": ["Ceci est une phrase de test à traduire en français."] }`
## Key Features

- **Multilingual Translation**
  - Supports English to French, Portuguese, and Spanish translations.
  - Easily extensible to other languages supported by MarianMT.
- **Asynchronous Processing**
  - Handles concurrent requests efficiently, making it ideal for production-grade applications.
- **User-Friendly Testing**
  - Swagger UI and JSON responses simplify API integration and debugging.
- **Error Handling**
  - Built-in mechanisms for handling invalid inputs and unexpected exceptions.
## Future Enhancements

- **Support for Additional Languages**
  - Expand the API to include languages like German, Chinese, and Hindi.
- **Advanced Features**
  - Context-based translation to preserve tone and sentiment.
  - Batch processing for large-scale translation tasks.
- **Deployment to Cloud**
  - Deploy on AWS or Google Cloud for global scalability.
## Contributions

Contributions are welcome! Feel free to fork the repository, make improvements, and submit a pull request.