Train-Once Quant Trading Platform

One-time GPU training pipeline that produces a frozen LoRA adapter and deploys that same artifact everywhere: backtests, paper trading, live bot, and the research API. No retraining unless explicitly triggered.

Architecture (Text Diagram)

          +-------------------------------+
          |  Data Ingestion + FeatureStore|
          |  (OHLCV, fundamentals, news,  |
          |   transcripts, macro, social) |
          +------------------+------------+
                             |
                             v
                 +------------------------+
                 | Training Corpus Builder|
                 | - Tabular (parquet)    |
                 | - Text (jsonl prompts) |
                 +-----------+------------+
                             |
             +---------------+---------------+
             |                               |
             v                               v
    +----------------------+        +----------------------+
    | Tabular Model (CPU)  |        | LoRA LLM (GPU)       |
    | XGBoost + Optuna     |        | L40S or A100         |
    +-----------+----------+        +-----------+----------+
                |                               |
                +---------------+---------------+
                                v
                     +----------------------+
                     | Ensemble Scoring     |
                     +----------+-----------+
                                |
          +---------------------+---------------------+
          |                     |                     |
          v                     v                     v
    Backtester (CPU)      Paper Bot (CPU)      FastAPI (CPU)
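
The diagram does not say how the Ensemble Scoring box combines the two model outputs. As a minimal sketch, assuming both models emit a score in [0, 1] and that the blend is a weighted average, it could look like the code below. The names EnsembleWeights and ensemble_score and the 0.5/0.5 defaults are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleWeights:
    """Hypothetical blend weights; real values would belong in configs/."""
    tabular: float = 0.5
    llm: float = 0.5


def ensemble_score(tabular_score: float, llm_score: float,
                   weights: EnsembleWeights = EnsembleWeights()) -> float:
    """Blend the tabular and LLM scores into one weighted average."""
    total = weights.tabular + weights.llm
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalising by the total keeps the result in the same range as the inputs.
    return (weights.tabular * tabular_score + weights.llm * llm_score) / total
```

Because the weights are normalised, the same function serves the backtester, the paper bot, and the API without any of them needing to know the raw weight scale.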

Key Principles

Train ONCE on GPU, save a LoRA adapter artifact.
All downstream uses load the same frozen adapter and tabular model.
No local GPU usage after training. Inference runs on CPU or through hosted NVIDIA NIM.
Backtests are the source of truth. No alpha claims without metrics.
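
One way to enforce the "same frozen artifact everywhere" principle is to fingerprint the artifact after training and have every consumer check that fingerprint before loading. The sketch below assumes the expected digest is stored somewhere the consumers can read it; the functions fingerprint and assert_frozen are hypothetical helpers, not part of the repository.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return a SHA-256 digest of a file, or of every file under a directory."""
    digest = hashlib.sha256()
    if path.is_file():
        files = [path]
    else:
        files = sorted(p for p in path.rglob("*") if p.is_file())
    for file in files:
        # Hash the relative name as well, so renames change the fingerprint.
        digest.update(file.relative_to(path.parent).as_posix().encode())
        digest.update(file.read_bytes())
    return digest.hexdigest()


def assert_frozen(path: Path, expected: str) -> None:
    """Raise if the artifact no longer matches the digest recorded at training."""
    actual = fingerprint(path)
    if actual != expected:
        raise RuntimeError(f"{path} changed since training: {actual} != {expected}")
```

A backtest, bot, or API process would call assert_frozen on artifacts/lora_adapter and artifacts/tabular_model.ubj at startup and refuse to run on a mismatch.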

Data Sources (Free/Low-Cost First)

OHLCV: yfinance (daily + limited intraday), optional Polygon/Alpha Vantage.
Fundamentals: yfinance metadata + SEC Financial Statement Data Sets (FSDS).
FSDS: bulk quarterly numeric data from 10-Q/10-K filings, published by the SEC (10 GB+ when spanning 2010-2024).
News: NewsAPI (free tier), Yahoo RSS.
Transcripts: local JSON or SEC 8-K ingestion (stubbed).
Macro: FRED API + yfinance (VIX, DXY, gold, crude).
Sentiment: placeholders for Reddit/Twitter/short interest.

GPU Cost Estimate

Default GPU: NVIDIA L40S (lower cost, strong throughput).
Optional: A100-40GB if you want extra headroom.
Training budget target: 4 hours.

Repo Structure

src/
  data/
  training/
  model/
  backtest/
  bot/
  api/
  monitoring/
configs/
artifacts/
data/
reports/
.github/workflows/

Setup (GitHub Actions)

1. Create repo secrets:
   - MODAL_TOKEN_ID, MODAL_TOKEN_SECRET
   - NVIDIA_NIM_API_KEY
   - SEC_USER_AGENT (required for SEC downloads)
   - Optional: NEWSAPI_KEY, FRED_API_KEY, WANDB_API_KEY
2. Trigger the training workflow:
   - GitHub Actions → Train LoRA (Modal GPU).
   - The training workflow uploads artifacts.tar.gz. Extract it into artifacts/ before CPU runs.
3. Run the CPU backtest:
   - GitHub Actions → Backtest CPU.
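
The extraction step can be scripted. The sketch below uses only the standard library and assumes the archive stores paths relative to artifacts/ (for example training_metadata.json at its top level). If the archive already contains a top-level artifacts/ folder, pass the repository root as dest instead. The function name extract_artifacts is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_artifacts(archive: Path, dest: Path = Path("artifacts")) -> list[str]:
    """Unpack the training archive into dest and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse links and any entry that would land outside dest.
            target = (root / member.name).resolve()
            if member.issym() or member.islnk() or not target.is_relative_to(root):
                raise ValueError(f"unsafe archive member: {member.name}")
        tar.extractall(root, members=members)
    return [member.name for member in members]
```

Calling extract_artifacts(Path("artifacts.tar.gz")) from the repository root then leaves the files where the CPU workflows expect them.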

One Worked Example (AAPL, MSFT, NVDA, TSLA, SPY)

The default configs already target these tickers for the 2023-2024 test period.

python -m src.cli build-data --config configs/data.yaml
python -m src.cli build-corpus --config configs/data.yaml
python -m src.training.train_lora --config configs/training.yaml
python -m src.backtest.engine --config configs/backtest.yaml

In GitHub Actions, run Backtest CPU after training completes.
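
For local runs, the four commands above can be chained so that a failure in one step stops the rest. This is a minimal sketch; run_pipeline and PIPELINE are hypothetical names, and the module paths and config files are copied from the commands above.

```python
import subprocess
import sys

# The four commands from the worked example, in the order they must run.
PIPELINE = [
    ["src.cli", "build-data", "--config", "configs/data.yaml"],
    ["src.cli", "build-corpus", "--config", "configs/data.yaml"],
    ["src.training.train_lora", "--config", "configs/training.yaml"],
    ["src.backtest.engine", "--config", "configs/backtest.yaml"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run) -> None:
    """Run each step with the current interpreter; stop at the first failure."""
    for module, *args in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        runner([sys.executable, "-m", module, *args], check=True)
```

The runner parameter exists only so the sequence can be exercised in tests without launching the real steps. Note that the third step is the GPU training job, so a full local run needs a GPU or a config that points at Modal.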

Known Limitations

Free APIs have rate limits and partial history for intraday data.
Full multimodal corpus at scale needs substantial CPU memory for preprocessing.
Transcript/news ingestion is stubbed unless you provide sources.
NIM inference requires a valid API key and model access.

What Runs Where

GPU: src/training/train_lora.py via Modal (L40S or A100).
CPU: feature engineering, corpus build, backtests, bot, API.

Artifacts

artifacts/tabular_model.ubj
artifacts/lora_adapter/
artifacts/training_metadata.json
reports/backtest/*
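
Before starting a CPU run, it can help to confirm that the training artifacts listed above are in place. The helper below is a hypothetical sketch that checks only for existence, not for content or version.

```python
from pathlib import Path

# Paths from the Artifacts list, relative to the repository root.
REQUIRED = [
    "artifacts/tabular_model.ubj",
    "artifacts/lora_adapter",
    "artifacts/training_metadata.json",
]


def missing_artifacts(root: Path = Path(".")) -> list[str]:
    """Return the required artifact paths that do not exist under root."""
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

An empty list means the backtest, bot, and API have everything they need to load the frozen models.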

TradingView Data

TradingView proprietary data is not publicly accessible. The pipeline includes TradingView-equivalent indicators and can ingest TradingView CSV exports if you provide TRADINGVIEW_CSV_PATH or TRADINGVIEW_CSV_URL as a secret/environment variable.
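
A sketch of how the two variables might be resolved is shown below. The README does not say which one takes precedence when both are set, so preferring the local path is an assumption, and tradingview_source is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Tuple


def tradingview_source(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return ("path", value) or ("url", value), or None when neither is set."""
    path = env.get("TRADINGVIEW_CSV_PATH", "").strip()
    if path:
        return ("path", path)  # assumption: a local path wins over a URL
    url = env.get("TRADINGVIEW_CSV_URL", "").strip()
    if url:
        return ("url", url)
    return None
```

Returning None lets the pipeline fall back to its built-in TradingView-equivalent indicators when no export is provided.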

Build Large Corpus

Use the Build Large Corpus (Modal CPU) workflow to generate a large dataset in the Modal volume. It outputs a small corpus_summary.json artifact with row counts and sizes.

Lightning Auto-Resume Runs

If you want Lightning.ai runs to survive interruptions without chaining free interactive sessions indefinitely, use the included Lightning run workflow:

Configure lightning_run.yaml
Add GitHub secrets:
- LIGHTNING_USERNAME
- LIGHTNING_API_KEY
Run the Launch Lightning Auto-Resume Run workflow
Let Lightning Progress Snapshot archive status and checkpoint manifests every 4 hours

Details: lightning_autoresume.md

GitHub CPU Chunking (No Modal CPU)

If you want to avoid Modal for CPU, use the Build Corpus Chunk (GitHub CPU) workflow. It writes each chunk to external S3-compatible storage (OCI Object Storage works). Each run is limited by the 6-hour job cap on GitHub-hosted runners, so keep chunk_size small.
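
To illustrate why a small chunk_size keeps each job short, the sketch below splits a work list into fixed-size chunks. It assumes chunk_size counts tickers per job, which the README does not confirm; the function name chunks is hypothetical.

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each holding at most chunk_size entries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])
```

With the five example tickers and chunk_size=2, this yields three chunks, so three independent workflow runs can each finish well inside the time limit.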

OCI CPU (Time-Boxed VM)

If you prefer OCI CPU, use the Launch OCI CPU VM (Time-Boxed) workflow. It launches a VM with a strict auto-shutdown window and publishes the instance id as an artifact. To terminate a VM early, run the Terminate OCI VM workflow.

Required GitHub secrets:

OCI_TENANCY_OCID
OCI_USER_OCID
OCI_FINGERPRINT
OCI_REGION
OCI_PRIVATE_KEY
OCI_AD
OCI_COMPARTMENT_OCID
OCI_SUBNET_OCID
OCI_IMAGE_OCID

External Checkpointing (Switch Modal Accounts Safely)

If you might switch Modal accounts mid-run, enable external checkpointing to S3-compatible storage. Each chunk uploads to a bucket so a new Modal account can continue and the merge step can download chunks.

Add these GitHub Secrets (optional):

CHECKPOINT_S3_BUCKET
CHECKPOINT_S3_ACCESS_KEY
CHECKPOINT_S3_SECRET_KEY
CHECKPOINT_S3_REGION (default us-east-1 if omitted)
CHECKPOINT_S3_ENDPOINT (for R2/MinIO)
CHECKPOINT_S3_PREFIX (default train-once)
CHECKPOINT_S3_USE_PATH_STYLE (true for MinIO)
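
The defaults listed above can be applied in one place when reading the environment. The sketch below only collects the settings into a dictionary and does not talk to any storage service. The function name checkpoint_config and the dictionary keys are hypothetical, as is the choice to treat a missing bucket as "checkpointing disabled".

```python
import os
from typing import Mapping


def checkpoint_config(env: Mapping[str, str] = os.environ) -> dict:
    """Collect checkpoint settings, applying the documented defaults."""
    bucket = env.get("CHECKPOINT_S3_BUCKET")
    if not bucket:
        # The secrets are optional, so an unset bucket is not an error.
        return {"enabled": False}
    return {
        "enabled": True,
        "bucket": bucket,
        "access_key": env.get("CHECKPOINT_S3_ACCESS_KEY"),
        "secret_key": env.get("CHECKPOINT_S3_SECRET_KEY"),
        "region": env.get("CHECKPOINT_S3_REGION") or "us-east-1",
        "endpoint": env.get("CHECKPOINT_S3_ENDPOINT") or None,
        "prefix": env.get("CHECKPOINT_S3_PREFIX") or "train-once",
        "path_style": env.get("CHECKPOINT_S3_USE_PATH_STYLE", "").lower() == "true",
    }
```

The resulting values map directly onto the endpoint, region, credential, and addressing-style options that most S3 client libraries accept.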

Open-Source Pine Script References

See docs/pine_sources.md for open-source Pine Script indicator references and licenses.
