AGENTS.md — azure-contentunderstanding

Package Overview

agent-framework-azure-contentunderstanding integrates Azure Content Understanding (CU) into the Agent Framework as a context provider. It automatically analyzes file attachments (documents, images, audio, video) and injects structured results into the LLM context.

Public API

| Symbol | Type | Description |
|---|---|---|
| ContentUnderstandingContextProvider | class | Main context provider — extends ContextProvider |
| AnalysisSection | enum | Output section selector (MARKDOWN, FIELDS, etc.) |
| DocumentStatus | enum | Document lifecycle state (ANALYZING, UPLOADING, READY, FAILED) |
| FileSearchBackend | ABC | Abstract vector store file operations interface |
| FileSearchConfig | dataclass | Configuration for CU + vector store RAG mode |

Architecture

  • _context_provider.py — Main provider implementation. Overrides before_run() to detect file attachments, call the CU API, manage session state with multi-document tracking, and auto-register retrieval tools for follow-up turns.
    • Analyzer auto-detection — When analyzer_id=None (default), _resolve_analyzer_id() selects the CU analyzer based on media type prefix: audio/ → prebuilt-audioSearch, video/ → prebuilt-videoSearch, everything else → prebuilt-documentSearch.
    • Multi-segment output — CU splits long video/audio into multiple scene segments (each a separate contents[] entry with its own startTimeMs, endTimeMs, markdown, and fields). _extract_sections() produces:
      • segments: list of per-segment dicts, each with markdown, fields, start_time_s, end_time_s
      • markdown: concatenated at top level with --- separators (for file_search uploads)
      • duration_seconds: computed from the global span min(startTimeMs) → max(endTimeMs)
      • Metadata (kind, resolution): taken from the first segment
    • Speaker diarization (not identification) — CU transcripts label speakers as <Speaker 1>, <Speaker 2>, etc. CU does not identify speakers by name.
    • file_search RAG — When FileSearchConfig is provided, CU-extracted markdown is uploaded to an OpenAI vector store and a file_search tool is registered on the context instead of injecting the full document content. This enables token-efficient retrieval for large documents.
  • _models.py — AnalysisSection enum, DocumentStatus enum, DocumentEntry TypedDict, FileSearchConfig dataclass.
  • _file_search.py — FileSearchBackend ABC, OpenAIFileSearchBackend, FoundryFileSearchBackend.
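
The analyzer auto-detection and multi-segment rules described above can be sketched in a few lines. This is a minimal illustration, not the package's code: `resolve_analyzer_id` and `merge_segments` are hypothetical stand-ins for `_resolve_analyzer_id()` and `_extract_sections()`, the input is assumed to be a non-empty list of `contents[]` dicts carrying `startTimeMs`, `endTimeMs`, `markdown`, and `fields`, and the exact separator whitespace and millisecond-to-second conversion are assumptions.

```python
from typing import Any, Dict, List, Optional


def resolve_analyzer_id(media_type: str, analyzer_id: Optional[str] = None) -> str:
    """Pick an analyzer: an explicit ID wins, otherwise go by media type prefix."""
    if analyzer_id is not None:
        return analyzer_id
    if media_type.startswith("audio/"):
        return "prebuilt-audioSearch"
    if media_type.startswith("video/"):
        return "prebuilt-videoSearch"
    return "prebuilt-documentSearch"


def merge_segments(contents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build per-segment and top-level views from a non-empty contents[] list."""
    segments = [
        {
            "markdown": c.get("markdown", ""),
            "fields": c.get("fields", {}),
            "start_time_s": c["startTimeMs"] / 1000,
            "end_time_s": c["endTimeMs"] / 1000,
        }
        for c in contents
    ]
    # Duration spans the earliest start to the latest end across all segments.
    start_ms = min(c["startTimeMs"] for c in contents)
    end_ms = max(c["endTimeMs"] for c in contents)
    return {
        "segments": segments,
        # Assumed separator layout: a blank line on each side of "---".
        "markdown": "\n\n---\n\n".join(s["markdown"] for s in segments),
        "duration_seconds": (end_ms - start_ms) / 1000,
        # Metadata is read from the first segment only.
        "kind": contents[0].get("kind"),
    }
```

Keeping the per-segment list next to the concatenated markdown lets the same result feed both direct context injection and a file_search upload.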

Key Patterns

  • Follows the Azure AI Search context provider pattern (same lifecycle, config style).
  • Uses provider-scoped state dict for multi-document tracking across turns.
  • Auto-registers list_documents() tool via context.extend_tools().
  • Configurable timeout (max_wait) with asyncio.create_task() background fallback.
  • Strips supported binary attachments from input_messages to prevent LLM API errors.
  • Explicit analyzer_id always overrides auto-detection (user preference wins).
  • Vector store resources are cleaned up in close() / __aexit__.
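
The timeout-with-background-fallback pattern listed above can be illustrated with plain asyncio. This is a sketch under assumptions, not the provider's implementation: `analyze_with_fallback`, the `pending` dict, and the `demo` coroutine are hypothetical names, and the real provider may track unfinished work differently.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


async def analyze_with_fallback(
    analyze: Callable[[], Awaitable[Any]],
    max_wait: float,
    pending: Dict[str, "asyncio.Task[Any]"],
    key: str,
) -> Optional[Any]:
    """Wait up to max_wait seconds; on timeout, leave the task running."""
    task = asyncio.create_task(analyze())
    try:
        # shield() ensures a timeout cancels only the wait, not the task itself.
        return await asyncio.wait_for(asyncio.shield(task), timeout=max_wait)
    except asyncio.TimeoutError:
        # Keep a reference so a later turn can collect the result.
        pending[key] = task
        return None


async def demo() -> Tuple[Optional[str], str]:
    async def slow_analysis() -> str:
        await asyncio.sleep(0.05)  # simulated long-running analysis
        return "analysis result"

    pending: Dict[str, "asyncio.Task[Any]"] = {}
    first = await analyze_with_fallback(slow_analysis, 0.01, pending, "doc.pdf")
    later = await pending["doc.pdf"]  # a follow-up turn awaits the stored task
    return first, later
```

Calling `asyncio.run(demo())` exercises both paths: the first call gives up waiting after `max_wait`, and the stored task is awaited afterwards to retrieve the finished result.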

Samples

| Sample | Description |
|---|---|
| 01_document_qa.py | Upload a PDF via URL, ask questions about it |
| 02_multi_turn_session.py | AgentSession persistence across turns |
| 03_multimodal_chat.py | PDF + audio + video parallel analysis |
| 04_invoice_processing.py | Structured field extraction with prebuilt-invoice analyzer |
| 05_large_doc_file_search.py | CU extraction + OpenAI vector store RAG |
| 02-devui/01-multimodal_agent/ | DevUI web UI for CU-powered chat |
| 02-devui/02-file_search_agent/ | DevUI web UI combining CU + file_search RAG |

Running Tests

uv run poe test -P azure-contentunderstanding