feat: improve Exa search integration with tracking header and updated API #900

Open
tgonzalezc5 wants to merge 2 commits into modelscope:main from tgonzalezc5:feat/improve-exa-search-integration

@tgonzalezc5

Summary

  • Add x-exa-integration tracking header for API usage attribution (ms-agent)
  • Remove deprecated keyword search type; add current types: fast, deep-lite, deep, deep-reasoning, instant
  • Add content retrieval options: highlights and summary alongside existing text
  • Add filtering support: include_domains, exclude_domains, category, user_location
  • Fix to_list() result formatting to include text/highlights/summary content fields with graceful fallback when fields are absent
  • Add unit tests covering schema, search engine initialization, tool definition validation, and response parsing

Note on keyword search type: The keyword search type has been removed from the Exa API. This PR updates the integration to reflect the current API; agents passing type="keyword" should use type="fast" instead.
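
For callers that still pass type="keyword", the substitution described in the note could be applied by a small shim. This is only a sketch: `normalize_search_type` and `_REPLACED_TYPES` are hypothetical names that are not part of this PR, and the only mapping assumed is the one stated above (keyword to fast). Every other value is passed through unchanged.

```python
import warnings

# Hypothetical mapping, based only on the note above: 'keyword' -> 'fast'.
_REPLACED_TYPES = {'keyword': 'fast'}


def normalize_search_type(search_type: str) -> str:
    """Return the replacement for a removed search type, else the input unchanged."""
    replacement = _REPLACED_TYPES.get(search_type)
    if replacement is None:
        return search_type
    warnings.warn(
        f'Exa search type {search_type!r} was removed; using {replacement!r}',
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```

Emitting a warning rather than substituting silently lets agent authors find and update the outdated call sites.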

Usage Example

from ms_agent.tools.search.exa import ExaSearch

engine = ExaSearch()  # uses EXA_API_KEY env var
request = ExaSearch.build_request_from_args(
    query='latest advances in multi-agent systems',
    type='neural',
    num_results=5,
    highlights=True,
    summary=True,
    category='research paper',
    include_domains=['arxiv.org'],
    start_published_date='2024-01-01',
)
result = engine.search(request)
for item in result.to_list():
    print(item['title'], item.get('summary', ''))

Files Changed

  • ms_agent/tools/search/exa/search.py -- integration header, updated tool definition and request builder
  • ms_agent/tools/search/exa/schema.py -- new fields (highlights, summary, domain filters, category, user_location), improved to_list() with content fallback
  • tests/search/test_exa_search.py -- new test file with comprehensive coverage

Test Plan

  • ExaSearchRequest default values and to_dict() serialization
  • ExaSearchRequest with domain/category/location filters
  • ExaSearchResult.to_list() with text-only, highlights-only, summary-only, and all-content responses
  • ExaSearchResult.to_list() graceful fallback when content fields are absent
  • ExaSearchResult.to_list() with empty response
  • ExaSearch sets x-exa-integration: ms-agent header on initialization
  • ExaSearch raises AssertionError when EXA_API_KEY is not set
  • ExaSearch.get_tool_definition() does not include keyword type
  • ExaSearch.get_tool_definition() exposes all filter parameters
  • ExaSearch.build_request_from_args() correctly maps all parameters
  • Realistic API response fixture parsing
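
The to_list() fallback cases in the plan above can be illustrated with a self-contained sketch. It is not the PR's implementation: `results_to_list` is a hypothetical free function, the `url` key is an assumption (only `title` and `summary` appear in the usage example), and `SimpleNamespace` objects stand in for the result objects returned by the search client.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

# Content fields named in the PR summary.
CONTENT_FIELDS = ('text', 'highlights', 'summary')


def results_to_list(results: List[Any]) -> List[Dict[str, Any]]:
    """Flatten result objects, copying only the content fields that are present."""
    res_list: List[Dict[str, Any]] = []
    for res in results:
        item: Dict[str, Any] = {
            'url': getattr(res, 'url', None),      # assumed key
            'title': getattr(res, 'title', None),
        }
        for field in CONTENT_FIELDS:
            value = getattr(res, field, None)
            if value is not None:  # absent fields are skipped, not set to None
                item[field] = value
        res_list.append(item)
    return res_list


# Stand-in results: one with only a summary, one with no content fields at all.
summary_only = SimpleNamespace(
    url='https://example.com/paper', title='Example', summary='Short summary')
no_content = SimpleNamespace(url='https://example.com/other', title='Other')

items = results_to_list([summary_only, no_content])
print(items[0].get('summary', ''))
print('text' in items[1])
```

Because missing fields are left out rather than set to None, callers can keep using `item.get('summary', '')` as in the usage example.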

… API support

- Add x-exa-integration tracking header for API usage attribution
- Remove deprecated 'keyword' search type; add 'fast', 'deep-lite',
  'deep', 'deep-reasoning', and 'instant' types per current API
- Add highlights and summary content retrieval options
- Add domain filtering (include_domains, exclude_domains), category
  filter, and user_location support
- Fix to_list() to include text/highlights/summary content fields
  with graceful fallback when fields are absent
- Add comprehensive unit tests for schema, search engine, and
  response parsing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the Exa search tool by integrating advanced features such as highlights, summaries, domain filtering, and expanded search modes. The changes include updates to the request schema, tool definitions, and result processing logic to support these new parameters, along with the addition of comprehensive unit tests. Review feedback identifies opportunities to improve the consistency of optional field handling in the to_dict method and to refine type hints for better type safety in result lists.

Comment on lines 53 to +78
     def to_dict(self) -> Dict[str, Any]:
         """
-        Convert the request parameters to a dictionary.
+        Convert the request parameters to a dictionary suitable for
+        exa-py's search_and_contents() call.
         """
-        return {
+        d: Dict[str, Any] = {
             'query': self.query,
             'text': self.text,
+            'highlights': self.highlights,
+            'summary': self.summary,
             'type': self.type,
             'num_results': self.num_results,
             'start_published_date': self.start_published_date,
             'end_published_date': self.end_published_date,
             'start_crawl_date': self.start_crawl_date,
-            'end_crawl_date': self.end_crawl_date
+            'end_crawl_date': self.end_crawl_date,
         }
+        if self.include_domains:
+            d['include_domains'] = self.include_domains
+        if self.exclude_domains:
+            d['exclude_domains'] = self.exclude_domains
+        if self.category:
+            d['category'] = self.category
+        if self.user_location:
+            d['user_location'] = self.user_location
+        return d

medium

The to_dict method handles optional parameters inconsistently. Date-related fields (lines 65-68) are included in the dictionary even when they are None, while domain and category filters (lines 70-77) are conditionally added. It is cleaner and more robust for API compatibility to omit all optional fields when they do not have a value.

    def to_dict(self) -> Dict[str, Any]:
        """
        Convert the request parameters to a dictionary suitable for
        exa-py's search_and_contents() call.
        """
        d: Dict[str, Any] = {
            'query': self.query,
            'text': self.text,
            'highlights': self.highlights,
            'summary': self.summary,
            'type': self.type,
            'num_results': self.num_results,
        }
        # Add optional filters only if they have values
        for field_name in [
                'start_published_date', 'end_published_date',
                'start_crawl_date', 'end_crawl_date', 'include_domains',
                'exclude_domains', 'category', 'user_location'
        ]:
            value = getattr(self, field_name)
            if value:
                d[field_name] = value
        return d
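
Note that the truthiness check (`if value:`) in the suggestion also drops values that are set but falsy, such as an empty list or an empty string. A stricter variant tests for None explicitly. The sketch below shows that variant on a hypothetical, trimmed stand-in class (`DemoRequest` is not the real ExaSearchRequest, and only a few of its fields are reproduced); which behaviour the API prefers for empty lists is an open assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DemoRequest:
    """Hypothetical, trimmed stand-in for ExaSearchRequest."""
    query: str
    num_results: int = 5
    start_published_date: Optional[str] = None
    include_domains: Optional[List[str]] = None
    category: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Required fields are always present.
        d: Dict[str, Any] = {
            'query': self.query,
            'num_results': self.num_results,
        }
        # Optional fields are added only when explicitly set.
        for name in ('start_published_date', 'include_domains', 'category'):
            value = getattr(self, name)
            if value is not None:  # keeps set-but-falsy values such as []
                d[name] = value
        return d


req = DemoRequest(query='multi-agent systems',
                  start_published_date='2024-01-01')
print(req.to_dict())
```

With this check, unset date fields never reach the request payload as null values, while an explicitly passed empty list would still be forwarded.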

@@ -81,57 +110,35 @@ def to_list(self):

res_list: List[Any] = []

medium

The type hint List[Any] is too broad. Since the method returns a list of dictionaries representing search results, List[Dict[str, Any]] is more accurate and provides better type safety.

Suggested change
- res_list: List[Any] = []
+ res_list: List[Dict[str, Any]] = []

…pe hint

Address review feedback:
- Omit all optional fields from to_dict() when None, not just domain/category
  filters; avoids sending null values to the API
- Narrow to_list() return type from List[Any] to List[Dict[str, Any]]
- Add date field assertions to test_to_dict_basic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>