UN-3414 [FIX] SharePoint OAuth connector creation and metadata handling #1919
kirtimanmishrazipstack wants to merge 9 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Summary by CodeRabbit
Walkthrough: Connector metadata handling was changed to preserve non-token fields while merging refreshed OAuth token fields.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(63, 81, 181, 0.5)
        participant Client
    end
    rect rgba(0, 150, 136, 0.5)
        participant View
        participant Serializer
    end
    rect rgba(255, 193, 7, 0.5)
        participant ConnectorAuthCache
        participant ConnectorAuthModel
    end
    rect rgba(233, 30, 99, 0.5)
        participant DB
    end
    Client->>View: POST/PUT with oauth_key and connector_metadata
    View->>ConnectorAuthCache: fetch cached oauth tokens (oauth_key)
    ConnectorAuthCache-->>View: oauth_tokens or MissingParam
    View->>View: merge oauth_tokens with non-secret request metadata
    View->>Serializer: initialize with merged metadata
    Serializer->>ConnectorAuthModel: _refresh_tokens(provider, uid, existing_metadata)
    ConnectorAuthModel->>ConnectorAuthModel: call external token refresh
    ConnectorAuthModel-->>Serializer: refreshed token subset (whitelisted)
    Serializer->>DB: save ConnectorInstance with merged metadata
    DB-->>Serializer: saved instance
    Serializer-->>Client: response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
| Filename | Overview |
|---|---|
| backend/connector_auth_v2/constants.py | Adds OAUTH_TOKEN_KEYS frozenset as a shared constant to define exactly which keys are safe to propagate across connectors sharing the same (provider, uid). |
| backend/connector_auth_v2/models.py | Sibling-loop in get_and_refresh_tokens now whitelist-merges only OAUTH_TOKEN_KEYS from extra_data instead of assigning the full dict, preventing form-field cross-contamination between connectors sharing the same OAuth identity. |
| backend/connector_v2/fields.py | _refresh_tokens now accepts existing_metadata and uses the OAUTH_TOKEN_KEYS whitelist before merging, so per-instance form fields (site_url, drive_id) are preserved on DB read-triggered auto-refresh. |
| backend/connector_v2/serializers.py | Adds connector_name backfill from JSON schema default in validate() with correct partial-update guard; save() uses whitelist-merge for refreshed tokens; connector_name marked required=False via extra_kwargs. |
| backend/connector_v2/views.py | _get_connector_metadata now merges form fields with OAuth tokens ({**form_metadata, **oauth_tokens}) instead of replacing; adds an isinstance guard for non-dict form values. |
| unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py | Fixes drive-by-ID lookup to use bracket indexing (ctx.drives[drive_id]) and SharePoint site resolution to use ctx.sites.get_by_url(site_url).drive instead of the broken get_by_path approach. |
| unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json | Updates user_email description to clarify it's only for OneDrive Client Credentials; marks the field as format: "password" (already present in the base, description now clearer). |
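The whitelist merge described in the models.py and fields.py rows above can be sketched in isolation. This is a minimal illustration: the key set mirrors the PR's `OAUTH_TOKEN_KEYS` constant, while `merge_refreshed_tokens` and the sample metadata are hypothetical, not the actual backend code.

```python
# Keys assumed safe to propagate across connectors sharing (provider, uid);
# mirrors the PR's OAUTH_TOKEN_KEYS constant.
OAUTH_TOKEN_KEYS = frozenset(
    {"access_token", "refresh_token", "token_type", "expires",
     "auth_time", "refresh_after", "expires_in", "scope"}
)

def merge_refreshed_tokens(metadata: dict, extra_data: dict) -> dict:
    """Copy only token keys from the shared extra_data into this
    connector's metadata, leaving per-instance form fields untouched."""
    token_updates = {k: v for k, v in extra_data.items() if k in OAUTH_TOKEN_KEYS}
    return {**metadata, **token_updates}

stored = {"site_url": "https://contoso.sharepoint.com/sites/hr",
          "drive_id": "b!abc", "access_token": "old"}
shared = {"access_token": "new", "refresh_token": "r1",
          "site_url": "https://contoso.sharepoint.com/sites/finance"}

merged = merge_refreshed_tokens(stored, shared)
print(merged["access_token"])  # new token flows in
print(merged["site_url"])      # per-connector form field preserved
```

Note that `site_url` in the shared blob is dropped by the filter, which is exactly the cross-contamination the PR fixes.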
Sequence Diagram

```mermaid
sequenceDiagram
    participant FE as Frontend
    participant View as ConnectorInstanceViewSet
    participant Ser as ConnectorInstanceSerializer
    participant Cache as OAuth Cache
    participant PSA as ConnectorAuth (PSA)
    participant DB as ConnectorInstance DB
    FE->>View: POST /connector/ with oauth_key param
    View->>Cache: get_oauth_creds_from_cache(oauth_key)
    Cache-->>View: oauth_tokens with access_token, uid, provider
    Note over View: form_metadata from request body merged with oauth_tokens
    View->>Ser: serializer.save(connector_metadata=merged)
    Note over Ser: validate() backfills connector_name from schema default
    Ser->>PSA: get_or_create_connector_auth(merged)
    PSA-->>Ser: connector_oauth instance
    Ser->>PSA: get_and_refresh_tokens()
    PSA-->>Ser: refreshed_metadata (full extra_data)
    Note over Ser: token_updates filtered by OAUTH_TOKEN_KEYS
    Ser->>DB: super().save(**kwargs)
    DB-->>FE: 201 Created
```
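The view-level merge step shown in the diagram relies on Python's dict-unpacking precedence: later `**` unpacks win on key collisions, so server-side OAuth tokens override any token-looking keys a client might submit, while non-secret form fields survive. A minimal sketch with hypothetical sample values:

```python
# Later ** unpacks win on duplicate keys, so oauth_tokens takes
# precedence over anything the request body claimed for those keys.
form_metadata = {"site_url": "https://contoso.sharepoint.com/sites/hr",
                 "access_token": "client-supplied-should-lose"}
oauth_tokens = {"access_token": "server-side-token", "refresh_token": "r1"}

connector_metadata = {**form_metadata, **oauth_tokens}
print(connector_metadata["access_token"])  # server-side-token
print(connector_metadata["site_url"])      # form field preserved
```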
Reviews (5): last reviewed commit "greptile partial updatye"
```python
# Discard return value: ConnectorAuth.extra_data is shared across
# every connector with the same (provider, uid) and would overwrite
# this connector's form fields (site_url, drive_id).
connector_oauth.get_and_refresh_tokens()
```
Refreshed tokens silently dropped on update
get_and_refresh_tokens() can issue a token refresh and return updated access_token/refresh_token values. By discarding the return value entirely, those new tokens are never merged back into kwargs[CIKey.CONNECTOR_METADATA], so the ConnectorInstance ends up persisting the old (possibly just-invalidated) tokens. The next time the connector is used for a task, SharePointFS reads access_token straight from its stored settings dict and will fail with an expired credential.
A targeted merge — updating only the token-specific keys instead of the entire blob — preserves form fields (site_url, drive_id) and still keeps tokens current:
```python
refreshed_data, refresh_status = connector_oauth.get_and_refresh_tokens()
# Merge only OAuth token fields back; do not overwrite form-specific keys.
_TOKEN_KEYS = {"access_token", "refresh_token", "expires_in", "token_type", "scope"}
token_updates = {k: v for k, v in refreshed_data.items() if k in _TOKEN_KEYS}
kwargs[CIKey.CONNECTOR_METADATA] = {
    **kwargs.get(CIKey.CONNECTOR_METADATA, {}),
    **token_updates,
}
```

Prompt To Fix With AI
This is a comment left during a code review.
Path: backend/connector_v2/serializers.py
Line: 59
How can I resolve this? If you propose a fix, please make it concise.
Check this @kirtimanmishrazipstack, might affect an existing connector
Addressed — applied the whitelist-based merge in ConnectorInstanceSerializer.save(). Introduced _OAUTH_TOKEN_KEYS (access_token, refresh_token, token_type, expires, auth_time, refresh_after, expires_in, scope) so only token keys flow from the shared extra_data back into kwargs[CIKey.CONNECTOR_METADATA]. super().save(**kwargs) now persists the fresh token instead of the stale one from the request body, and per-connector form fields (site_url, drive_id) are preserved because they sit outside the whitelist.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/connector_v2/views.py`:
- Around line 101-103: When handling OAuth update/reauth flows in views.py the
code builds connector_metadata from only request data and oauth_tokens
(CIKey.CONNECTOR_METADATA -> form_metadata; connector_metadata =
{**form_metadata, **oauth_tokens}), which drops existing instance metadata like
site_url/drive_id when form_metadata is omitted; fix by seeding form_metadata
with the existing instance metadata when available (e.g. base_metadata =
(self.instance.connector_metadata or {}) then overlay with form_metadata and
finally oauth_tokens) so the merge preserves prior non-secret fields; apply the
same change to the other OAuth update block referenced around lines 164-175.
- Around line 108-135: The fallback in _fill_default_connector_name is currently
applied to all creates; restrict it to OAuth connector creation by adding an
early guard that returns unless the request is an OAuth create (e.g., check
request_data.get(CIKey.AUTH_TYPE) == AuthType.OAUTH or
request_data.get(CIKey.AUTH_TYPE) == "oauth"). Update the function to check
CIKey.AUTH_TYPE (and use AuthType.OAUTH constant) before fetching the
schema/details and only proceed to call ConnectorProcessor.get_json_schema and
set CIKey.CONNECTOR_NAME when the request is for an OAuth connector.
In
`@unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py`:
- Around line 139-141: The code sets self._drive using ctx.drives[self.drive_id]
and later accesses ctx.sites.get_by_url(self.site_url).drive but does not
execute the required query chain; update both locations to call
.get().execute_query() on the drive resource so the drive object is initialized
(e.g., replace ctx.drives[self.drive_id] and the .drive access from
ctx.sites.get_by_url(self.site_url) with the same .get().execute_query()
pattern) and assign the resulting loaded object to self._drive/drive as done
elsewhere in this module.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 981ec789-a892-48cf-8fdf-ab45b5cd9d44
📒 Files selected for processing (4)
- backend/connector_v2/serializers.py
- backend/connector_v2/views.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
- unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json
```python
# Preserve non-secret form fields (e.g. site_url for the SharePoint connector)
form_metadata = self.request.data.get(CIKey.CONNECTOR_METADATA) or {}
connector_metadata = {**form_metadata, **oauth_tokens}
```
Preserve existing instance metadata during OAuth update flows.
On update/reauth, if the request has oauth_key but omits connector_metadata, form_metadata becomes {} and the save path persists token-only metadata, dropping existing fields like site_url and drive_id.
🐛 Proposed fix

```diff
-    def _get_connector_metadata(self, connector_id: str) -> dict[str, str] | None:
+    def _get_connector_metadata(
+        self,
+        connector_id: str,
+        base_metadata: dict[str, str] | None = None,
+    ) -> dict[str, str] | None:
@@
-        form_metadata = self.request.data.get(CIKey.CONNECTOR_METADATA) or {}
+        form_metadata = (
+            self.request.data.get(CIKey.CONNECTOR_METADATA)
+            if CIKey.CONNECTOR_METADATA in self.request.data
+            else base_metadata
+        ) or {}
         connector_metadata = {**form_metadata, **oauth_tokens}
@@
-        connector_metadata = self._get_connector_metadata(connector_id)
+        connector_metadata = self._get_connector_metadata(
+            connector_id,
+            base_metadata=serializer.instance.connector_metadata,
+        )
```

Also applies to: 164-175
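The seeding the reviewer proposes amounts to a three-layer merge: stored instance metadata first, then whatever the request supplied, then the fresh tokens. A toy version with hypothetical names and values (`existing_metadata`, `form_metadata`, `oauth_tokens` stand in for the real attributes):

```python
# Start from the stored instance metadata, overlay the request body,
# then overlay the server-side tokens; later layers win on collisions.
existing_metadata = {"site_url": "https://contoso.sharepoint.com/sites/hr",
                     "drive_id": "b!abc", "access_token": "stale"}
form_metadata = {}  # a PATCH that sends oauth_key but omits connector_metadata
oauth_tokens = {"access_token": "fresh", "refresh_token": "r2"}

merged = {**existing_metadata, **form_metadata, **oauth_tokens}
print(merged["site_url"])      # preserved from the stored instance
print(merged["access_token"])  # fresh token, not the stale one
```

Without the first layer, the empty `form_metadata` would leave `merged` token-only, which is exactly the data loss described above.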
The crash case (TypeError on non-dict metadata) is addressed via isinstance guard + json.loads recovery on the create path — see reply on Greptile G3. Seeding form_metadata from self.instance.connector_metadata on PATCH-with-metadata-omitted is a legitimate follow-up but outside this PR's scope (which is OAuth create + refresh persistence). Tracking separately.
@kirtimanmishrazipstack, understood — the isinstance guard in the current code covers the non-dict crash path, and the base_metadata seeding concern is a valid-but-separate gap. Would you like me to open a GitHub issue to track the follow-up (seeding form_metadata from serializer.instance.connector_metadata during OAuth update/reauth when connector_metadata is omitted from the request)?
```python
def _fill_default_connector_name(self, request_data: Any) -> None:
    """Fill ``connector_name`` from the schema default when missing.

    Defense-in-depth for an intermittent frontend race where the RJSF
    default does not reach the POST body for OAuth connector creation,
    causing the serializer to 400 on ``connector_name: required``.
    """
    if request_data.get(CIKey.CONNECTOR_NAME):
        return
    connector_id = request_data.get(CIKey.CONNECTOR_ID)
    if not connector_id:
        return
    try:
        schema_details = ConnectorProcessor.get_json_schema(connector_id=connector_id)
    except InvalidConnectorID:
        return
    default_name = (
        schema_details.get(ConnectorKeys.JSON_SCHEMA, {})
        .get("properties", {})
        .get("connectorName", {})
        .get("default")
    )
    if not default_name:
        return
    request_data[CIKey.CONNECTOR_NAME] = default_name
    logger.info(
        "Filled missing connector_name with schema default for %s", connector_id
    )
```
Scope the connector-name fallback to OAuth creates.
The fallback currently applies to every create request with a missing connector_name, which broadens behavior beyond the OAuth race described in the PR and can let non-OAuth/direct API creates bypass the required-name validation.
🛡️ Proposed fix

```diff
         connector_id = request_data.get(CIKey.CONNECTOR_ID)
         if not connector_id:
             return
+        if not (
+            self.request.query_params.get(ConnectorAuthKey.OAUTH_KEY)
+            and ConnectorInstance.supportsOAuth(connector_id=connector_id)
+        ):
+            return
         try:
             schema_details = ConnectorProcessor.get_json_schema(connector_id=connector_id)
```

Also applies to: 203-204
The helper itself has been removed per Chandrashekhar's suggestion — the fallback now lives in ConnectorInstanceSerializer.validate(). Intentionally applied to all connectors rather than gated on auth_type: the schema default is a valid fallback regardless of auth mode, and the model's blank=False still rejects connectors whose schema has no default, so no silent empty-name can slip through.
@kirtimanmishrazipstack, that makes sense — moving it to ConnectorInstanceSerializer.validate() is a cleaner home for this logic, and the blank=False model constraint is a solid safety net: if the schema has no default, the name stays empty and the model rejects it, so no silent empty-name connector can be persisted regardless of auth mode. Happy with this approach, thanks for the context!
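The serializer-level backfill agreed on above can be sketched without Django. The schema literal and the `backfill_connector_name` helper here are illustrative stand-ins for the real `ConnectorInstanceSerializer.validate()` logic, which reads the schema via `ConnectorProcessor.get_json_schema`:

```python
# Hypothetical schema fragment; mirrors the connectorName lookup path
# the serializer walks (properties -> connectorName -> default).
schema = {
    "properties": {
        "connectorName": {"type": "string", "default": "SharePoint"},
        "site_url": {"type": "string"},
    }
}

def backfill_connector_name(attrs: dict, schema: dict) -> dict:
    """Inject the schema default only when the client omitted the name."""
    if not attrs.get("connector_name"):
        default = (
            schema.get("properties", {})
            .get("connectorName", {})
            .get("default")
        )
        if default:
            attrs["connector_name"] = default
    return attrs

attrs = backfill_connector_name({"connector_id": "sharepoint"}, schema)
print(attrs["connector_name"])  # SharePoint
```

If the schema has no default, `attrs` stays name-less and a `blank=False` model constraint would reject it downstream, matching the safety net described above.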
```diff
         if self.drive_id:
-            # Specific drive by ID
-            self._drive = ctx.drives.get_by_id(self.drive_id)
+            # Specific drive by ID — EntityCollection uses bracket indexing.
+            self._drive = ctx.drives[self.drive_id]
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Description:
#   Find the pinned Office365-REST-Python-Client declaration and inspect whether
#   the pinned package source exposes the APIs used by this PR.
#
# Expected:
#   - Dependency declaration shows a version that includes:
#     - EntityCollection.__getitem__(str)
#     - SitesWithRoot.get_by_url(...)
echo "Dependency declarations:"
fd -i '^(pyproject\.toml|requirements.*|poetry\.lock|uv\.lock|Pipfile\.lock|setup\.py)$' -t f \
  -x sh -c 'printf "\n--- %s ---\n" "$1"; rg -n "Office365-REST-Python-Client|office365-rest-python-client" "$1" || true' sh {}

python - <<'PY'
from pathlib import Path
import re
patterns = [
    re.compile(r'Office365-REST-Python-Client[^0-9<>=!~]*([<>=!~]*\s*[0-9][^,\s"\']*)?', re.I),
    re.compile(r'office365-rest-python-client[^0-9<>=!~]*([<>=!~]*\s*[0-9][^,\s"\']*)?', re.I),
]
for path in Path(".").rglob("*"):
    if path.name in {"pyproject.toml", "poetry.lock", "uv.lock", "Pipfile.lock", "setup.py"} or path.name.startswith("requirements"):
        try:
            text = path.read_text(errors="ignore")
        except Exception:
            continue
        for line_no, line in enumerate(text.splitlines(), 1):
            if any(p.search(line) for p in patterns):
                print(f"{path}:{line_no}: {line}")
PY
```

Repository: Zipstack/unstract
Length of output: 4492
🌐 Web query:
Office365-REST-Python-Client version 2.6.0 release notes EntityCollection getitem
💡 Result:
Office365-REST-Python-Client version 2.6.0 was released on March 30, 2025. No detailed release notes or changelog are published on the GitHub releases page for this version; the tag simply notes "2.6.0 release" at commit c93a12c. The PyPI page confirms the release date but provides no specific changes. EntityCollection inherits from ClientObjectCollection (located in office365/runtime/client_object_collection.py), which supports `__getitem__` for indexed access to items. This enables lazy loading: accessing `collection[i]` loads the item on demand if not already fetched. Historical issues (e.g., #187 in 2020) with collection access beyond the initial batch (e.g., `items[100]`) were fixed in later versions, including improvements to lazy-loading support starting from around v2.1.8. By v2.6.0, `EntityCollection.__getitem__` works reliably for accessing items by index after executing a query like `ctx.load(collection); ctx.execute_query()`. No specific changes to `EntityCollection.__getitem__` are mentioned in available sources for v2.6.0. Example usage: `items = ctx.web.lists.get_by_title('Documents').items; ctx.load(items); ctx.execute_query(); item = items[0]  # triggers load if needed`
Citations:
- 1: https://github.com/vgrem/Office365-REST-Python-Client/releases
- 2: https://pypi.python.org/project/Office365-REST-Python-Client/2.6.2/
- 3: https://github.com/vgrem/office365-rest-python-client/releases/tag/2.6.2
- 4: https://pypi.org/project/Office365-REST-Python-Client/
- 5: vgrem/office365-rest-python-client@d21fe7c
- 6: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/office365/runtime/client_object.py
- 7: Sharepoint List get_items() errors itterating past 100 items vgrem/office365-rest-python-client#187
- 8: https://github.com/vgrem/office365-rest-python-client/releases
- 9: https://pypi.org/project/Office365-REST-Python-Client/2.6.0/
- 10: https://github.com/vgrem/Office365-REST-Python-Client/tags
🌐 Web query:
Office365-REST-Python-Client 2.6.0 SitesWithRoot get_by_url API
💡 Result:
In Office365-REST-Python-Client version 2.6.0, SitesWithRoot is a class in the Microsoft Graph API client (GraphClient), defined as `class SitesWithRoot(EntityCollection[Site])` ("Sites container"). It supports the get_by_url method to retrieve a specific SharePoint site by its absolute URL. Usage example with GraphClient (Microsoft Graph v2 APIs for OneDrive/SharePoint): `client = GraphClient(tenant="yourtenant.onmicrosoft.com")` (authenticate as shown in the repo docs), then `site = client.sites.get_by_url("https://yourtenant.sharepoint.com/sites/yoursite")` followed by `site.get().execute_query()` to load site properties, and `print(site.web_url)`. This matches the Microsoft Graph Sites API: GET /sites/{site-id} or GET /sites/{hostname}:/{site-path}. For the legacy SharePoint REST API (ClientContext), connect directly to the site URL: `ctx = ClientContext(site_url).with_credentials(...); web = ctx.web.get().execute_query()`. SitesWithRoot is accessible via client.sites, as seen in issue discussions and examples like `client.sites.get_by_url('https://...')` for accessing lists/sites. Version 2.6.0 released 2025-03-30, supports Graph and SharePoint REST APIs per PyPI/GitHub.
Citations:
- 1: https://github.com/vgrem/office365-rest-python-client
- 2: https://pypi.org/project/Office365-REST-Python-Client/
- 3: https://github.com/vgrem/Office365-REST-Python-Client
- 4: Get all sites vgrem/office365-rest-python-client#301
- 5: I am stuggling to add items to a sharepoint list vgrem/office365-rest-python-client#914
- 6: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/examples/onedrive/folders/list_files.py
- 7: https://github.com/vgrem/office365-rest-python-client/tree/master/office365
- 8: How to navigate a site instead of root ? vgrem/office365-rest-python-client#280
- 9: https://stackoverflow.com/questions/72210088/how-to-programmatically-access-sharepoint-sites-other-than-root-from-office365-r
🏁 Script executed:

```bash
#!/bin/bash
# Check if we can find any release notes or version info in the repository
find . -name "CHANGELOG*" -o -name "HISTORY*" -o -name "RELEASE*" 2>/dev/null | head -20

# Also look for any version-specific documentation or comments in the codebase
rg "2\.6\." --type-list | head -5
rg "2\.6\." . --type py -B2 -A2 | head -30
```

Repository: Zipstack/unstract
Length of output: 450
🌐 Web query:
Office365-REST-Python-Client EntityCollection string ID bracket indexing drives
💡 Result:
In the Office365-REST-Python-Client library, EntityCollection (from office365.entity_collection) is a collection of entities returned by queries like `client.drives.get()`. It supports iteration after execute_query, as shown in the official example: `drives = client.drives.get().execute_query()` then `for drive in drives: print(drive.web_url)`. No direct evidence found in searched sources for bracket indexing (e.g., `drives['id']`) on an already-loaded EntityCollection with string IDs. Collections support positional integer indexing for loaded items (`drives[0]`), but string ID access uses path notation like `client.drives['drive-id']` before executing the query, which returns a single Drive object. To get a drive by string ID: `drive = client.drives['your-drive-id'].get().execute_query()`. EntityCollection does not appear to support direct string key lookup via `__getitem__` on loaded items based on available source code snippets and examples. For large collections, load and filter, or use get_by_id if available on the specific collection type.
Citations:
- 1: https://github.com/vgrem/office365-rest-python-client
- 2: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/office365/runtime/client_object.py
- 3: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/office365/graph_client.py
- 4: https://github.com/vgrem/office365-rest-python-client/tree/master/office365
🏁 Script executed:

```bash
#!/bin/bash
# Check the actual usage context in the PR code
cat -n unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py | sed -n '130,170p'
```

Repository: Zipstack/unstract
Length of output: 1955
🏁 Script executed:

```bash
#!/bin/bash
# Search for execute_query calls and context around line 141 and 160
cat -n unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py | sed -n '134,180p'

# Check for execute_query usage patterns in the file
rg "execute_query|\.get\(\)" unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py -A2 -B2
```

Repository: Zipstack/unstract
Length of output: 3496
🌐 Web query:
Office365-REST-Python-Client GraphClient drives bracket notation get execute_query
💡 Result:
To use the Office365-REST-Python-Client library with GraphClient to access drives via bracket notation (e.g., `client.drives['drive-id']`) and execute a query, follow this pattern based on the library's design. GraphClient exposes drives as an EntityCollection: `client.drives` returns `EntityCollection(self, Drive, ResourcePath("drives"))`. Bracket notation on collections supports retrieving a specific drive by ID, similar to how users are accessed (e.g., `client.users["user-id"]`). Example code to get a specific drive by ID: `client = GraphClient(tenant="your-tenant.onmicrosoft.com").with_client_secret("client_id", "client_secret")`, then `drive = client.drives["b!your-drive-id"].get().execute_query()` and `print(drive.web_url)`. This corresponds to the Microsoft Graph API endpoint GET /drives/{drive-id}. For listing drives first (to get IDs): `drives = client.drives.get().execute_query()` then iterate to print `d.id` and `d.web_url`. The `.get()` method prepares a read query on the ClientObject, and `execute_query()` submits it. No explicit "drives bracket notation" example exists in the docs, but this follows the established pattern from source code and examples like `client.users["id"].drive.get().execute_query()`.
Citations:
- 1: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/examples/onedrive/drives/list.py
- 2: https://github.com/vgrem/office365-rest-python-client
- 3: https://learn.microsoft.com/en-us/graph/api/drive-get?view=graph-rest-1.0
- 4: https://github.com/vgrem/office365-rest-python-client/blob/master/examples/onedrive/drives/list.py
- 5: https://github.com/vgrem/Office365-REST-Python-Client
- 6: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/office365/graph_client.py
- 7: https://github.com/vgrem/Office365-REST-Python-Client/blob/master/office365/runtime/client_object.py
🌐 Web query:
Office365-REST-Python-Client Site drive property lazy loading
💡 Result:
The Office365-REST-Python-Client library supports lazy loading for collections and properties, including SharePoint site drive properties accessed via the Microsoft Graph API. Collections like list items or drive items are implemented as lazy-load collections, meaning properties and items are loaded only when explicitly requested via `ctx.load(...)` or `.execute_query()`, rather than all at once. This avoids unnecessary API calls and improves performance. For a SharePoint site drive: use GraphClient to get the site (`site = client.sites.get_by_url(site_url)`); lists on the site have a `.drive` property (`site.lists["Documents"].drive`); the drive object supports lazy loading, so access properties like `drive.root` only after loading, e.g. `site.lists["Documents"].drive.root.get_files(True, 1000).execute_query()`, which demonstrates accessing a site list drive with pagination support (fixed in v2.5.9+). There is no explicit lazy-loading toggle for the site.drive property; it is inherent to the client model's design, where `ctx.load` or `execute_query` triggers the actual API requests. Official docs are in the GitHub repo's examples (examples/onedrive/sites.py) and issues #79 and #844.
Citations:
- 1: Getting more than 100 items from a list vgrem/office365-rest-python-client#79
- 2: Missing sharepoint drive items vgrem/office365-rest-python-client#844
- 3: https://github.com/vgrem/Office365-REST-Python-Client
- 4: https://github.com/vgrem/Office365-REST-Python-Client/tree/master/examples/onedrive
- 5: https://github.com/vgrem/office365-rest-python-client
- 6: Missing sharepoint drive items vgrem/office365-rest-python-client#844
- 7: https://github.com/vgrem/office365-rest-python-client/tree/master/examples
Add .get().execute_query() calls to initialize drive objects—the current code will fail at runtime.
Line 141 uses bracket notation correctly but is missing the required query execution chain. The correct pattern is ctx.drives[self.drive_id].get().execute_query(), not just ctx.drives[self.drive_id]. This applies similarly to line 160 with ctx.sites.get_by_url(self.site_url).drive—the .drive property access needs chaining with .get().execute_query() to actually load the drive. Both patterns require this chain based on Office365-REST-Python-Client v2.6.0 API semantics. The codebase already uses this pattern elsewhere (e.g., lines 207–209), so follow the same convention here.
Not needed — the Office365 library resolves drive proxies lazily. execute_query() fires inside the first child operation (e.g. drive.root.get_by_path(path).children.get().execute_query() inside ls()). The pre-existing ctx.me.drive and ctx.users[email].drive paths in the same file work the same way without an explicit execute_query() on the drive itself. If staging surfaces a lazy-init issue we'll follow up.
@kirtimanmishrazipstack, that's a clear explanation — the lazy proxy resolution model means the drive object doesn't need its own execute_query() call since the chain fires when the first child operation (like drive.root.get_by_path(...).children.get().execute_query()) is invoked. The consistency with ctx.me.drive and ctx.users[email].drive in the same file confirms this is the correct pattern. I'll withdraw the concern.
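The lazy-resolution behavior the thread settles on can be modeled with a toy proxy: constructing the drive object is free, and the (pretend) query fires on the first child operation. This is a simplified stand-in for the Office365 client's ClientObject model, not its actual implementation; `LazyDrive` and its members are hypothetical.

```python
class LazyDrive:
    """Toy stand-in for an office365 drive proxy: building it is cheap,
    the (pretend) network call happens on first child access."""

    def __init__(self, drive_id: str):
        self.drive_id = drive_id
        self.loaded = False

    def _ensure_loaded(self) -> None:
        if not self.loaded:
            # In the real client, .get().execute_query() would fire here.
            self.loaded = True

    def children(self) -> list[str]:
        self._ensure_loaded()  # first child op triggers the load
        return ["report.pdf"]

drive = LazyDrive("b!abc")  # no query yet
print(drive.loaded)         # False
files = drive.children()    # load happens now
print(drive.loaded)         # True
```

This mirrors why `ctx.drives[drive_id]` alone is safe: the chain resolves when `ls()` later calls `.children.get().execute_query()`.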
📝 Learnings were identified and not saved (knowledge base disabled).
Commit: "DRF's request.data is an immutable QueryDict for multipart / form-encoded POSTs; directly mutating it inside _fill_default_connector_name would raise AttributeError for any non-JSON caller. Copy to a mutable dict once in create() and feed that copy to both the helper and the serializer."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chandrasekharan-zipstack
left a comment
Two pieces of feedback — see inline comments. The serializer-level fallback is a nice-to-have refactor; the incomplete-fix concern on the OAuth refresh path is the more important one.
    logger.info(
        "Filled missing connector_name with schema default for %s", connector_id
    )
Could this live on the serializer instead? Make connector_name non-required and inject the schema default in ConnectorInstanceSerializer.validate() using attrs['connector_id']. Benefits:
- Keeps `create()` clean (no `request.data.copy()` + helper).
- Fallback applies to any future endpoint using this serializer (bulk import, internal API, PATCH that clears the field).
Tradeoff: the view-level helper is more discoverable as a temporary defense-in-depth workaround. If this is meant to be removed once the frontend race is fixed, keep it here with a TODO; if it'll stick around, push it down to the serializer.
Moved. connector_name is now required=False via Meta.extra_kwargs, and the schema-default backfill lives in ConnectorInstanceSerializer.validate() using attrs['connector_id']. Removed the view-level helper, the data = request.data.copy() override in create(), and the associated imports (ConnectorProcessor, ConnectorKeys, InvalidConnectorID). Applies to PATCH/bulk callers as well, not just the POST path.
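A framework-free sketch of that `validate()`-level backfill follows. The helper name, schema shape, and error type mirror the discussion; the DRF plumbing (field declarations, `serializers.ValidationError`) is elided, and the schema contents are invented for illustration:

```python
# Sketch: if connector_name is absent, pull the default from the
# connector's JSON schema; raise if neither source yields a name.
# SCHEMAS stands in for ConnectorProcessor.get_json_schema.
SCHEMAS = {
    "sharepoint|123": {
        "properties": {"connectorName": {"default": "SharePoint File Server"}}
    }
}


def get_schema_default_connector_name(connector_id):
    schema = SCHEMAS.get(connector_id)
    if schema is None:
        return None
    return schema.get("properties", {}).get("connectorName", {}).get("default")


def validate(attrs):
    if not attrs.get("connector_name"):
        default_name = get_schema_default_connector_name(attrs["connector_id"])
        if not default_name:
            # In DRF this would be serializers.ValidationError -> HTTP 400.
            raise ValueError({"connector_name": "This field is required."})
        attrs["connector_name"] = default_name
    return attrs


attrs = validate({"connector_id": "sharepoint|123"})
assert attrs["connector_name"] == "SharePoint File Server"
```

Because the fallback lives in `validate()`, any caller of the serializer (POST, PATCH, bulk import) gets the same behavior, which is the benefit argued for above.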
    # Discard return value: ConnectorAuth.extra_data is shared across
    # every connector with the same (provider, uid) and would overwrite
    # this connector's form fields (site_url, drive_id).
    connector_oauth.get_and_refresh_tokens()
This fix only covers the create path. The same extra_data overwrite still happens in two other places this PR doesn't touch:
- `connector_auth_v2/models.py:97-100` — inside `get_and_refresh_tokens`, on every refresh it loops `self.connectorinstance_set.all()` and does `connector_instance.connector_metadata = self.extra_data; save()`. This overwrites every sibling connector's stored metadata (`site_url`, `drive_id`) with the shared `extra_data`. Same root cause as this bug, different trigger.
- `connector_v2/fields.py:31-41` — `ConnectorAuthJSONField.from_db_value` returns the refreshed `extra_data` directly when `refresh_after` has passed, dropping form fields on read.
So a SharePoint connector created cleanly with this PR will still lose site_url the first time a token refresh fires during normal usage (file listing, etc.), as long as another connector shares the same (provider, uid).
To actually hold, the same merge pattern ({**connector_instance.connector_metadata, **self.extra_data}) needs to be applied in those two places too. Otherwise existing connectors remain vulnerable, and even newly-created ones will regress on first refresh.
Done — applied the {**connector_instance.connector_metadata, **self.extra_data} pattern at both sites:
- `connector_auth_v2/models.py` — sibling loop in `get_and_refresh_tokens`
- `connector_v2/fields.py` — `from_db_value` / `_refresh_tokens`
Per-connector form fields (site_url, drive_id, tenant_id) now survive on both the save path (sibling-loop on token refresh) and the read path (from_db_value when refresh_after has elapsed).
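The merge pattern applied at both sites is plain dict-spread precedence: later keys win, so shared token data refreshes while per-connector form fields survive. Field names below come from the PR; the values are illustrative:

```python
# Per-connector metadata entered in the connector form.
connector_metadata = {
    "site_url": "https://tenant.sharepoint.com/sites/docs",
    "drive_id": "b!abc123",
    "access_token": "old-token",
}

# Shared ConnectorAuth.extra_data after a token refresh: token fields
# only, with no per-connector form fields.
extra_data = {"access_token": "new-token", "refresh_token": "r-1"}

# Plain assignment (the old behavior) would have dropped site_url and
# drive_id, since extra_data never contained them:
assert "site_url" not in extra_data

# The merge keeps form fields and takes the refreshed token values.
merged = {**connector_metadata, **extra_data}
assert merged["site_url"] == "https://tenant.sharepoint.com/sites/docs"
assert merged["drive_id"] == "b!abc123"
assert merged["access_token"] == "new-token"
assert merged["refresh_token"] == "r-1"
```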
    if oauth_tokens is None:
        raise MissingParamException(param=ConnectorAuthKey.OAUTH_KEY)
    # Preserve non-secret form fields (e.g. site_url for the SharePoint connector)
    form_metadata = self.request.data.get(CIKey.CONNECTOR_METADATA) or {}
Minor: self.request.data.get(CIKey.CONNECTOR_METADATA) or {} — if the form sends metadata as a JSON string (depending on content-type / multipart shape), **form_metadata will TypeError. Worth confirming the request shape, or coercing with isinstance(form_metadata, dict).
Addressed together with Greptile G3 — json.loads recovers the dict when metadata arrives as a string (multipart/x-www-form-urlencoded), with an isinstance fallback for any other non-dict case.
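The coercion described in that reply can be sketched as follows. The helper name is illustrative; the behavior (accept a dict, parse a JSON string, fall back to an empty dict otherwise) follows the comment:

```python
import json


def coerce_metadata(raw):
    """Return a dict from request metadata that may arrive as a dict,
    a JSON string (multipart / form-encoded bodies), or anything else."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return {}
        # A JSON string can decode to a list/number; only accept dicts.
        return parsed if isinstance(parsed, dict) else {}
    return {}  # None or any other non-dict shape


assert coerce_metadata({"site_url": "https://x"}) == {"site_url": "https://x"}
assert coerce_metadata('{"site_url": "https://x"}') == {"site_url": "https://x"}
assert coerce_metadata("not json") == {}
assert coerce_metadata(None) == {}
```

Guaranteeing a dict here is what makes the later `{**form_metadata, **oauth_tokens}` spread safe against a `TypeError`.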
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/connector_auth_v2/models.py`:
- Around line 99-112: The merge in the refresh path is allowing non-token
enrichment keys from self.extra_data to leak into other ConnectorAuth siblings;
update the merge that sets connector_instance.connector_metadata to first filter
self.extra_data through the same whitelist used by serializers.py (the
_OAUTH_TOKEN_KEYS used to allow only token-related keys) before combining with
existing_metadata, or move _OAUTH_TOKEN_KEYS to a shared constant and reuse it
here; reference the connector_instance.connector_metadata assignment and
self.extra_data (and serializers._OAUTH_TOKEN_KEYS or
GoogleAuthHelper.enrich_connector_metadata as the provenance of enrichment) and
ensure only whitelisted token keys are propagated to sibling connector metadata.
In `@backend/connector_v2/fields.py`:
- Around line 31-45: The _refresh_tokens method currently merges
connector_auth.get_and_refresh_tokens() into existing_metadata unfiltered, which
can reintroduce enrichment keys from ConnectorAuth.extra_data; update
_refresh_tokens to whitelist only OAuth token-related keys (use the same
_OAUTH_TOKEN_KEYS constant used in serializers.py) by filtering
refreshed_metadata to contain only those keys before merging, then return
{**existing_metadata, **filtered_refreshed_metadata} so non-token enrichment
fields are not injected on read.
In `@backend/connector_v2/serializers.py`:
- Around line 109-124: connector_oauth.get_and_refresh_tokens() currently
iterates self.connectorinstance_set.all() and writes an unfiltered merge of
self.extra_data into each sibling connector, which lets non-token keys leak into
sibling connector_metadata; update that method so when it persists token data
back to sibling ConnectorInstance rows it only writes the whitelisted keys (use
the same _OAUTH_TOKEN_KEYS whitelist) and/or build a token_updates dict (like
the save() code does) and merge only those into each sibling's
CIKey.CONNECTOR_METADATA, explicitly excluding other extra_data keys before
saving.
- Around line 54-86: The validate method currently backfills connector_name from
the JSON schema but can return attrs without connector_name (causing a DB
IntegrityError later); update validate (in the serializer containing validate
and _get_schema_default_connector_name) so that after attempting to backfill via
_get_schema_default_connector_name (which calls
ConnectorProcessor.get_json_schema) it raises a serializers.ValidationError
referencing CIKey.CONNECTOR_NAME (or similar field name) when no connector_name
is present, preserving the API contract and producing a 400 instead of letting
ConnectorInstance.connector_name trigger a NOT NULL DB error.
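The whitelist approach recommended for the `models.py` and `fields.py` prompts above amounts to filtering the refreshed data through a token-key set before merging. The `_OAUTH_TOKEN_KEYS` contents below are a guess at typical token fields, not the repo's actual constant:

```python
# Hypothetical whitelist of token-related keys; per the review prompts,
# the real repo keeps this in a shared constant used by serializers.py.
_OAUTH_TOKEN_KEYS = {"access_token", "refresh_token", "token_expiry"}


def merge_refreshed_tokens(existing_metadata, refreshed_metadata):
    """Merge only whitelisted token keys so enrichment fields in the
    shared extra_data cannot leak into per-connector metadata."""
    token_updates = {
        key: value
        for key, value in refreshed_metadata.items()
        if key in _OAUTH_TOKEN_KEYS
    }
    return {**existing_metadata, **token_updates}


existing = {
    "site_url": "https://tenant.sharepoint.com/sites/docs",
    "access_token": "old",
}
# Refreshed extra_data carrying an enrichment key alongside the token.
refreshed = {"access_token": "new", "user_email": "a@b.com"}
merged = merge_refreshed_tokens(existing, refreshed)
assert merged == {
    "site_url": "https://tenant.sharepoint.com/sites/docs",
    "access_token": "new",
}
```

The same helper can serve both the save path (the sibling loop) and the read path (`from_db_value`), which is the deduplication the prompts ask for.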
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ae16b27d-bd5a-48a2-b609-34d30ea24dbe
📒 Files selected for processing (4)
- `backend/connector_auth_v2/models.py`
- `backend/connector_v2/fields.py`
- `backend/connector_v2/serializers.py`
- `backend/connector_v2/views.py`
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/connector_v2/views.py
🧹 Nitpick comments (1)
backend/connector_v2/serializers.py (1)
57-60: Run the schema default through the same connector-name validator. The backfilled value is assigned after field-level validation, so it currently bypasses `validate_connector_name()`. Reuse that validator before storing the default to keep fallback behavior identical to normal submissions.

♻️ Proposed consistency fix

     if not default_name:
         raise ValidationError({CIKey.CONNECTOR_NAME: "This field is required."})
    -attrs[CIKey.CONNECTOR_NAME] = default_name
    +attrs[CIKey.CONNECTOR_NAME] = self.validate_connector_name(default_name)
     logger.info(

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/connector_v2/serializers.py` around lines 57 - 60, The default connector name assigned to attrs[CIKey.CONNECTOR_NAME] bypasses the field-level validator; call the existing validate_connector_name function on default_name (and use/assign its validated/normalized return or let it raise ValidationError) before setting attrs[CIKey.CONNECTOR_NAME] so fallback behavior matches normal submissions; update the block around validate_connector_name, default_name and attrs[CIKey.CONNECTOR_NAME] in serializers.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/connector_v2/serializers.py`:
- Around line 57-60: The default connector name assigned to
attrs[CIKey.CONNECTOR_NAME] bypasses the field-level validator; call the
existing validate_connector_name function on default_name (and use/assign its
validated/normalized return or let it raise ValidationError) before setting
attrs[CIKey.CONNECTOR_NAME] so fallback behavior matches normal submissions;
update the block around validate_connector_name, default_name and
attrs[CIKey.CONNECTOR_NAME] in serializers.py accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 81ef659e-1926-44b0-9c40-9d01ba5e9715
📒 Files selected for processing (4)
- `backend/connector_auth_v2/constants.py`
- `backend/connector_auth_v2/models.py`
- `backend/connector_v2/fields.py`
- `backend/connector_v2/serializers.py`
✅ Files skipped from review due to trivial changes (1)
- backend/connector_auth_v2/constants.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/connector_auth_v2/models.py
- backend/connector_v2/fields.py
Test Results

Summary
- Runner Tests - Full Report
- SDK1 Tests - Full Report
🧹 Nitpick comments (1)
backend/connector_v2/serializers.py (1)
70-81: Consider broadening the exception catch in `_get_schema_default_connector_name`.

`ConnectorProcessor.get_json_schema` re-raises any exception from `json.loads(...)` on a malformed stored schema (see `backend/connector_processor/connector_processor.py:61-85`). Only `InvalidConnectorID` is caught here, so a corrupt `JSON_SCHEMA` row for a valid `connector_id` will surface as a 500 on every create — strictly worse than the pre-fix 400, and an operator typing a legitimate connector ID wouldn't get a useful response.

Two reasonable paths:

- Log and return `None` on any non-`InvalidConnectorID` exception, letting the downstream `ValidationError` produce a 400 (and keeping the create flow usable for other connectors if one is misconfigured).
- Leave as-is if you explicitly want malformed schemas to page loudly — in that case a `logger.exception` above the re-raise in `get_json_schema` is probably sufficient and this can be skipped.

♻️ Option: swallow and fall through to 400

     try:
         schema_details = ConnectorProcessor.get_json_schema(connector_id=connector_id)
     except InvalidConnectorID:
         return None
    +except Exception:
    +    logger.exception(
    +        "Failed to load JSON schema for connector_id=%s while "
    +        "backfilling connector_name",
    +        connector_id,
    +    )
    +    return None

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/connector_v2/serializers.py` around lines 70 - 81, The _get_schema_default_connector_name currently only catches InvalidConnectorID from ConnectorProcessor.get_json_schema; broaden the exception handling to also catch any other exceptions raised by get_json_schema (e.g., JSONDecodeError or generic Exception), log the exception for operators, and return None so malformed stored schemas don't raise a 500 during create; keep the existing InvalidConnectorID behavior but ensure any non-InvalidConnectorID errors are handled by logging (using the module logger) and falling through returning None from _get_schema_default_connector_name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/connector_v2/serializers.py`:
- Around line 70-81: The _get_schema_default_connector_name currently only
catches InvalidConnectorID from ConnectorProcessor.get_json_schema; broaden the
exception handling to also catch any other exceptions raised by get_json_schema
(e.g., JSONDecodeError or generic Exception), log the exception for operators,
and return None so malformed stored schemas don't raise a 500 during create;
keep the existing InvalidConnectorID behavior but ensure any
non-InvalidConnectorID errors are handled by logging (using the module logger)
and falling through returning None from _get_schema_default_connector_name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 32939327-9660-48aa-baf1-3b8f12ef2b20
📒 Files selected for processing (1)
backend/connector_v2/serializers.py
What
- `backend/connector_v2/views.py` — preserve form-entered fields (`site_url`, `drive_id`) when OAuth metadata is fetched from cache; backfill missing `connector_name` from the connector's JSON schema default before serializer validation.
- `backend/connector_v2/serializers.py` — stop overwriting per-connector form fields with `ConnectorAuth.extra_data` on OAuth refresh.
- `unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py` — fix drive resolution by `drive_id` (bracket indexing on `EntityCollection`) and by full site URL (`ctx.sites.get_by_url(...)`).
- `unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json` — mark `user_email` as `format: "password"`; refine its description.

Why
- Token refresh was overwriting `site_url` and `drive_id`, breaking the SharePoint/OneDrive connection.
- Because connectors can share a `(provider, uid)` OAuth identity, `ConnectorAuth.extra_data` was cross-contaminating per-connector metadata.
- The frontend intermittently omits `connector_name` from the POST body, causing `connector_name: This field is required.` 400s on staging for OAuth connectors (SharePoint, Google Drive). Local dev does not reproduce. Backend fallback unblocks staging without waiting for the frontend race to be isolated.
- Drive resolution was broken both for `drive_id` and for site-URL addressing against the Graph API.

How
- `_get_connector_metadata` now merges form fields with OAuth tokens (`{**form_metadata, **oauth_tokens}`) instead of replacing.
- `create()` calls a new `_fill_default_connector_name` before `serializer.is_valid(...)`; the helper reads the connector's JSON schema via `ConnectorProcessor.get_json_schema` and injects `properties.connectorName.default` when the field is missing. Logs at INFO when it fires.
- `ConnectorInstanceSerializer.save` no longer reassigns `get_and_refresh_tokens()` output back into `connector_metadata` — tokens still refresh, but the shared `extra_data` is not persisted per connector.
- `_get_drive` uses `ctx.drives[self.drive_id]`; `_get_sharepoint_site_drive` uses `ctx.sites.get_by_url(self.site_url).drive`.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No: the schema-default backfill only fires when `connector_name` is already empty. Non-SharePoint connectors and non-OAuth flows are untouched. The serializer change preserves token-refresh behavior while stopping cross-contamination between connectors that share the same `(provider, uid)`.

Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
- Site URL: create a SharePoint connector with a `https://tenant.sharepoint.com/sites/<name>` URL; verify the drive loads and files list correctly.
- `drive_id`: create a connector with a known drive ID; verify listing works.
- `connector_name` fallback: submit a connector create with `connector_name` omitted; verify a 201 response and the server log line `Filled missing connector_name with schema default for <id>`.

Screenshots
Checklist
I have read and understood the Contribution Guidelines.