Skip to content

ADLS Gen 2#2635

Open
Skrypt wants to merge 34 commits into
Azure:mainfrom
Skrypt:jsavard/adls-gen2
Open

ADLS Gen 2#2635
Skrypt wants to merge 34 commits into
Azure:mainfrom
Skrypt:jsavard/adls-gen2

Conversation

@Skrypt
Copy link
Copy Markdown

@Skrypt Skrypt commented Mar 17, 2026

Adds ADLS Gen 2 feature parity to Azurite. See docs/design/ADLS-gen2-parity.md file for details.

This was created by Claude Sonnet 4.6. Just sharing my progress here.

See OrchardCMS/OrchardCore#19014 for some integration tests.
The Github CI over there uses the current latest Docker image but I've succesfully ran my unit tests using a local Docker container based on this PR.

Maybe we need to create more unit tests. I let you guys decide, but it is looking good for a start.

Adds ADSL Gen 2 feature parity to Azurite. See docs/design/ADLS-gen2-parity.md file for details.
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 17, 2026

@microsoft-github-policy-service agree

@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

hmm ok, just found this doc here: https://github.com/Azure/Azurite/wiki/ADLS-Gen2-Implementation-Guidance
Will push updates based on this. The only thing is that I'm pushing the whole thing as a single PR which will fail to get merged but at least we get something to work with.

Skrypt and others added 2 commits March 18, 2026 02:30
…DFS endpoint

Close compliance gaps with the Azurite ADLS Gen2 Implementation Guidance wiki:

- Add --enableHierarchicalNamespace CLI flag (default: true) so the emulator
  can run in FNS or HNS mode; wired through BlobEnvironment, Environment,
  VSCEnvironment, DfsServer, and FilesystemHandler.
- Replace non-atomic copy+delete rename with persistence-level renameBlob()
  and renameBlobsByPrefix() on both LokiJS and SQL stores, keeping directory
  renames atomic and avoiding unnecessary extent copies.
- Extract and forward If-Match, If-None-Match, If-Modified-Since, and
  If-Unmodified-Since conditional headers on DFS getProperties, read, and
  delete operations via ModifiedAccessConditions.
- Expose full blob lease lifecycle (acquire, release, renew, break, change)
  on the DFS endpoint by detecting x-ms-lease-action and delegating to the
  existing IBlobMetadataStore lease infrastructure.
- Add DFS-specific error helpers for conditional and lease failures.
- Extend dfsProxy.test.ts with tests for conditional headers, all lease
  operations, and atomic directory rename with children verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce the Swagger-generated DFS interface layer as called for by the
Azurite ADLS Gen2 Implementation Guidance wiki:

- swagger/dfs-storage-2023-11-03.json: OpenAPI 2.0 spec covering all DFS
  REST API operations (filesystem CRUD, path CRUD, read, update, lease).
- swagger/dfs.md: AutoRest configuration for DFS code generation.
- src/blob/generated-dfs/: Generated-style TypeScript artifacts mirroring
  the blob generated layer pattern (src/blob/generated/):
  - artifacts/operation.ts: DfsOperation enum (12 operations)
  - artifacts/models.ts: Typed request/response interfaces for all DFS ops
  - artifacts/specifications.ts: Dispatch specs for HTTP-to-operation matching
  - handlers/IFilesystemHandler.ts, IPathHandler.ts: Handler contracts
  - handlers/IHandlers.ts: Combined IDfsHandlers interface
  - handlers/handlerMappers.ts: Operation-to-handler routing map
  - Context.ts: DFS request context for the generated pattern
- package.json: Added build:autorest:dfs script
- DfsRequestListenerFactory.ts: Architecture docs referencing generated layer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts
Implement the dedicated hierarchy table required by the Azurite ADLS Gen2
Implementation Guidance wiki ("Add table matching each item with parent").

- IBlobMetadataStore: Add registerHnsPath, unregisterHnsPath,
  unregisterHnsPathsByPrefix, renameHnsPaths, isHnsDirectoryEmpty, and
  hnsPathExists methods for managing parent-child relationships.
- LokiBlobMetadataStore: New $HNS_HIERARCHY$ collection indexed on
  (accountName, containerName, path) and (parentPath) for fast lookups.
- SqlBlobMetadataStore: New HnsHierarchy table with unique path index and
  parentPath index; rename operations use transactions.
- PathHandler: create() registers paths in hierarchy; delete() uses
  isHnsDirectoryEmpty() for non-recursive guard and cleans up hierarchy
  records; renamePath() updates hierarchy atomically;
  ensureIntermediateDirectories() registers intermediate dirs.
- Tests: non-empty directory delete returns 409; recursive delete succeeds;
  auto-created intermediate directories are visible via HEAD.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds native ADLS Gen2 (DFS) endpoint support to Azurite by standing up a first-class DFS server that shares the existing blob metadata/extent stores, plus a broad test suite and configuration/documentation updates.

Changes:

  • Introduces a native DFS Express pipeline (context/auth/dispatch + filesystem/path handlers) running on a new configurable DFS host/port.
  • Extends blob metadata stores with atomic rename operations to support DFS rename semantics (single-path + prefix/directory).
  • Adds DFS-focused integration tests and updates CLI/VSC/Docker/README to expose the DFS endpoint and HNS toggle.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
tests/blob/dfsProxy.test.ts Adds DFS end-to-end tests for filesystem/path ops, append/flush, ACLs, leases, and rename.
src/common/VSCEnvironment.ts Adds dfsHost/dfsPort and enableHierarchicalNamespace to VS Code environment config.
src/common/Telemetry.ts Adds dfsHost/dfsPort to telemetry parameter collection.
src/common/Environment.ts Adds CLI flags for DFS host/port and hierarchical namespace toggle.
src/blob/utils/constants.ts Adds DFS defaults and changes HNS default constant behavior.
src/blob/persistence/SqlBlobMetadataStore.ts Implements SQL rename operations for DFS (single blob + prefix).
src/blob/persistence/LokiBlobMetadataStore.ts Implements Loki rename operations for DFS (single blob + prefix).
src/blob/persistence/IBlobMetadataStore.ts Extends store interface with rename methods used by DFS rename.
src/blob/main.ts Starts DFS server alongside standalone blob server and wires shared stores.
src/blob/dfs/handlers/PathHandler.ts Implements DFS path CRUD, list paths, append/flush, ACLs, leases, and rename.
src/blob/dfs/handlers/FilesystemHandler.ts Implements DFS filesystem CRUD/list and property setting via container operations.
src/blob/dfs/DfsPropertyEncoding.ts Adds shared base64 properties header encode/decode helper.
src/blob/dfs/DfsOperation.ts Defines DFS operation enum for routing/auth mapping.
src/blob/dfs/DfsErrorFactory.ts Adds DFS JSON error shaping (including special-casing HEAD).
src/blob/dfs/DfsContextFactory.ts Creates minimal blob Context objects for store calls from DFS handlers.
src/blob/dfs/DfsContext.ts Extracts DFS request context (account/filesystem/path) and performs version checks.
src/blob/dfs/DfsAuthenticationMiddleware.ts Adds DFS auth middleware reusing blob authenticators.
src/blob/SqlBlobServer.ts Exposes stores publicly to allow DFS to share them.
src/blob/IBlobEnvironment.ts Extends blob environment interface with DFS host/port + HNS toggle.
src/blob/DfsServer.ts Adds new DFS server implementation (HTTP/HTTPS) using shared stores.
src/blob/DfsRequestListenerFactory.ts Adds DFS Express pipeline and operation dispatch.
src/blob/DfsConfiguration.ts Adds DFS configuration wrapper based on existing ConfigurationBase.
src/blob/BlobServer.ts Exposes stores publicly to allow DFS to share them.
src/blob/BlobEnvironment.ts Adds blob-service CLI flags for DFS host/port and HNS toggle.
src/azurite.ts Starts DFS server alongside blob/queue/table in the main Azurite entrypoint.
package.json Adds VS Code extension config entries for dfsHost/dfsPort.
docs/designs/ADLS-gen2-parity.md Documents the ADLS Gen2 parity approach, phases, and design rationale.
README.md Documents DFS host/port usage and docker port exposure.
Dockerfile.Windows Exposes DFS port and starts Azurite with dfsHost binding.
Dockerfile Exposes DFS port and starts Azurite with dfsHost binding.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/dfs/handlers/FilesystemHandler.ts Outdated
Comment thread tests/blob/dfsProxy.test.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/persistence/SqlBlobMetadataStore.ts Outdated
Comment thread src/blob/dfs/DfsAuthenticationMiddleware.ts Outdated
Comment thread package.json Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/persistence/SqlBlobMetadataStore.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Skrypt and others added 2 commits March 18, 2026 03:12
Add comprehensive integration tests using the official Azure DataLake SDK
to validate Azurite DFS endpoint compatibility, as required by the wiki
("Pass all language SDK tests").

- Install @azure/storage-file-datalake@^12.29.0 as devDependency
- tests/blob/dfsSDKIntegration.test.ts: 22 tests across 6 categories:
  - Filesystem: create/delete, getProperties, listFileSystems
  - Directory: create/delete, nested dirs, move (rename)
  - File: create, append+flush+read, multi-chunk write, delete, move
  - ACL: setAccessControl/getAccessControl, setPermissions
  - List paths: recursive and non-recursive listing
  - Cross-API: DFS file readable via Blob API and vice versa

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement OAuth with ACL enforcement as specified by the Azurite ADLS Gen2
wiki Phase III ("OAuth: ACL works when user login with AAD account").

- OAuthLevel: Add ACL level (--oauth acl) alongside existing BASIC level.
- ConfigurationBase: Accept "acl" as --oauth parameter value.
- DfsContext: Add IDfsAuthenticatedIdentity interface with oid, upn, tid,
  appid fields; add identity field to IDfsContext.
- DfsAuthenticationMiddleware: Extract identity claims (oid, upn, tid,
  appid) from Bearer JWT tokens when ACL mode is active; store in DFS
  context for downstream enforcement.
- DfsAclEnforcer: New module implementing POSIX ACL evaluation:
  - Parses ACL strings ("user::rwx,user:oid:r-x,group::r-x,other::---")
  - Evaluates in POSIX order: owner -> named user -> group -> other
  - Applies mask entries to limit named user/group permissions
  - $superuser and unauthenticated requests bypass checks
  - Maps operations to required permissions (r/w/x)
- PathHandler: Add enforceAcl() helper; check ACL before getProperties,
  read, delete, and update operations; accept OAuthLevel from factory.
- DfsRequestListenerFactory: Pass OAuth level to PathHandler.
- Tests: 19 unit tests for ACL enforcer covering parsing, bypass scenarios,
  owner/named-user/group/other permissions, mask application, and UPN
  matching. All passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

I added new commits to comply to the https://github.com/Azure/Azurite/wiki/ADLS-Gen2-Implementation-Guidance.
It changed a lot of code. I would dismiss that copilot code review and restart a new one now.

The recursive delete was relying on the HNS hierarchy table to check for
children, which missed blobs created via the Blob API or before HNS
tracking was added. Now always uses listBlobs prefix scan (no delimiter)
to find ALL descendant blobs regardless of how they were created, then
deletes them all before removing the directory marker itself.

Fixes: DeleteDirectory_WithContents_DeletesAll test failure where
directoryClient.DeleteAsync(recursive: true) deleted the directory
marker but left child blobs intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 18, 2026 07:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds native ADLS Gen2 (DFS) endpoint support to Azurite by introducing a DFS server pipeline that shares the Blob service’s metadata/extent/account stores, along with HNS directory semantics and initial ACL enforcement.

Changes:

  • Introduces a native DFS server (port 10004 by default) with filesystem/path handlers, auth middleware reuse, and DFS-specific context/error handling.
  • Extends metadata stores (Loki + SQL) to support rename operations and an HNS hierarchy registry.
  • Adds DFS-focused tests (proxy-style REST tests, SDK integration tests, and ACL enforcer unit tests) plus swagger artifacts/docs.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
tests/blob/dfsSDKIntegration.test.ts Adds JS SDK integration coverage via @azure/storage-file-datalake.
tests/blob/dfsProxy.test.ts Adds direct DFS REST-level tests and cross-API (Blob↔DFS) checks.
tests/blob/dfsAclEnforcer.test.ts Adds unit tests for ACL parsing/evaluation logic.
swagger/dfs.md Adds AutoRest configuration for DFS server generation.
swagger/dfs-storage-2023-11-03.json Adds DFS swagger spec used for generated artifacts.
src/common/models.ts Extends OAuth level enum with ACL enforcement mode.
src/common/VSCEnvironment.ts Adds VS Code settings for DFS host/port + HNS toggle.
src/common/Telemetry.ts Adds dfsHost/dfsPort telemetry parameter handling.
src/common/Environment.ts Adds CLI flags for dfsHost/dfsPort and HNS enablement.
src/common/ConfigurationBase.ts Parses new --oauth acl option.
src/blob/utils/constants.ts Adds DFS defaults + flips HNS “enabled” constant default.
src/blob/persistence/SqlBlobMetadataStore.ts Adds HNS hierarchy table + rename/HNS operations in SQL store.
src/blob/persistence/LokiBlobMetadataStore.ts Adds rename/HNS operations and HNS collection in Loki store.
src/blob/persistence/IBlobMetadataStore.ts Extends store interface with rename + HNS hierarchy methods.
src/blob/main.ts Starts DFS server alongside blob server in azurite-blob.
src/blob/generated-dfs/handlers/handlerMappers.ts Adds generated DFS handler mapper wiring.
src/blob/generated-dfs/handlers/IPathHandler.ts Adds generated DFS path handler interface.
src/blob/generated-dfs/handlers/IHandlers.ts Adds generated DFS handler registry interface.
src/blob/generated-dfs/handlers/IFilesystemHandler.ts Adds generated DFS filesystem handler interface.
src/blob/generated-dfs/artifacts/specifications.ts Adds operation matching specs for DFS routing.
src/blob/generated-dfs/artifacts/operation.ts Adds generated DFS operation enum.
src/blob/generated-dfs/artifacts/models.ts Adds generated DFS request/response model types.
src/blob/generated-dfs/Context.ts Adds DFS-specific generated context wrapper.
src/blob/dfs/handlers/PathHandler.ts Implements core DFS path operations (create/read/delete/list/update/lease/rename).
src/blob/dfs/handlers/FilesystemHandler.ts Implements DFS filesystem ops (create/delete/properties/list/setProperties).
src/blob/dfs/DfsPropertyEncoding.ts Adds helper for DFS x-ms-properties encoding/decoding.
src/blob/dfs/DfsOperation.ts Adds DFS operation enum for middleware/dispatch.
src/blob/dfs/DfsErrorFactory.ts Adds DFS JSON error writer + common DFS error types.
src/blob/dfs/DfsContextFactory.ts Adds helper to create minimal Blob Context for store calls.
src/blob/dfs/DfsContext.ts Adds DFS context middleware and URL parsing.
src/blob/dfs/DfsAuthenticationMiddleware.ts Reuses blob authenticators for DFS + extracts identity for ACL mode.
src/blob/dfs/DfsAclEnforcer.ts Adds ACL parsing and evaluation logic for DFS operations.
src/blob/SqlBlobServer.ts Exposes stores publicly for DFS server sharing.
src/blob/IBlobEnvironment.ts Extends blob environment interface with DFS host/port + HNS toggle.
src/blob/DfsServer.ts Adds DFS server wrapper around request listener pipeline.
src/blob/DfsRequestListenerFactory.ts Adds Express pipeline for DFS routing/auth/handlers.
src/blob/DfsConfiguration.ts Adds configuration class for DFS endpoint.
src/blob/BlobServer.ts Exposes stores publicly for DFS server sharing.
src/blob/BlobEnvironment.ts Adds CLI flags for DFS host/port and HNS toggle in azurite-blob.
src/azurite.ts Starts DFS server alongside blob/queue/table in the combined entrypoint.
package.json Adds @azure/storage-file-datalake dev dependency and DFS extension settings/script.
docs/designs/ADLS-gen2-parity.md Adds design/phase plan for DFS parity implementation.
README.md Documents DFS host/port config and Docker port exposure.
Dockerfile.Windows Exposes 10004 and passes --dfsHost.
Dockerfile Exposes 10004 and passes --dfsHost.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/dfs/handlers/FilesystemHandler.ts
Comment thread src/blob/DfsRequestListenerFactory.ts Outdated
Comment thread src/blob/persistence/LokiBlobMetadataStore.ts Outdated
Comment thread tests/blob/dfsProxy.test.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/utils/constants.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts
Azure ADLS Gen2 returns 409 when creating a directory that already exists.
The SDK's CreateIfNotExistsAsync relies on this to return null/false for
existing directories. Azurite was returning 201 regardless, causing
CreateDirectory_AlreadyExists_ReturnsFalse to fail.

Now checks if a directory blob with hdi_isfolder=true already exists
before creating, and returns 409 PathAlreadyExists if so. File creates
remain idempotent (overwrite) matching Azure behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

I've updated the unit tests on this PR: OrchardCMS/OrchardCore#19014
They run against a custom docker image that I created on my own Github account.

Guard against Express query params and request body being arrays
instead of strings. CodeQL flagged req.query.position and req.body
as potential vectors for type confusion in appendData and flushData.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 18, 2026 17:27
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds native ADLS Gen2 (DFS) endpoint support to Azurite by introducing a dedicated DFS server/pipeline that shares the existing Blob metadata + extent stores, plus new persistence helpers (rename + HNS hierarchy) and a broad set of DFS-focused tests and swagger artifacts.

Changes:

  • Introduce a DFS server (port 10004 by default) with native Express handlers for filesystem/path operations, append/flush, leases, rename, and ACL metadata.
  • Extend metadata stores (Loki + SQL) with rename APIs and HNS hierarchy tracking.
  • Add DFS integration/proxy/unit tests, swagger inputs/config, and environment/Docker/README wiring for DFS.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
tests/blob/dfsSDKIntegration.test.ts New JS SDK integration tests using @azure/storage-file-datalake.
tests/blob/dfsProxy.test.ts New REST-level DFS behavior tests (create/list/append/flush/rename/ACL/leases).
tests/blob/dfsAclEnforcer.test.ts Unit tests for ACL parsing/evaluation.
swagger/dfs.md AutoRest config for DFS codegen.
swagger/dfs-storage-2023-11-03.json DFS swagger used for codegen/artifacts.
src/common/models.ts Add OAuthLevel.ACL enum value.
src/common/VSCEnvironment.ts Add dfsHost/dfsPort + enableHierarchicalNamespace settings.
src/common/Telemetry.ts Include DFS flags in telemetry parameter list.
src/common/Environment.ts Add --dfsHost/--dfsPort and --enableHierarchicalNamespace CLI flags.
src/common/ConfigurationBase.ts Parse --oauth acl into OAuthLevel.ACL.
src/blob/utils/constants.ts Add DFS defaults; switch emulator HNS default to enabled.
src/blob/persistence/SqlBlobMetadataStore.ts Add HNS hierarchy table + rename helpers.
src/blob/persistence/LokiBlobMetadataStore.ts Add HNS hierarchy collection + rename helpers.
src/blob/persistence/IBlobMetadataStore.ts Add interfaces for rename and HNS hierarchy operations.
src/blob/main.ts Start DFS server alongside blob for azurite-blob.
src/blob/generated-dfs/** New generated DFS artifacts/interfaces/specs.
src/blob/dfs/** New DFS middleware, handlers, ACL enforcer, errors, property encoding.
src/blob/SqlBlobServer.ts Expose stores publicly for DFS server sharing.
src/blob/IBlobEnvironment.ts Add DFS + HNS configuration surface.
src/blob/DfsServer.ts New DFS server wrapper over ServerBase.
src/blob/DfsRequestListenerFactory.ts New DFS Express pipeline (context/dispatch/auth/routing/errors).
src/blob/DfsConfiguration.ts New DFS configuration class.
src/blob/BlobServer.ts Expose stores publicly for DFS server sharing.
src/blob/BlobEnvironment.ts Add DFS host/port + HNS flag for azurite-blob.
src/azurite.ts Start DFS server alongside blob/queue/table for azurite.
package.json Add DFS autorest script + @azure/storage-file-datalake dep + VS Code settings for DFS host/port.
docs/designs/ADLS-gen2-parity.md Design/phase plan for ADLS Gen2 parity implementation.
README.md Document DFS host/port + Docker port mapping updates.
Dockerfile.Windows Expose port 10004 and pass --dfsHost.
Dockerfile Expose port 10004 and pass --dfsHost.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/dfs/handlers/FilesystemHandler.ts
Comment thread src/azurite.ts Outdated
Comment thread src/azurite.ts Outdated
Comment thread package.json Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/DfsRequestListenerFactory.ts Outdated
Comment thread src/blob/persistence/SqlBlobMetadataStore.ts Outdated
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 18, 2026

I'm sorry, I probably don't have the permission to run agents on this repository.

@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented Mar 29, 2026

No feedback on this one yet after 2 weeks. I know that this is AI generated and this is probably not optimal but at least some feedback would be welcomed. Should I simply drop the PR and consider simply keeping this on my own fork? @sebastienros sorry to ping you on this one. Totally non-related with OC but what's your thoughts on this? This is an open sourced project by Microsoft on the Azure repository. Not sure who's managing this or if I will ever get feedback at this point.

Thanks.

Copilot AI review requested due to automatic review settings May 1, 2026 12:28
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/persistence/SqlBlobMetadataStore.ts
Comment thread src/blob/persistence/IBlobMetadataStore.ts Outdated
Comment thread tests/BlobTestServerFactory.ts
Skrypt and others added 2 commits May 1, 2026 08:44
4 new criticals, 7 majors, 5 minors identified. All pass-1 items marked fixed.
6 new test gaps recorded. Pass-2 items are pending.
…or handling

Criticals (azurite_hns_enabled integrity):
- FilesystemHandler.setProperties: read existing metadata first, preserve HNS flag
  and existing user metadata before applying x-ms-properties overlay (P2-C-1, P2-m-3)
- FilesystemHandler.extractMetadata: filter x-ms-meta-azurite_hns_enabled to block forgery (P2-C-4)
- ContainerHandler.setMetadata: preserve azurite_hns_enabled from existing metadata
  when Blob API SetContainerMetadata replaces the metadata map (P2-C-2)
- ContainerHandler.getContainerProperties: filter azurite_hns_enabled from user-visible
  metadata in Blob API response (P2-C-3)
- FilesystemHandler: add newEtag() import; use it in setProperties

Majors:
- listPaths: return 404 when specified directory does not exist (P2-M-1)
- listPaths: read stored dfsAclOwner/dfsAclGroup/dfsAclPermissions per entry
  instead of hardcoded $superuser defaults (P2-M-7)
- PathHandler.delete: handle 412 ConditionNotMet from conditional-header mismatch (P2-M-2)
- DfsContext: replace raw res.status(400).json() with sendDfsError so response
  includes x-ms-error-code header (P2-M-3)
- Multi-block read: call stream.destroy() on error to prevent resource leak (P2-M-4)
- breakLease: validate x-ms-lease-break-period for NaN; return 400 if invalid (P2-M-5)
- appendData: document TOCTOU limitation for concurrent appends (P2-M-6)

Minors:
- PathHandler: replace dynamic require("crypto") with top-level import (P2-m-1)
- PathHandler.create: rename inner parentPath to hnsParentPath to avoid shadowing (P2-m-2)
- DfsRequestListenerFactory: restrict resource=filesystem + ctx.path dispatch to GET only (P2-m-4)
- ensureIntermediateDirectories: throw 409 if a file occupies an intermediate path (P2-m-5)

Tests (4 new in dfsProxy.test.ts):
- HNS flag survives filesystem setProperties PATCH
- listPaths returns 404 for non-existent directory
- delete with non-matching If-Match returns 412
- listPaths returns stored ACL owner/group per path

55 DFS @loki tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 12:52
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/persistence/SqlBlobMetadataStore.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/dfs/handlers/FilesystemHandler.ts Outdated
Comment thread src/blob/dfs/handlers/FilesystemHandler.ts
Comment thread src/common/Environment.ts
Skrypt and others added 2 commits May 1, 2026 09:09
6 majors, 7 minors, 5 test gaps. All pass-2 items marked fixed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 1, 2026 13:10
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/blob/persistence/LokiBlobMetadataStore.ts
Comment thread src/blob/dfs/handlers/PathHandler.ts
Comment thread src/blob/dfs/handlers/FilesystemHandler.ts Outdated
Comment thread src/blob/DfsRequestListenerFactory.ts Outdated
Comment thread src/blob/persistence/SqlBlobMetadataStore.ts
Comment thread src/blob/persistence/SqlBlobMetadataStore.ts
…ead code

Majors:
- Recursive delete: pass lease conditions to child blob deletes so leased children
  are rejected rather than force-deleted (P3-M-1)
- listPaths: add eTag field to prefix (subdirectory) entries in non-recursive listing (P3-M-2)
- setAccessControlRecursive: only include root directory on first continuation page,
  not on every page (inflated directoriesSuccessful count) (P3-M-3)
- setAccessControlRecursive: validate mode param; return 400 InvalidQueryParameterValue
  for any value other than set/modify/remove (P3-M-4)
- read: return 400 PathIsDirectory when GET targets a directory blob (P3-M-5)
- renamePathAtomic: re-key uncommitted blocks staged under old path in both
  Loki (blocks collection) and SQL (BlocksModel) so flush after rename works (P3-M-6)

Minors:
- Document rename overwrite TOCTOU limitation (P3-m-1)
- Fix invalidFlushPosition() in DfsErrorFactory: status 400→409, parameterised
  message; replace inline error object in flushData with the factory call (P3-m-2)
- Single-block read: stream.pipe(res, { end: false }) to prevent double res.end() (P3-m-3)
- FilesystemHandler.list: clamp maxResults with || fallback to prevent NaN (P3-m-4)
- Delete dead DfsPropertyEncoding.ts (never imported) (P3-m-5)
- FilesystemHandler.create: replace custom hex ETag with newEtag() (P3-m-6)
- renamePath: reject rename source paths containing '..' segments (P3-m-7)

Tests (4 new):
- GET on directory path returns 400 PathIsDirectory
- setAccessControlRecursive with invalid mode returns 400
- listPaths non-recursive: subdirectory entries include eTag
- FilesystemHandler.list with non-numeric maxResults does not crash

59 DFS @loki tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gardenia
Copy link
Copy Markdown

gardenia commented May 20, 2026

@Skrypt Thanks for your work on this.

I am not involved with this project but I had occasion to need to do some local testing with DFS support so I tried out your PR. I was using the hadoop-azure (java) layer to interface with Azurite DFS which I ran into these issues (outlined below).

It could be I missed something since I am not familiar with Azuite but with the attached patch my use-case was working for me so I thought it was worth sharing.

I have attached the patch here:
azurite_dfs_patch.txt

Here is an (AI assisted) outline of the issues I ran into which the patch addresses:

Issue 1: Fatal TypeError during DFS Append
Location: src/blob/dfs/handlers/PathHandler.ts (Method: appendData)

  • Root Cause: The DFS implementation makes an unsafe assumption that req.body will always be a Buffer or a String. However, depending on the client’s headers (e.g., complex multi-part boundaries or zero-length probes) and
    the Express middleware configuration, req.body can be initialized as a plain JavaScript object ({}).
  • Failure Mode: When the unpatched code executes Buffer.from(req.body), Node.js throws: TypeError: The first argument must be of type string or an instance of Buffer... Received an instance of Object. This results in an
    immediate 500 Internal Server Error returned to the client.
  • The Fix: Implemented robust body extraction that explicitly checks the type of req.body. If it is not a Buffer, String, or Array, it is safely treated as an empty Buffer.alloc(0), preventing the runtime crash.

Issue 2: Date Hydration Crash after Emulator Restart
Location: src/blob/dfs/handlers/FilesystemHandler.ts and PathHandler.ts

  • Root Cause: Azurite uses LokiJS for persistence. When Azurite shuts down, it serializes its state to a JSON file, converting live JavaScript Date objects (like lastModified) into ISO-8601 strings. Upon restart, LokiJS
    loads these as Strings, not Date objects.
  • Failure Mode: The DFS handlers attempt to call result.properties.lastModified.toUTCString(). This succeeds if the server has not been restarted (where the value is still a Date), but fails immediately after a restart
    because Strings do not possess the .toUTCString() method. This results in a 500 Internal Server Error during HEAD or getAccessControl requests on existing resources.
  • The Fix: Wrapped all lastModified property access in a new Date(...) constructor. This ensures the value is defensively hydrated into a Date object regardless of whether it was retrieved as a live object (memory) or a
    serialized string (disk).

Issue 3: Illegal Buffer.from() on Non-String Metadata
Location: src/blob/dfs/handlers/FilesystemHandler.ts and PathHandler.ts

  • Root Cause: When constructing the x-ms-properties header, the DFS engine iterates through metadata and calls Buffer.from(value as string).toString("base64").
  • Failure Mode: Certain metadata properties—specifically internal flags like hdi_isfolder or properties set via the Blob API as Numbers/Booleans—are stored in the database as non-string types. Passing a Boolean or Number
    to Buffer.from() triggers a fatal TypeError in Node.js, crashing the request with a 500 Internal Server Error.
  • The Fix: Applied a String(value) wrapper to all metadata values during the header construction map. This ensures that booleans, numbers, and nulls are safely cast to strings before being passed to the Buffer encoder.

Summary of the Patch Set
The patches provide surgical, defensive guards against the strict type-safety requirements of the Node.js runtime. They align the newer DFS engine's robustness with the mature Blob API, ensuring that:

  1. Persistence is stable across restarts (Date hydration).
  2. Metadata is resilient to mixed-type injection (String wrapping).
  3. Request Bodies are handled gracefully across different client implementations (Robust append).

Verification
These fixes were validated using a dedicated integration test (tests/blob/dfsReproduction.test.ts) which explicitly triggers these conditions by injecting unhydrated strings and non-string metadata into the storage layer.

Wrap all lastModified values in new Date() before calling toUTCString()
so that string-typed values stored by LokiJS survive deserialization
without crashing. Coerce metadata values through String() before
base64-encoding to avoid Buffer type errors. Replace the fragile body
extraction in appendData with an explicit Buffer/string/array/object
branch so plain-object bodies (e.g. {}) are treated as empty rather
than causing a crash. Adds a regression test for both cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 04:02
@Skrypt
Copy link
Copy Markdown
Author

Skrypt commented May 21, 2026

Patch applied

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/blob/persistence/SqlBlobMetadataStore.ts:3618

  • conditionalPrefixReplaceExpr() builds a LIKE '${sourcePath}%' predicate using sourcePath directly. If sourcePath contains % or _, the CASE expression can match and rewrite unrelated rows (wildcard expansion), even if escapeLike() is fixed elsewhere. Ensure the LIKE pattern here escapes user-controlled wildcards consistently (and uses the same escaping strategy as the Op.like queries).

Comment thread src/blob/DfsRequestListenerFactory.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 04:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 3 comments.

Comment thread src/blob/persistence/IBlobMetadataStore.ts Outdated
Comment thread src/blob/dfs/handlers/PathHandler.ts Outdated
Comment thread src/blob/DfsRequestListenerFactory.ts Outdated
Skrypt and others added 2 commits May 21, 2026 00:39
The stale first block described a single-blob rename with wrong param
names; the second block is the accurate description. Drop the first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the sequential safeGetBlobProperties loop over returned
prefixes with a single parallel Promise.all, eliminating the N+1
round-trip pattern that could significantly slow non-recursive
directory listings at scale. Also applies the new Date() guard on
lastModified for consistency with the rest of the handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 04:40
/{fs}/{path}?resource=filesystem is not a valid DFS API URL shape.
Previously a GET on that pattern was silently routed to
Filesystem_ListPaths, producing a misleading 200. Collapse the two
ctx.path branches into one that always leaves operation undefined,
letting the dispatcher return 400 UnsupportedOperation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 52 out of 56 changed files in this pull request and generated 3 comments.

Comment on lines +1207 to +1217
try {
await this.metadataStore.createBlob(createStorageContext(requestId), dirBlob);
// Register intermediate directory in HNS hierarchy
const parentDir = i > 1 ? parts.slice(0, i - 1).join("/") : null;
await this.metadataStore.registerHnsPath(
createStorageContext(requestId), account, filesystem,
dirPath, parentDir, true
);
} catch {
// Ignore if already exists (race condition)
}
Comment on lines +3706 to +3712
await BlobsModel.update(
{
containerName: destContainer,
blobName: this.prefixReplaceExpr("blobName", sourcePrefix, destPrefix),
lastModified: now,
etag: newEtag()
} as any,
Comment on lines 163 to 172
containerHandler: new ContainerHandler(
this.accountDataStore,
this.oauth,
this.metadataStore,
this.extentStore,
this.logger,
this.loose
this.loose,
this.disableProductStyle,
this.enableHierarchicalNamespace
),
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants