
HDDS-15024. Track pending containers in SCM to prevent Datanode over-allocation#10073

Merged
rakeshadr merged 8 commits into apache:master from ashishkumar50:HDDS-15024
Apr 17, 2026



@ashishkumar50
Contributor

What changes were proposed in this pull request?

Introduce PendingContainerTracker in SCM to track container allocations that have been issued but are not yet fully realized on Datanodes, so that SCM does not allocate more containers to a Datanode than it can hold.
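A minimal sketch of the idea, assuming a pending container is one that SCM has allocated to a Datanode but that the Datanode has not yet reported, and that each pending container reserves one full container size of space. PendingTrackerSketch and its methods (recordAllocation, confirm, pendingBytes) are hypothetical names used only for illustration; they are not the PendingContainerTracker API added by this PR, and Datanode IDs are plain strings here instead of SCM's own ID type.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of per-Datanode pending-container bookkeeping.
 * Not the PendingContainerTracker from the PR; Datanode IDs are plain Strings.
 */
class PendingTrackerSketch {
  private final long containerSize;
  // Datanode ID -> IDs of containers allocated on it but not yet reported back.
  private final Map<String, Set<Long>> pending = new ConcurrentHashMap<>();

  PendingTrackerSketch(long containerSize) {
    this.containerSize = containerSize;
  }

  /** Called when SCM allocates a container on a Datanode. */
  void recordAllocation(String datanodeId, long containerId) {
    pending.computeIfAbsent(datanodeId, k -> ConcurrentHashMap.newKeySet()).add(containerId);
  }

  /** Called when the Datanode reports the container, so it is no longer pending. */
  void confirm(String datanodeId, long containerId) {
    final Set<Long> ids = pending.get(datanodeId);
    if (ids != null) { // nothing was ever allocated on this Datanode
      ids.remove(containerId);
    }
  }

  /** Space promised to pending containers but not yet visible in reports. */
  long pendingBytes(String datanodeId) {
    final Set<Long> ids = pending.get(datanodeId);
    return ids == null ? 0L : ids.size() * containerSize;
  }
}
```

SCM could then subtract pendingBytes(...) from a Datanode's reported free space before deciding whether another container fits.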

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15024

How was this patch tested?

Unit test

@adoroszlai adoroszlai changed the title HDDS-15024. Introduce PendingContainerTracker in SCM to prevent container over-allocation per DataNode. HDDS-15024. Track pending containers in SCM to prevent Datanode over-allocation Apr 13, 2026
Contributor

@szetszwo szetszwo left a comment


@ashishkumar50, thanks a lot for splitting the PR.

  • For simplicity, let's not remove buckets. The number of Datanodes is small (~10k), so we just keep all of them in the map.
  • Then the map becomes very simple. Let's wrap it in a class.
  • Always roll the bucket before returning it.
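```java
import java.util.concurrent.ConcurrentHashMap;

/** Per-Datanode buckets; entries are never removed since there are at most ~10k Datanodes. */
class DatanodeBuckets {
  private final ConcurrentHashMap<DatanodeID, TwoWindowBucket> map = new ConcurrentHashMap<>();
  private final long rollIntervalMs;

  DatanodeBuckets(long rollIntervalMs) {
    this.rollIntervalMs = rollIntervalMs;
  }

  /** Returns the bucket for the Datanode, creating it if absent; always rolls before returning. */
  TwoWindowBucket get(DatanodeID id) {
    final TwoWindowBucket bucket = map.computeIfAbsent(id, k -> new TwoWindowBucket(rollIntervalMs));
    bucket.rollIfNeeded();
    return bucket;
  }

  TwoWindowBucket get(DatanodeDetails dn) {
    // Delegate so that this overload also creates the bucket and rolls it before returning.
    return get(dn.getID());
  }
}
```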
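The review refers to TwoWindowBucket and rollIfNeeded() without showing them. Below is a hedged sketch of one way a two-window rolling bucket could work, assuming its role is to let pending entries that are never confirmed expire on their own instead of blocking a Datanode's space forever. TwoWindowBucketSketch, add, remove, pendingCount, and the injected LongSupplier clock are hypothetical; this is not the PR's implementation.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not the TwoWindowBucket class from the PR.
 * Pending container IDs live in a "current" and a "previous" window; each roll
 * discards the previous window, so an entry that is never removed expires
 * after between one and two roll intervals.
 */
class TwoWindowBucketSketch {
  private final long rollIntervalMs;
  private final LongSupplier clockMs;
  private long windowStartMs;
  private Set<Long> current = new HashSet<>();
  private Set<Long> previous = new HashSet<>();

  TwoWindowBucketSketch(long rollIntervalMs, LongSupplier clockMs) {
    this.rollIntervalMs = rollIntervalMs;
    this.clockMs = clockMs;
    this.windowStartMs = clockMs.getAsLong();
  }

  /** Advances the windows if at least one interval has elapsed. */
  synchronized void rollIfNeeded() {
    final long elapsed = clockMs.getAsLong() - windowStartMs;
    if (elapsed < rollIntervalMs) {
      return;
    }
    // After one interval the current window becomes the previous one;
    // after two or more intervals, both windows are stale and are dropped.
    previous = elapsed < 2 * rollIntervalMs ? current : new HashSet<>();
    current = new HashSet<>();
    // Keep window boundaries aligned to multiples of the interval.
    windowStartMs += (elapsed / rollIntervalMs) * rollIntervalMs;
  }

  synchronized void add(long containerId) {
    current.add(containerId);
  }

  /** Removes a container that is no longer pending, from whichever window holds it. */
  synchronized void remove(long containerId) {
    current.remove(containerId);
    previous.remove(containerId);
  }

  /** Number of distinct pending containers across both windows. */
  synchronized int pendingCount() {
    final Set<Long> all = new HashSet<>(current);
    all.addAll(previous);
    return all.size();
  }
}
```

The methods synchronize internally, so callers would not need their own synchronized (bucket) blocks. Injecting the clock keeps the rolling logic testable without sleeping.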

```java
long usableSpace = VolumeUsage.getUsableSpace(report);
long containersOnThisDisk = usableSpace / containerSize;
effectiveAllocatableSpace += containersOnThisDisk * containerSize;
if (effectiveAllocatableSpace - pendingAllocationBytes >= containerSize) {
```
Contributor


Just use:

```java
if (usableSpace - pendingBytes >= containerSize) {
```
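To compare the two checks in this thread, here is a small sketch assuming the excerpt runs once per disk with usableSpace being that disk's usable space, and that pendingBytes is the space already promised to pending containers on the Datanode. AllocationCheckSketch and both method names are hypothetical.

```java
/** Hypothetical helpers comparing the two capacity checks from the review thread. */
class AllocationCheckSketch {
  /** Suggested check: usable space minus pending bytes must fit one more container. */
  static boolean canAllocate(long usableSpace, long pendingBytes, long containerSize) {
    return usableSpace - pendingBytes >= containerSize;
  }

  /** Original check: round each disk's usable space down to whole containers, then sum. */
  static boolean canAllocatePerDisk(long[] usablePerDisk, long pendingBytes, long containerSize) {
    long effectiveAllocatableSpace = 0;
    for (long usableSpace : usablePerDisk) {
      final long containersOnThisDisk = usableSpace / containerSize;
      effectiveAllocatableSpace += containersOnThisDisk * containerSize;
    }
    return effectiveAllocatableSpace - pendingBytes >= containerSize;
  }
}
```

For example, with a 5 GB container size (sizes in GB here for readability) and two disks with 3 GB usable each, canAllocate(6, 0, 5) returns true while canAllocatePerDisk(new long[] {3, 3}, 0, 5) returns false, because neither disk alone can hold a whole container. With a single disk of 12 GB and 5 GB pending, both return true.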

@ashishkumar50
Contributor Author

@szetszwo, thanks for the review. I have addressed the comments.

@ashishkumar50 ashishkumar50 requested a review from szetszwo April 14, 2026 15:34
Contributor

@szetszwo szetszwo left a comment


@ashishkumar50, thanks for the update! The change looks mostly good.

  • All the synchronized (bucket) blocks should be removed.
  • Are there legitimate cases where null is passed to the methods and should be ignored? If yes, please add a comment describing those cases. Otherwise, please replace the null check with Objects.requireNonNull(..).
    We usually use Objects.requireNonNull(..) to detect bugs. If we ignore the null and return, it hides the bug and may lead to more serious problems, such as data loss.
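The difference between the two null-handling styles can be sketched as follows. NullHandlingSketch, addLenient, and addStrict are hypothetical names; only java.util.Objects.requireNonNull(T, String) is a real JDK API. In a pending-space tracker, the lenient version silently drops the allocation, so SCM would undercount the space promised on that Datanode, which is the kind of hidden bug the review warns about.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical tracker contrasting lenient and fail-fast null handling. */
class NullHandlingSketch {
  private final Map<String, Long> pendingBytes = new HashMap<>();

  /** Style the review asks to avoid: a null argument is silently ignored. */
  void addLenient(String datanodeId, long bytes) {
    if (datanodeId == null) {
      return; // hides the caller's bug: this allocation is never tracked
    }
    pendingBytes.merge(datanodeId, bytes, Long::sum);
  }

  /** Fail-fast style: a null argument throws NullPointerException immediately. */
  void addStrict(String datanodeId, long bytes) {
    Objects.requireNonNull(datanodeId, "datanodeId == null");
    pendingBytes.merge(datanodeId, bytes, Long::sum);
  }

  long pendingBytes(String datanodeId) {
    return pendingBytes.getOrDefault(datanodeId, 0L);
  }
}
```

With addStrict, a null argument throws a NullPointerException carrying the message "datanodeId == null" at the point of the bug, instead of surfacing later as an unexplained over-allocation.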

@ashishkumar50 ashishkumar50 requested a review from szetszwo April 14, 2026 18:16
Contributor

@szetszwo szetszwo left a comment


@ashishkumar50, thanks for the quick update!

+1, the change looks good.

Contributor

@rakeshadr rakeshadr left a comment


Thanks @ashishkumar50 for the continued efforts. I added two comments on log improvements; please take care of them.

+1 LGTM

@rakeshadr rakeshadr merged commit 95028e4 into apache:master Apr 17, 2026
45 checks passed

5 participants