
Some more small changes for GPU support #48

Merged
lkdvos merged 15 commits into lkdvos:main from kshyatt:ksh/gpu
Apr 24, 2026

Conversation

kshyatt (Contributor) commented Apr 23, 2026

No description provided.

codecov bot commented Apr 23, 2026

Codecov Report

❌ Patch coverage is 73.07692%, with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.27%. Comparing base (0bd8ffa) to head (f48fa24).
⚠️ Report is 1 commit behind head on main.

Files with missing lines                         | Patch % | Lines
src/auxiliary/blockarrays.jl                     | 68.42%  | 6 Missing ⚠️
src/tensors/abstractblocktensor/abstractarray.jl | 0.00%   | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #48   +/-   ##
=======================================
  Coverage   56.27%   56.27%           
=======================================
  Files          19       20    +1     
  Lines        1427     1434    +7     
=======================================
+ Hits          803      807    +4     
- Misses        624      627    +3     


kshyatt (Contributor, Author) commented Apr 23, 2026

The failure here would be fixed in QuantumKitHub/TensorKit.jl#375, annoyingly circular...

Comment thread: src/tensors/tensoroperations.jl (Outdated)

```julia
    A::AbstractTensorMap, pA::Index2Tuple, conjA::Bool,
    B::AbstractBlockTensorMap, pB::Index2Tuple, conjB::Bool,
    pAB::Index2Tuple{N₁, N₂},
) where {N₁, N₂} = TO.tensorcontract_type(TC, B, pB, conjB, A, pA, conjA, pAB)
```
lkdvos (Owner):

I think this isn't entirely correct, and only works because our implementation does not depend on too many details: this combination of pB, pA, pAB is not necessarily a valid contraction specification, and that is an invariant I would like to keep.

kshyatt (Contributor, Author):

Cool, I'll try to rework this then :)
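For context on why the operand swap in the snippet above is subtle: in TensorOperations-style contractions, the output permutation pAB indexes the open indices of A first, then those of B, so swapping A and B while reusing the same pAB need not be a valid contraction specification. A minimal sketch of the remapping a swap would require (the helper `swap_pAB` and the flat-tuple form of pAB are hypothetical simplifications, not the package's actual API):

```julia
# Hypothetical helper (not part of TensorOperations.jl or BlockTensorKit.jl):
# remap a flat output permutation when the two contraction operands are swapped.
# Positions 1..nA_open refer to A's open indices and must shift past B's;
# positions nA_open+1..nA_open+nB_open refer to B's and must shift down.
function swap_pAB(pAB::Tuple{Vararg{Int}}, nA_open::Int, nB_open::Int)
    remap(i) = i <= nA_open ? i + nB_open : i - nA_open
    return map(remap, pAB)
end

swap_pAB((1, 3, 2, 4), 2, 2)  # (3, 1, 4, 2)
```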

Comment thread: src/tensors/indexmanipulations.jl

```julia
Base.getindex(iter::TK.BlockIterator{<:AbstractBlockTensorMap}, c::Sector) = block(iter.t, c)

TensorKit.storagetype(::Type{TT}) where {TT <: AbstractBlockTensorMap} = storagetype(eltype(TT))
function TensorKit.storagetype(::Type{TT}) where {TT <: AbstractBlockTensorMap}
```
lkdvos (Owner):

I'm slightly confused about why we need this: isn't it already included in the base definition? https://github.com/QuantumKitHub/TensorKit.jl/blob/3dfc99f6ea79ba0fc90b959c76314d869a23a17b/src/tensors/abstracttensor.jl#L53-L59

kshyatt (Contributor, Author):

No, I don't think so? Or at least it was causing errors. The base definition captures definitions like:

```julia
const MyTensorMap = Union{TensorMap, SomeCrap}
```

But BlockTensorMap has such a Union as its eltype, which is not captured by the method in TensorKit. So what happens is that the logic in base TensorKit just falls through to the `similarstoragetype(scalartype(T))` line, I think.
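A minimal, self-contained illustration of the dispatch gap being described (the `storagetype` methods below are stand-ins, not TensorKit's actual definitions): a method constrained to a single element parameter T cannot match a Union of differently-parameterized types, so the generic fallback fires instead.

```julia
# Stand-in methods mimicking the dispatch situation described above.
storagetype(::Type{<:AbstractMatrix{T}}) where {T <: Number} = Matrix{T}
storagetype(::Type{T}) where {T} = :fallback  # stand-in for the similarstoragetype path

storagetype(Matrix{Float64})                             # Matrix{Float64}
storagetype(Union{Matrix{Float64}, Matrix{ComplexF64}})  # :fallback -- no single T fits both
```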

lkdvos (Owner):

Sorry, I meant that `storagetype(::Type{TT}) where {TT <: AbstractBlockTensorMap} = storagetype(eltype(TT))` should already work, which is what is confusing me: if `eltype(TT)` is a `Union`, that case should then be handled by the base definition. I might also be missing something though.

kshyatt (Contributor, Author):

Let me try killing this here and seeing what happens

kshyatt (Contributor, Author):

OK, this generates new problems, because I think it is hitting the `similarstoragetype(scalartype(T))` line; why it is doing so, I'm not sure...

Comment thread: ext/BlockTensorKitGPUArraysExt.jl (Outdated), comment on lines +12 to +21

```julia
function BlockTensorKit._full(A::BM) where {T <: Number, TA <: AnyGPUMatrix{T}, BM <: BlockMatrix{T, Matrix{TA}}}
    arr = similar(first(A.blocks), size(A))
    # TODO -- should we use Threads here to parallelize these
    # transfers in streams if possible?
    for block_index in Iterators.product(blockaxes(A)...)
        indices = getindex.(axes(A), block_index)
        arr[indices...] = @view A[block_index...]
    end
    return arr
end
```
lkdvos (Owner):

How would you feel about just putting this code in the main package, and calling it for the CPU arrays as well? I'm not entirely sure about the Threads comment, which I would leave out for now; otherwise it seems like we might as well just "vendor" this.

kshyatt (Contributor, Author):

We currently have the CPU one as `_full(A) = Array(A)`, which I think calls back into BlockArrays? I don't have a strong opinion about this, so I can move it back over into src if that's your preference.

kshyatt (Contributor, Author):

I would keep the Threads comment tbh because doing a series of small transfers like this is absolute Kryptonite to GPU performance

kshyatt (Contributor, Author):

Indeed, the CPU-side one falls back to a very similar function in BlockArrays.jl.
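For reference, the block-wise assembly that `_full` performs can be sketched in plain Base Julia without a BlockArrays dependency (`assemble_blocks` is a hypothetical stand-in, not the actual fallback): allocate one dense output, then copy each block into its slice. On a GPU, each such assignment is a separate small transfer, which is the performance concern raised above.

```julia
# Hypothetical stand-in mirroring the block-wise copy in _full above:
# assemble a dense matrix from a Matrix of matrix blocks.
function assemble_blocks(blocks::Matrix{<:AbstractMatrix})
    row_sizes = [size(blocks[i, 1], 1) for i in axes(blocks, 1)]
    col_sizes = [size(blocks[1, j], 2) for j in axes(blocks, 2)]
    out = similar(first(blocks), sum(row_sizes), sum(col_sizes))
    row_offsets = cumsum([0; row_sizes])
    col_offsets = cumsum([0; col_sizes])
    for j in axes(blocks, 2), i in axes(blocks, 1)
        rows = row_offsets[i]+1:row_offsets[i+1]
        cols = col_offsets[j]+1:col_offsets[j+1]
        out[rows, cols] = blocks[i, j]  # one copy (or device transfer) per block
    end
    return out
end

blocks = Matrix{Matrix{Float64}}(undef, 2, 2)
blocks[1, 1] = fill(1.0, 2, 2); blocks[1, 2] = fill(2.0, 2, 3)
blocks[2, 1] = fill(3.0, 1, 2); blocks[2, 2] = fill(4.0, 1, 3)
full = assemble_blocks(blocks)  # 3×5 dense Matrix{Float64}
```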

kshyatt mentioned this pull request on Apr 24, 2026
lkdvos force-pushed the ksh/gpu branch 2 times, most recently from ddb285b to 4eacaa1 on April 24, 2026 at 18:01
lkdvos (Owner) left a comment:

@kshyatt sorry for hijacking your PR, going to try and include this and release already. We can rediscuss the tensorcontract_type implementation and the storagetype one separately maybe?

lkdvos enabled auto-merge (squash) on April 24, 2026 at 18:09
kshyatt (Contributor, Author) commented Apr 24, 2026 via email

lkdvos merged commit cf8a145 into lkdvos:main on Apr 24, 2026
28 checks passed
