Excessive memory consumption when extracting non-consecutive indices with selected_ranges() #881

@LTLA

Description

Originally reported in LTLA/TileDBArray#35. Simplified example is:

library(tiledb)

set.seed(111)
path <- "mock"
dom <- tiledb_domain(dims = list(
    tiledb_dim("d1", c(1L, 20000L), 20000L, type = "INT32"),
    tiledb_dim("d2", c(1L, 15000L), 15000L, type = "INT32")
))
schema <- tiledb_array_schema(
    dom,
    attrs = list(tiledb_attr("x", type = "FLOAT64")),
    sparse = TRUE
)
tiledb_array_create(path, schema)

smat <- Matrix::rsparsematrix(20000, 15000, density=0.1, repr="T")
arr <- tiledb_array(path, query_type = "WRITE")
arr[] <- data.frame(d1 = smat@i + 1L, d2 = smat@j + 1L, x = smat@x)
tiledb_array_close(arr)

If we look at the simulated matrix, it occupies less than 500 MB:

print(object.size(smat), unit="MB")
## 457.8 Mb
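
As a rough cross-check of that figure, the size can be estimated by hand. This is only a back-of-the-envelope sketch: it assumes the triplet representation stores one 4-byte row index, one 4-byte column index and one 8-byte double per non-zero entry, and it ignores the small fixed overhead of the object itself. The variable names are made up for illustration.

```r
# Dimensions and density used in the simulation above.
n_rows <- 20000
n_cols <- 15000
density <- 0.1

# Expected number of non-zero entries in the simulated matrix.
nnz <- density * n_rows * n_cols

# Assumed storage per entry: two 4-byte integers plus one 8-byte double.
bytes_per_entry <- 4 + 4 + 8

# Convert to the 1024^2-based "Mb" unit that object.size() printed above.
approx_mb <- nnz * bytes_per_entry / 1024^2
print(round(approx_mb, 1))
```

Under these assumptions the estimate lands on the same 457.8 Mb as the measured value, so the array holds roughly 30 million (d1, d2, x) cells.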

Now we attempt to extract every even row and every third column, via selected_ranges(). (We can't use indices in [ because of #880.)

arr <- tiledb_array(path, query_type = "READ")
desired.rows <- seq(2, nrow(smat), by=2)
desired.columns <- seq(3, ncol(smat), by=3)
selected_ranges(arr) <- list(cbind(desired.rows, desired.rows), cbind(desired.columns, desired.columns))
system.time(arr[])
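
For reference, each element of the list passed to selected_ranges() above is a two-column matrix of inclusive start and end values. The sketch below uses a hypothetical helper, collapse_ranges(), which is not part of tiledb, to merge runs of consecutive indices into single ranges. It shows that no merging is possible for the strided indices in this example, so the query really does involve thousands of single-element ranges per dimension.

```r
# Hypothetical helper (not part of tiledb): collapse integer indices into
# a two-column matrix of inclusive [start, end] runs of consecutive values.
collapse_ranges <- function(idx) {
    idx <- sort(unique(as.integer(idx)))
    if (length(idx) == 0L) {
        return(cbind(start = integer(0), end = integer(0)))
    }
    # Positions where the next index is not the immediate successor.
    breaks <- which(diff(idx) != 1L)
    starts <- idx[c(1L, breaks + 1L)]
    ends <- idx[c(breaks, length(idx))]
    cbind(start = starts, end = ends)
}

# Consecutive runs are merged into three ranges: 1-3, 7-8 and 10-10.
small <- collapse_ranges(c(1, 2, 3, 7, 8, 10))
print(small)

# The strided indices from this report cannot be merged at all.
row_ranges <- collapse_ranges(seq(2, 20000, by = 2))
col_ranges <- collapse_ranges(seq(3, 15000, by = 3))
print(c(rows = nrow(row_ranges), columns = nrow(col_ranges)))
```

With these inputs, every range has identical start and end values, which matches the cbind(desired.rows, desired.rows) construction used in the report.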

While the last command runs, I monitor the memory usage of the R process via top. On my laptop with 16 GB of RAM, usage climbs past 80% (i.e., about 13 GB), at which point I terminate the R process; otherwise, my laptop starts swapping and freezes for a while. This amount of memory usage seems excessive given that the input data is less than 500 MB in size.
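
To put the observed usage in context, here is a rough estimate of how large the query result should be. It is a sketch under stated assumptions: cells are taken to be spread uniformly at the simulated density of 0.1, and each returned cell is taken to cost 16 bytes (two 4-byte integer coordinates plus one 8-byte double). The count of range pairs is included only for scale; whether the library actually enumerates the cross product of row and column ranges is a guess, not something established here.

```r
# Same selections as in the report, with the dimensions written out.
desired.rows <- seq(2, 20000, by = 2)
desired.columns <- seq(3, 15000, by = 3)

# One single-element range per selected row and per selected column.
n_row_ranges <- length(desired.rows)
n_col_ranges <- length(desired.columns)

# Number of (row range, column range) pairs, shown for scale only.
n_range_pairs <- n_row_ranges * n_col_ranges

# Expected number of non-empty cells in the selection at density 0.1.
expected_cells <- 0.1 * n_row_ranges * n_col_ranges

# Assumed 16 bytes per returned cell, converted to 1024^2-based Mb.
expected_mb <- expected_cells * 16 / 1024^2

print(c(row_ranges = n_row_ranges, col_ranges = n_col_ranges))
print(n_range_pairs)
print(round(expected_mb, 1))
```

Under these assumptions the result should hold about 5 million cells, or less than 100 Mb, which is more than two orders of magnitude below the roughly 13 GB observed before the process was killed.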

Session information
R Under development (unstable) (2026-02-19 r89439)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/trunk/lib/libRblas.so 
LAPACK: /home/luna/Software/R/trunk/lib/libRlapack.so;  LAPACK version 3.12.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Sydney
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RcppSpdlog_0.0.28 tiledb_0.33.0    

loaded via a namespace (and not attached):
 [1] zoo_1.8-15          bit_4.6.0           compiler_4.6.0     
 [4] Matrix_1.7-5        tools_4.6.0         RcppCCTZ_0.2.14    
 [7] spdl_0.0.5          Rcpp_1.1.1          nanoarrow_0.8.0    
[10] bit64_4.6.0-1       nanotime_0.3.13     grid_4.6.0         
[13] data.table_1.18.2.1 lattice_0.22-9     
