Indexing failed on a big codebase leaving the cache empty #210

Hello. I tried semble v0.4.0 on a big C++ codebase (~135k files, ~2.5GB of source code), but indexing crashed (I suspect from memory exhaustion) and no index cache was written at all. It looks like the cache is not built incrementally, which makes it impossible to use semble on a codebase of this size.

Command run to warm up the index:

```
semble search "test" /workspace --top-k 1
```

After 90 minutes it crashed with just a "Killed" message (from what I found searching around, this is probably the Python process being killed after running out of memory).
During the run, the warning "Recursion depth exceeded in chunk." appeared a few times (but looking at the source code, this seems to be a handled case).

The critical problem is that no partial index cache was written. There was also no output about indexing progress, so it's impossible to tell how much RAM was used or how much would be needed for the task to complete.
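To illustrate the kind of progress output I mean, here is a minimal sketch. It is not based on semble's code: `index_with_progress`, `index_one` and `peak_rss_mib` are hypothetical names, and the memory figure relies on the standard `resource` module, which is Unix-only.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def index_with_progress(paths, index_one, every=1000, log=print):
    """Call index_one(path) for each path, logging progress every `every` files.

    index_one is a placeholder for the real per-file indexing step.
    """
    total = len(paths)
    for done, path in enumerate(paths, start=1):
        index_one(path)
        if done % every == 0 or done == total:
            log(f"indexed {done}/{total} files, peak RSS {peak_rss_mib():.0f} MiB")
```

Even one line like this every few thousand files would show how far the run got and how fast memory was growing before the process was killed.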

Ideally, building the cache incrementally would solve this problem, since indexing could then resume after a crash. I think a second improvement is also needed for everything to work properly: a *search* request should not load the whole index into memory, otherwise the OOM error would just come back at query time.
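A rough sketch of the incremental build I have in mind, assuming one cache file per source file plus a manifest that is checkpointed periodically. None of this reflects how semble's cache is actually laid out: `build_index`, `index_one`, the `manifest.json` name and the JSON shard format are all hypothetical.

```python
import hashlib
import json
import os
from pathlib import Path


def file_key(path: Path) -> str:
    """Cache key that changes whenever the file's content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(cache_dir: Path) -> dict:
    """Return the {source path: content key} map of files already indexed."""
    manifest = cache_dir / "manifest.json"
    return json.loads(manifest.read_text()) if manifest.exists() else {}


def save_manifest(cache_dir: Path, done: dict) -> None:
    # Write to a temp file and rename it, so a crash never leaves
    # a half-written manifest behind.
    tmp = cache_dir / "manifest.json.tmp"
    tmp.write_text(json.dumps(done))
    os.replace(tmp, cache_dir / "manifest.json")


def build_index(files, cache_dir: Path, index_one, checkpoint_every=500) -> int:
    """Index files, skipping those already cached; return how many were indexed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    done = load_manifest(cache_dir)
    indexed = 0
    for path in files:
        key = file_key(path)
        if done.get(str(path)) == key:
            continue  # unchanged since the last (possibly interrupted) run
        # index_one stands in for the real chunking/embedding step and
        # must return JSON-serializable data in this sketch.
        (cache_dir / f"{key}.json").write_text(json.dumps(index_one(path)))
        done[str(path)] = key
        indexed += 1
        if indexed % checkpoint_every == 0:
            save_manifest(cache_dir, done)
    save_manifest(cache_dir, done)
    return indexed
```

With something like this, a run killed after 90 minutes would lose at most the work since the last checkpoint, and the next run would pick up from there. Keeping one shard per file would also leave room for search to read shards lazily instead of loading everything at once, though that part is not shown here.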
