feat: Added multiple tiles support in get_edge_population #126
Thanks for the commits @juanfonsecaLS1 and for working on this improvement.
Hi @cbueth, I initially went for the VRT approach, as I mentioned before. However, that implied either adding an extra dependency to the package or building an XML file to create the mosaic.

    with rasopen(ghsl_file) as src:
        load_window = src.window(*bbox_moll)
    ghsl_polygons = load_ghsl_as_polygons(ghsl_file, window=load_window)

I did not manage to find a better way round this part of the code. I imagine we can just incorporate the code directly into the

Sure, I can add a test with the Rome case. Also, I will rebase this branch too. Thanks!
chore: add flaky test marker for failing downloads
Understood, that is a fine choice I think.
Yes, please add two test cases that fall on two polygons. You can parametrize it with @pytest.mark.parametrize().
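As a rough illustration of the parametrized form, here is a minimal sketch. The helper `count_tiles`, the constant `TILE_SIZE_M`, and the bounding boxes are hypothetical and only stand in for real study areas; they are not part of superblockify or of the GHSL tiling scheme.

```python
import pytest

# Hypothetical tile edge length in metres; an assumption for illustration only.
TILE_SIZE_M = 1_000_000


def count_tiles(bbox, tile_size=TILE_SIZE_M):
    """Count the grid tiles intersected by bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    cols = int(maxx // tile_size) - int(minx // tile_size) + 1
    rows = int(maxy // tile_size) - int(miny // tile_size) + 1
    return cols * rows


@pytest.mark.parametrize(
    "name,bbox",
    [
        # Each made-up bounding box crosses exactly one tile edge.
        ("straddles_vertical_edge", (990_000, 100_000, 1_010_000, 120_000)),
        ("straddles_horizontal_edge", (100_000, 1_990_000, 120_000, 2_010_000)),
    ],
)
def test_bbox_spans_multiple_tiles(name, bbox):
    assert count_tiles(bbox) >= 2, f"{name} should span at least two tiles"
```

With pytest, each tuple becomes its own test case, so a failure report names the location that broke.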
The Black code-style check is failing; some formatting needs to be applied, if possible.
The failing tests in the current CI are due to timeouts, as far as I can see. I think we need to find an alternative so that the tests do not all download the tile directly from the server, as that seems to be too many requests. But this is a separate issue.
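One possible direction, sketched here only as an assumption and not as what the project does, is to fetch each tile once and reuse the local copy. The names `cached_fetch` and `fake_download` are hypothetical; the fake downloader stands in for the real server request.

```python
import tempfile
from pathlib import Path


def cached_fetch(name, cache_dir, download):
    """Return the cached file for `name`, calling `download` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Only reach out to the (possibly slow) source when nothing is cached.
        target.write_bytes(download(name))
    return target


calls = []


def fake_download(name):
    # Stand-in for a real HTTP download; records every request made.
    calls.append(name)
    return b"tile-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("tile_a.tif", tmp, fake_download)
    second = cached_fetch("tile_a.tif", tmp, fake_download)
    assert first == second
    assert calls == ["tile_a.tif"]  # the second call was served from the cache
```

In a CI setting, the cache directory could be restored between runs so that most jobs never contact the server at all.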
Just added a parametrised test with a sample of two locations near the edge of a tile. Just checked the results of the Black test and it seems it is an access issue:
Sorry for the late reply.
Thanks for the changes. It is nearly ready to merge. For now we can ignore the black CI step. This I need to fix separately.
I would suggest adding explicit tile verification to test_get_edge_population_multiple_tiles.
The test doesn't verify that Rome and Turbo actually span multiple GHSL tiles. Add this before the add_edge_population call:
    from geopandas import GeoDataFrame
    from superblockify.population.ghsl import get_ghsl

    # ... inside test function after project_graph
    boundary_gdf = GeoDataFrame(
        geometry=[graph.graph["boundary"]], crs=graph.graph["boundary_crs"]
    ).to_crs("World Mollweide")
    ghsl_result = get_ghsl(bbox_moll=list(boundary_gdf.total_bounds))
    assert isinstance(ghsl_result, list), (
        f"{city_name} should span multiple GHSL tiles, but get_ghsl returned "
        f"{type(ghsl_result).__name__} instead of list"
    )
    assert len(ghsl_result) >= 2, (
        f"{city_name} should need at least 2 GHSL tiles, but get_ghsl returned "
        f"{len(ghsl_result)} tile(s)"
    )

Why: I do believe you that these two examples span multiple tiles, but without this, if the multi-tile handling breaks (e.g., _load_ghsl_multifile is removed or get_ghsl changes to always return a single path), the test would still pass but wouldn't actually test the multi-tile case.
Description
Adds a function to handle multiple GHSL tile files when the study area spans across more than one tile.
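To illustrate the general idea of treating one tile and several tiles uniformly, here is a minimal sketch. It is not the implementation in this PR: `load_tiles` and `fake_loader` are hypothetical names, and the fake loader replaces any real raster reading.

```python
from pathlib import Path


def load_tiles(tile_paths, load_one):
    """Load one or several tiles and concatenate the per-tile records."""
    # Normalise a single path to a one-element list so both cases share a code path.
    if isinstance(tile_paths, (str, Path)):
        tile_paths = [tile_paths]
    records = []
    for path in tile_paths:
        records.extend(load_one(path))
    return records


def fake_loader(path):
    # Stand-in for a real raster reader; returns one record per tile.
    return [{"tile": str(path), "population": 1.0}]


single = load_tiles("tile_a.tif", fake_loader)
multiple = load_tiles(["tile_a.tif", "tile_b.tif"], fake_loader)
print(len(single), len(multiple))  # prints: 1 2
```

Normalising the input up front keeps the calling code free of branches on whether the study area happens to cross a tile edge.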
Small change to the code in get_edge_population
Related Issue
Fixes #124
Type of Change
How Has This Been Tested?
I used the same minimal example in Rome
Screenshots (if applicable)
Checklist
Additional Notes