Skip to content

feat: Added multiple tiles support in get_edge_population#126

Open
juanfonsecaLS1 wants to merge 4 commits into
mainfrom
124-bug-add_edge_population-fails-is-there-are-two-ghsl-tiles-overlapping-the-study-area
Open

feat: Added multiple tiles support in get_edge_population#126
juanfonsecaLS1 wants to merge 4 commits into
mainfrom
124-bug-add_edge_population-fails-is-there-are-two-ghsl-tiles-overlapping-the-study-area

Conversation

@juanfonsecaLS1
Copy link
Copy Markdown
Collaborator

@juanfonsecaLS1 juanfonsecaLS1 commented Apr 16, 2026

Description

Adds a function to handle multiple GHSL tile files when the study area spans across more than one tile.

Small change to the code in get_edge_population

Related Issue

Fixes #124

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 📝 Documentation update
  • 🧹 Code refactor (no functional changes)
  • ✅ Test update

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

I used the same minimal example in Rome

import osmnx as ox
from superblockify.population.approximation import add_edge_population
import geopandas as gpd

boundary_lat, boundary_lon = 41.8956456, 12.4655397

graph = ox.graph_from_point(
    (boundary_lat, boundary_lon), 
    dist=500, 
    network_type='drive', 
    simplify=True
)

graph = ox.project_graph(graph)

add_edge_population(graph, overwrite=True)

Screenshots (if applicable)

image

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix/feature works
  • New and existing tests pass locally

Additional Notes

@juanfonsecaLS1 juanfonsecaLS1 requested a review from cbueth April 16, 2026 13:34
@juanfonsecaLS1 juanfonsecaLS1 added the bug Something isn't working label Apr 16, 2026
@juanfonsecaLS1 juanfonsecaLS1 marked this pull request as ready for review April 16, 2026 13:36
@cbueth
Copy link
Copy Markdown
Collaborator

cbueth commented Apr 16, 2026

Thanks for the commits @juanfonsecaLS1 and working on this improvement.
I think I would prefer it to have this list check inside load_ghsl_as_polygons, not adding a separate function. Maybe load_ghsl_as_polygons can call this stitching rasters together inside and then calling itself. Do you get the idea? Is there any reason making it unpractical?
Would you please add a test that tests this literal edge case?
Furthermore, there has been a commit on main that you may want to merge.
Greetings

@juanfonsecaLS1
Copy link
Copy Markdown
Collaborator Author

juanfonsecaLS1 commented Apr 16, 2026

Hi @cbueth, I initially went for the vrt approach as I mentioned before. However, that implied either having an extra dependency in the package or building a XML to create the mosaic.
In the end I went for your initial suggestion, contenating the polygons returned by load_ghsl_as_polygons. I preferred not to change the window argument that load_ghsl_as_polygons.

        with rasopen(ghsl_file) as src:
            load_window = src.window(*bbox_moll)
        
        ghsl_polygons = load_ghsl_as_polygons(ghsl_file, window=load_window)

I did not manage to find a better way round this part of the code. I imagine we can just incorporate the code directly into the get_edge_population if we want to avoid the new function. What do you think?

Sure, I can add a test with the Rome case.

Also, I will rebase this branch too.

Thanks!

chore: add flaky test marker for failing downloads
Copy link
Copy Markdown
Collaborator

@cbueth cbueth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, that is a fine choice I think.

Yes, please add two test cases that fall on two polygons. You can parametrize it with @pytest.mark.parametize().

The black code style is failing, this is some formatting needed to be applied, if possible.

The failing tests in the current CI are due to timeouts I see. I think we need to find an alternative so the tests do not all need to download the tile directly from the server as they are too many resquests, it seems. But this is a separate issue.

Comment thread superblockify/population/approximation.py Outdated
Comment thread superblockify/population/approximation.py
Comment thread superblockify/population/approximation.py Outdated
@juanfonsecaLS1
Copy link
Copy Markdown
Collaborator Author

Just added a parametrised test with a sample of two locations near the edge of a tile. Just checked the results of the Black test and it seems it is an access issue:

remote: Permission to BikeNetKit/superblockify.git denied to github-actions[bot].
fatal: unable to access 'https://github.com/BikeNetKit/superblockify/': The requested URL returned error: 403

@cbueth cbueth changed the title added function to process multiple tiles in get_edge_population feat: added multiple tiles support in get_edge_population May 22, 2026
@cbueth cbueth changed the title feat: added multiple tiles support in get_edge_population feat: Added multiple tiles support in get_edge_population May 22, 2026
Copy link
Copy Markdown
Collaborator

@cbueth cbueth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply.

Thanks for the changes. It is nearly ready to merge. For now we can ignore the black CI step. This I need to fix separately.

I would suggest to add explicit tile verification to test_get_edge_population_multiple_tiles.

The test doesn't verify that Rome and Turbo actually span multiple GHSL tiles. Add this before the add_edge_population call:

from geopandas import GeoDataFrame
from superblockify.population.ghsl import get_ghsl
# ... inside test function after project_graph
boundary_gdf = GeoDataFrame(
    geometry=[graph.graph["boundary"]], crs=graph.graph["boundary_crs"]
).to_crs("World Mollweide")
ghsl_result = get_ghsl(bbox_moll=list(boundary_gdf.total_bounds))
assert isinstance(ghsl_result, list), (
    f"{city_name} should span multiple GHSL tiles, but get_ghsl returned "
    f"{type(ghsl_result).__name__} instead of list"
)
assert len(ghsl_result) >= 2, (
    f"{city_name} should need at least 2 GHSL tiles, but get_ghsl returned "
    f"{len(ghsl_result)} tile(s)"
)

Why: I do believe you that these two examples span multiple tiles, but without this, if the multi-tile handling breaks (e.g., _load_ghsl_multifile is removed or get_ghsl changes to always return a single path), the test would still pass but wouldn't actually test the multi-tile case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: add_edge_population fails is there are two GHSL tiles overlapping the study area

2 participants