@irm-codebase in PR #1:
@sjpfenninger While digging around in the integrated module workflow, I found a way to avoid forcing users to download the WDPA dataset on their own.
The following curl command should always download the latest version of that dataset:
# Full world polygons to GeoJSON
# where=1=1   -> ask for all records
# outFields=* -> ask for all columns
# outSR=4326  -> WGS84 (can be another EPSG)
# f=geojson   -> GeoJSON; convert to GeoParquet later
# (Comments are kept out of the command itself: text after a trailing
# backslash breaks the line continuation.)
curl -G "https://data-gis.unep-wcmc.org/server/rest/services/ProtectedSites/The_World_Database_of_Protected_Areas/FeatureServer/1/query" \
  --data-urlencode "where=1=1" \
  --data-urlencode "outFields=*" \
  --data-urlencode "outSR=4326" \
  --data-urlencode "f=geojson" \
  -o wdpa_poly_latest.geojson
This results in a ~180 MB download that completes rather quickly.
An alternative is GDAL's ogr2ogr command (not tested), which can output GeoParquet directly:
# Polygons (layer 1) -> GeoParquet
ogr2ogr -f Parquet wdpa_poly_latest.parquet \
  "https://data-gis.unep-wcmc.org/server/rest/services/ProtectedSites/The_World_Database_of_Protected_Areas/FeatureServer/1" \
  -t_srs EPSG:4326 \
  -lco COMPRESSION=SNAPPY -lco GEOMETRY_ENCODING=WKB
This may warrant further investigation, but it seems this API limits each query to 2000 records. See https://data-gis.unep-wcmc.org/server/rest/services/ProtectedSites/The_World_Database_of_Protected_Areas/FeatureServer, which states "MaxRecordCount: 2000". That would explain why the download here is ~10x smaller than the manual download of the full dataset.
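If the 2000-record cap is the culprit, it can usually be worked around by paging through the endpoint with the Esri REST API's `resultOffset`/`resultRecordCount` query parameters. A minimal Python sketch, untested against the live server; `build_query`, `fetch_page`, and `fetch_all` are hypothetical helpers, and the stop condition (a short page means we reached the end) is an assumption:

```python
import json
import urllib.parse
import urllib.request

QUERY_URL = (
    "https://data-gis.unep-wcmc.org/server/rest/services/ProtectedSites/"
    "The_World_Database_of_Protected_Areas/FeatureServer/1/query"
)

def build_query(offset, page_size):
    """Query string for one page of results, using Esri's paging parameters."""
    return urllib.parse.urlencode({
        "where": "1=1",          # all records
        "outFields": "*",        # all columns
        "outSR": "4326",         # WGS84
        "f": "geojson",
        "resultOffset": offset,
        "resultRecordCount": page_size,
    })

def fetch_page(offset, page_size):
    """Download and decode a single page from the FeatureServer."""
    with urllib.request.urlopen(f"{QUERY_URL}?{build_query(offset, page_size)}") as resp:
        return json.load(resp)

def fetch_all(page_size=2000, get_page=fetch_page):
    """Accumulate pages until a short page signals the end of the dataset."""
    features, offset = [], 0
    while True:
        batch = get_page(offset, page_size).get("features", [])
        features.extend(batch)
        if len(batch) < page_size:
            break
        offset += page_size
    return {"type": "FeatureCollection", "features": features}
```

The `get_page` parameter is injected so the paging logic can be exercised without hitting the server; in real use, `fetch_all()` with the defaults should return the full feature collection, ready to be converted to GeoParquet.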