[HRT-72] Fix Overpass OSM scraper: bounding box + Content-Type + User-Agent #9

Open
admin wants to merge 0 commits from feature/HRT-72-fix-overpass-query into master

Context

Fixes the 3 bugs identified in leadhunter_scraper.py that caused the OSM scraper to return 0 results.

Paperclip issue: HRT-72

Changes

Bug 1: area query → bounding box (lines 65–74)

Before: area["name"="Métropole Européenne de Lille"], whose area resolution fails silently on the public Overpass API depending on the server version.
After: a direct bounding box (50.4,2.8,50.8,3.3), which is deterministic and reliable for MEL coverage.
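
As an illustration of the bounding-box form, here is a minimal sketch of a query builder. It is not the code from leadhunter_scraper.py: the function name, the amenity=restaurant filter, the node/way statements, and the timeout are assumptions made for the example. Only the bbox values and the [!"website"] negation come from this PR. Overpass expects the bbox in (south, west, north, east) order.

```python
# Hypothetical helper, not the actual scraper code.
MEL_BBOX = (50.4, 2.8, 50.8, 3.3)  # (south, west, north, east), from this PR


def build_overpass_query(bbox=MEL_BBOX, amenity="restaurant", timeout=25):
    """Return an Overpass QL query for places without a website tag inside bbox."""
    south, west, north, east = bbox
    if not (south < north and west < east):
        raise ValueError("bbox must be (south, west, north, east)")
    area = f"({south},{west},{north},{east})"
    return (
        f"[out:json][timeout:{timeout}];\n"
        "(\n"
        # [!"website"] keeps only elements that have no website tag.
        f'  node["amenity"="{amenity}"][!"website"]{area};\n'
        f'  way["amenity"="{amenity}"][!"website"]{area};\n'
        ");\n"
        "out center;"
    )
```

Because the bbox is spelled out in the query itself, the result no longer depends on the server resolving a named area.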

Bug 2: explicit Content-Type header (line 282)

Added an explicit Content-Type: application/x-www-form-urlencoded header to the requests.post() call.
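
A minimal sketch of what such a call could look like, assuming the query is sent as the form field data to the public endpoint. The helper names, the timeout, and the lazy import are choices made for this example, not the scraper's actual code.

```python
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_post_kwargs(query: str, timeout: float = 60.0) -> dict:
    """Keyword arguments for requests.post() with an explicit Content-Type."""
    return {
        # Overpass reads the query from the form field named "data".
        "data": {"data": query},
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "timeout": timeout,
    }


def fetch(query: str) -> dict:
    """Send the query and decode the JSON body (needs network access).

    Assumes the query requests JSON output with [out:json].
    """
    import requests  # third-party; imported lazily so the builder stays testable

    response = requests.post(OVERPASS_URL, **build_post_kwargs(query))
    response.raise_for_status()
    return response.json()
```

Keeping the arguments in a small builder makes the header easy to check without touching the network.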

Bug 3 (discovered during testing): blocked User-Agent

overpass-api.de returns 406 for User-Agent: python-requests/*.
Fix: send the custom User-Agent H3R7Tech-LeadHunter/1.0 (contact@h3r7tech.fr).
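
The sketch below shows one way to build the header and guard against falling back to the default agent. The helper name and the guard are assumptions for illustration. Only the User-Agent string comes from this PR.

```python
USER_AGENT = "H3R7Tech-LeadHunter/1.0 (contact@h3r7tech.fr)"  # value from this PR


def overpass_headers(user_agent: str = USER_AGENT) -> dict:
    """Return request headers carrying an identifying User-Agent."""
    # Hypothetical guard: refuse the library default that the server rejected.
    if not user_agent or user_agent.lower().startswith("python-requests"):
        raise ValueError("set a custom User-Agent identifying the client")
    return {"User-Agent": user_agent}
```

The result can be passed as headers=overpass_headers() to requests.post(), or applied once to a requests.Session with session.headers.update(overpass_headers()).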

Local test

Result: 5 leads returned (Crêperie La Vieille Bourse, Pinocchio, L'Opera Corner, Le Royal, Crêperie Saint-Georges).

Backup

leadhunter_scraper.py.backup_20260427_221429

admin added 1 commit 2026-04-27 22:20:05 +02:00
Bug 1: Replace area["name"="..."] query with direct bounding box (50.4,2.8,50.8,3.3)
  — area resolution fails silently on public Overpass API depending on server version.
  — Direct bbox is deterministic and reliable for MEL coverage.
  — Also simplify website filter to use [!"website"] tag negation syntax.

Bug 2: Add explicit Content-Type: application/x-www-form-urlencoded header
  — Some network configs/proxies strip the implicit header set by requests.post(data={}).
  — Explicit header is best practice per Overpass API docs.

Bug 3 (discovered during test): Add User-Agent header
  — overpass-api.de returns 406 Not Acceptable for User-Agent: python-requests/*.
  — Fix: send H3R7Tech-LeadHunter/1.0 as custom User-Agent.
  — Tested: 5 OSM leads returned from Lille center bounding box.

Backup: leadhunter_scraper.py.backup_20260427_221429

Co-Authored-By: Paperclip <noreply@paperclip.ing>
This branch is already included in the target branch. There is nothing to merge.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin feature/HRT-72-fix-overpass-query:feature/HRT-72-fix-overpass-query
git checkout feature/HRT-72-fix-overpass-query

Reference: admin/turf_saas#9