Compare commits

...

17 Commits

Author SHA1 Message Date
DevOps Engineer
8604dc78b1 feat(HRT-79): alertes Telegram configurables Premium/Pro
- telegram_alerts.py: service envoi alertes via Bot API (send_pre_race_alerts,
  build_race_alert, send_telegram_message) — gestion gracieuse TELEGRAM_BOT_TOKEN absent
- auth_db.py: migrate_telegram_columns() idempotente (ALTER TABLE + try/except OperationalError)
  colonnes: telegram_chat_id, alert_value_bets, alert_top1, alert_quinte_only
- api_v1/routes/user.py: blueprint user_bp GET/POST /api/v1/user/telegram-config
  protégé @jwt_required_middleware + @plan_required('premium','pro')
- api_v1/__init__.py: import + register user_bp
- turf_scheduler.py: run_telegram_alerts() + schedule_dynamic_telegram_alerts()
  planifiées 30min avant course (même pattern que schedule_dynamic_scoring)
  avec try/except Exception + fallback logger

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-29 16:42:15 +02:00
DevOps Engineer
30464fb40c Merge branch 'feature/HRT-84-dashboard-premium-pro' into master
Some checks failed
CD / Deploy → Staging (push) Has been cancelled
CD / Smoke Tests on Staging (push) Has been cancelled
CD / Deploy → Production (push) Has been cancelled
CD / Rollback Production (push) Has been cancelled
[HRT-84] Dashboard SaaS — UI Premium & Pro avec gating plan strict
- Sections Value Bets, Historique, Export CSV raccordées aux vrais endpoints
- Sections Telegram, API Token, Webhook avec mocks (TODO HRT-79, HRT-80)
- Gating plan strict: Free/Premium/Pro non contournable côté client
- Fix: maxDays Pro = 365j (corrige inversion 30j vs 90j)
- Multi-compte Pro: gating UI uniquement (endpoint non défini)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-29 15:49:56 +02:00
DevOps Engineer
31db3a8260 fix(HRT-84): maxDays historique Pro — 365j au lieu de 30j (inversion corrigée)
Pro = 365j (historique le plus long), Premium = 90j, Free = 7j
Corrigé suite au point d'attention CTO dans revue de code.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-29 15:49:25 +02:00
DevOps Engineer
278245cd7c feat(HRT-84): dashboard SaaS — UI Premium & Pro avec gating plan strict
- Ajout sections: Value Bets, Alertes Telegram, API Token, Webhook, Historique, Multi-compte
- Gating plan strict: Free < Premium < Pro (jamais de données réelles derrière plan inférieur)
- Value Bets: raccordé sur endpoint réel /api/v1/valuebets (premium+)
- Historique: raccordé sur endpoint réel /api/v1/history (HRT-81)
- Telegram / API Token / Webhook: mocks structurés avec contrats d'interface
  (TODO: replace mock — HRT-79 pour Telegram, HRT-80 pour API Token/Webhook)
- Multi-compte: gating UI Pro uniquement, endpoint non défini
- Navigation par section avec chargement lazy
- Design cohérent dark theme avec badges, lock icons et CTA upgrade par plan

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-29 15:43:02 +02:00
DevOps Engineer
225295030b fix(HRT-73): refactor api_proxy — COMBINED_ROUTES tuple + align with turf_scraper fix #23
Some checks failed
CD / Deploy → Staging (push) Has been cancelled
CD / Smoke Tests on Staging (push) Has been cancelled
CD / Deploy → Production (push) Has been cancelled
CD / Rollback Production (push) Has been cancelled
- Replace if/elif chain with COMBINED_ROUTES tuple for maintainability
- Add missing routes to combined_api: races, race/, scores, ask, brave-search,
  execute-sql, send-email, report, ideas
- Functionally equivalent to turf_scraper commit 048b969

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 22:38:32 +02:00
DevOps Engineer
86e85aa1c6 fix(HRT-72): fix Overpass OSM scraper — bounding box + Content-Type + User-Agent
Bug 1: Replace area["name"="..."] query with direct bounding box (50.4,2.8,50.8,3.3)
  — area resolution fails silently on public Overpass API depending on server version.
  — Direct bbox is deterministic and reliable for MEL coverage.
  — Also simplify website filter to use [!"website"] tag negation syntax.

Bug 2: Add explicit Content-Type: application/x-www-form-urlencoded header
  — Some network configs/proxies strip the implicit header set by requests.post(data={}).
  — Explicit header is best practice per Overpass API docs.

Bug 3 (discovered during test): Add User-Agent header
  — overpass-api.de returns 406 Not Acceptable for User-Agent: python-requests/*.
  — Fix: send H3R7Tech-LeadHunter/1.0 as custom User-Agent.
  — Tested: 5 OSM leads returned from Lille center bounding box.

Backup: leadhunter_scraper.py.backup_20260427_221429

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 22:19:10 +02:00
5aa6013c52 Merge pull request '[HRT-66] LeadHunter S1 — Core scraping, scoring, CRM SQLite et API Flask' (#8) from feature/HRT-66-leadhunter-core into master
Some checks failed
CD / Deploy → Staging (push) Has been cancelled
CD / Smoke Tests on Staging (push) Has been cancelled
CD / Deploy → Production (push) Has been cancelled
CD / Rollback Production (push) Has been cancelled
2026-04-27 16:55:00 +02:00
DevOps Engineer
4b4323f707 fix(leadhunter): change port 8770→8775 — port 8770 occupé par turf_scraper/crm_api.py
Port audit sur VPS (27/04/2026) :
- 8769 : depenses_trello/app.py (PID 2287989)
- 8770 : turf_scraper/crm_api.py (PID 2287988) ← port précédemment choisi, aussi occupé
- 8775 : libre (vérifié via ss -tlnp | grep 8775 → vide)

Fichiers modifiés :
- leadhunter_api.py : lignes 5, 295, 303 (port 8770→8775)
- infra/turf-saas-leadhunter.service : Description Port 8770→8775

Issue: HRT-66

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 16:48:12 +02:00
DevOps Engineer
356bdf5bec fix(leadhunter): change port 8769→8770 — conflit avec depenses_trello
Port 8769 était occupé par /home/h3r7/depenses_trello/app.py (pid=2287989).
Mise à jour du port dans :
- leadhunter_api.py (docstring, healthcheck, app.run)
- infra/turf-saas-leadhunter.service (description)

Ref: HRT-66

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 16:42:15 +02:00
DevOps Engineer
f9a45e6deb feat(HRT-66): LeadHunter S1 — core scraping, scoring, CRM SQLite et API Flask
- leadhunter_scraper.py : Google Places Nearby Search + Place Details
  avec compteur quota daily_quota.json (limite 900/jour),
  sleep(0.5) entre requêtes, fallback Overpass OSM boundary MEL,
  filtre website absent, déduplcation, rgpd_ok=True

- leadhunter_scorer.py : moteur de scoring 0-8 pts
  critère n°1 = site web absent (+3), avis ≥50 (+2),
  note ≥4.0 (+2), téléphone (+1), note <3.0 (-1)

- leadhunter_crm.py : CRM SQLite schéma validé CTO
  (id, source, name, address, phone, rating, reviews_count,
   website, score, rgpd_ok, scraped_at, status)
  CRUD : insert_lead, get_leads, update_lead_status, get_stats, export_csv

- leadhunter_api.py : Flask service port 8769
  GET /api/leads, POST /api/leads/scrape, GET /api/leads/stats,
  GET /api/leads/export, PATCH /api/leads/<id>/status, GET /health
  assert GOOGLE_PLACES_API_KEY au démarrage
  scraping asynchrone (thread) avec status endpoint

- infra/turf-saas-leadhunter.service : service systemd
  EnvironmentFile=/home/h3r7/.env pour GOOGLE_PLACES_API_KEY

Tests : py_compile OK, scorer testé, CRM SQLite testé

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 16:33:30 +02:00
DevOps Engineer
cfc0f038f9 Merge remote HRT-43 into local master (sync)
Some checks failed
CD / Deploy → Staging (push) Has been cancelled
CD / Smoke Tests on Staging (push) Has been cancelled
CD / Deploy → Production (push) Has been cancelled
CD / Rollback Production (push) Has been cancelled
Merge remote commit 837a084 (HRT-43 ML cache null test) with local
HRT-62, HRT-63, HRT-54 security commits.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 16:16:31 +02:00
DevOps Engineer
c999285895 Merge HRT-63: Blacklist + validation complexite mots de passe
Fix review: abc12345 -> abc1234 dans test_security.py (TestWeakPasswordRejection)
Valide CTO — coherence blacklist/test confirmee.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 16:14:17 +02:00
837a0845ec Merge pull request 'HRT-43 — Test intégration ml_predictions_cache : zéro NULL hippodrome' (#5) from feature/HRT-43-ml-cache-null-test into master
Some checks failed
CD / Deploy → Staging (push) Has been cancelled
CD / Smoke Tests on Staging (push) Has been cancelled
CD / Deploy → Production (push) Has been cancelled
CD / Rollback Production (push) Has been cancelled
2026-04-27 15:36:48 +02:00
CTO H3R7Tech
4bf458f1b8 Merge HRT-62: IP-based rate limiting on /auth/login — validated CTO
- In-memory IP rate limiter: 5 attempts / 5min window
- 15 min block on exceed, HTTP 429 + Retry-After header
- Applied rate_limit_middleware on portal_server.py
- Tests: TestLoginRateLimit added (conflict resolved: keep both test classes)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 15:24:07 +02:00
CTO H3R7Tech
099286b078 Merge HRT-63 + HRT-54: password blacklist/complexity + billing JWT fix — validated CTO
- HRT-63: WEAK_PASSWORDS blacklist (50+ entries) + validate_password_strength()
- HRT-54: billing JWT token fix, table name corrections

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 15:22:03 +02:00
DevOps Engineer
7f5573f076 feat(security): add IP-based rate limiting on /api/v1/auth/login — fix brute force HRT-62
- saas_auth.py: in-memory sliding-window rate limiter (5 attempts/5min, 15min block)
  using collections.defaultdict + threading.Lock, stdlib only, no new deps
- portal_server.py: register rate_limit_middleware + access_log_middleware
  (was missing, leaving global 100req/min limit unApplied on portal routes)
- tests/security/test_security.py: add TestLoginRateLimit class with
  test_login_brute_force_blocked_after_5_attempts and test_login_429_has_retry_after_header

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 14:50:08 +02:00
DevOps Engineer
82d6bdafba HRT-43 — Test intégration ml_predictions_cache : zéro NULL hippodrome
- Ajout tests/test_ml_cache_integrity.py : 7 tests integration vérifiant
  que hippodrome, race_label et heure ne sont pas NULL pour la date courante
- Ajout marqueur 'integration' dans pytest.ini
- Connexion DB en lecture seule (mode=ro) pour protection prod
- Support variable d'env TEST_DATE et TURF_DB_PATH
- Tests skippés proprement si job 19h30 n'a pas encore tourné
- Validé sur les données 2026-04-26 : 7/7 PASSED (1005 lignes, 0 NULL)

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-27 14:26:46 +02:00
16 changed files with 3477 additions and 97 deletions

View File

@@ -3,6 +3,7 @@
API v1 Blueprint package — Turf SaaS
Sprint 3-4: HRT-29 — Refacto API /v1/
Sprint 5-6: HRT-31 — Billing Stripe
HRT-79: Alertes Telegram configurables (user blueprint)
Registers sub-blueprints:
/api/v1/health — public health-check
@@ -13,6 +14,7 @@ Registers sub-blueprints:
/api/v1/export/ — export CSV (pro)
/api/v1/metrics — métriques perf ML (premium+)
/api/v1/billing/ — Stripe checkout, portal, webhook, status
/api/v1/user/ — config utilisateur, alertes Telegram (premium+)
/api/v1/docs — Swagger UI (via flasgger, registered on app)
"""
@@ -26,6 +28,7 @@ from .routes.backtest import backtest_bp
from .routes.export import export_bp
from .routes.metrics import metrics_bp
from .routes.billing import billing_bp
from .routes.user import user_bp
# Master blueprint that aggregates all sub-routes under /api/v1
api_v1_bp = Blueprint("api_v1", __name__, url_prefix="/api/v1")
@@ -41,3 +44,4 @@ def register_api_v1(app):
app.register_blueprint(export_bp)
app.register_blueprint(metrics_bp)
app.register_blueprint(billing_bp)
app.register_blueprint(user_bp)

216
api_v1/routes/user.py Normal file
View File

@@ -0,0 +1,216 @@
#!/usr/bin/env python3
"""
User route for API v1 — Telegram alert configuration
HRT-79: Alertes Telegram configurables (Premium)
GET /api/v1/user/telegram-config — Lire la config Telegram de l'utilisateur connecté
POST /api/v1/user/telegram-config — Mettre à jour la config Telegram
Accès : Premium / Pro uniquement (@jwt_required_middleware + @plan_required)
"""
import sqlite3
from flask import Blueprint, jsonify, request
from api_v1.utils import internal_error, bad_request
from auth import jwt_required_middleware, plan_required
user_bp = Blueprint("v1_user", __name__, url_prefix="/api/v1/user")
# DB_PATH est résolu via la même variable d'env que auth_db.py
import os
_DB_PATH = os.environ.get("TURF_SAAS_DB", "/home/h3r7/turf_saas/turf_saas.db")
def _get_db():
conn = sqlite3.connect(_DB_PATH)
conn.row_factory = sqlite3.Row
return conn
# ── GET /api/v1/user/telegram-config ──────────────────────────────────────────
@user_bp.route("/telegram-config", methods=["GET"])
@jwt_required_middleware
@plan_required("premium", "pro")
def get_telegram_config():
"""
Retourne la configuration Telegram de l'utilisateur connecté.
---
tags:
- Utilisateur
summary: Lire la config alertes Telegram (premium+)
security:
- Bearer: []
responses:
200:
description: Configuration Telegram courante
schema:
properties:
telegram_chat_id:
type: string
nullable: true
alert_value_bets:
type: boolean
alert_top1:
type: boolean
alert_quinte_only:
type: boolean
401:
description: Token invalide
403:
description: Plan insuffisant
"""
user_id = request.user_id # injecté par jwt_required_middleware
conn = _get_db()
try:
row = conn.execute(
"""
SELECT telegram_chat_id, alert_value_bets, alert_top1, alert_quinte_only
FROM users
WHERE id = ?
""",
(user_id,),
).fetchone()
if not row:
return jsonify({"error": "Utilisateur introuvable"}), 404
return jsonify(
{
"telegram_chat_id": row["telegram_chat_id"],
"alert_value_bets": bool(row["alert_value_bets"]),
"alert_top1": bool(row["alert_top1"]),
"alert_quinte_only": bool(row["alert_quinte_only"]),
}
), 200
except sqlite3.OperationalError as exc:
# Colonnes absentes : migration non appliquée
return jsonify(
{
"telegram_chat_id": None,
"alert_value_bets": True,
"alert_top1": True,
"alert_quinte_only": False,
"_warning": "Migration Telegram non appliquée",
}
), 200
except Exception as exc:
return internal_error(str(exc))
finally:
conn.close()
# ── POST /api/v1/user/telegram-config ─────────────────────────────────────────
@user_bp.route("/telegram-config", methods=["POST"])
@jwt_required_middleware
@plan_required("premium", "pro")
def update_telegram_config():
"""
Met à jour la configuration Telegram de l'utilisateur connecté.
---
tags:
- Utilisateur
summary: Configurer les alertes Telegram (premium+)
security:
- Bearer: []
parameters:
- in: body
name: body
required: true
schema:
properties:
telegram_chat_id:
type: string
description: Chat ID Telegram (ou null pour désactiver)
alert_value_bets:
type: boolean
default: true
alert_top1:
type: boolean
default: true
alert_quinte_only:
type: boolean
default: false
responses:
200:
description: Configuration mise à jour
400:
description: Paramètres invalides
401:
description: Token invalide
403:
description: Plan insuffisant
"""
user_id = request.user_id # injecté par jwt_required_middleware
data = request.get_json(silent=True)
if not data:
return bad_request("Corps JSON requis")
# Validation et extraction des champs
telegram_chat_id = data.get("telegram_chat_id")
if telegram_chat_id is not None and not isinstance(telegram_chat_id, str):
return bad_request("telegram_chat_id doit être une chaîne ou null")
if isinstance(telegram_chat_id, str):
telegram_chat_id = telegram_chat_id.strip() or None
alert_value_bets = data.get("alert_value_bets", True)
alert_top1 = data.get("alert_top1", True)
alert_quinte_only = data.get("alert_quinte_only", False)
if not isinstance(alert_value_bets, bool):
return bad_request("alert_value_bets doit être un booléen")
if not isinstance(alert_top1, bool):
return bad_request("alert_top1 doit être un booléen")
if not isinstance(alert_quinte_only, bool):
return bad_request("alert_quinte_only doit être un booléen")
conn = _get_db()
try:
conn.execute(
"""
UPDATE users
SET telegram_chat_id = ?,
alert_value_bets = ?,
alert_top1 = ?,
alert_quinte_only = ?
WHERE id = ?
""",
(
telegram_chat_id,
int(alert_value_bets),
int(alert_top1),
int(alert_quinte_only),
user_id,
),
)
conn.commit()
return jsonify(
{
"status": "ok",
"telegram_chat_id": telegram_chat_id,
"alert_value_bets": alert_value_bets,
"alert_top1": alert_top1,
"alert_quinte_only": alert_quinte_only,
}
), 200
except sqlite3.OperationalError as exc:
return jsonify(
{
"error": "Migration Telegram non appliquée — contacter le support",
"detail": str(exc),
}
), 500
except Exception as exc:
return internal_error(str(exc))
finally:
conn.close()

View File

@@ -2,6 +2,7 @@
"""
Auth DB — users and subscriptions schema for turf_saas.db
Sprint 2-3: Auth JWT + Multi-tenant (HRT-28)
HRT-79: migration Telegram columns
"""
import sqlite3
@@ -63,6 +64,36 @@ def init_auth_tables():
conn.close()
print("[auth_db] Tables users, subscriptions, refresh_tokens created/verified.")
# Apply Telegram columns migration (idempotent)
migrate_telegram_columns()
def migrate_telegram_columns():
"""
Migration idempotente : ajoute les colonnes Telegram à la table users.
Utilise ALTER TABLE ... ADD COLUMN avec try/except OperationalError
pour être safe si les colonnes existent déjà (SQLite ne supporte pas IF NOT EXISTS).
HRT-79
"""
conn = get_db()
c = conn.cursor()
columns = [
("telegram_chat_id", "TEXT DEFAULT NULL"),
("alert_value_bets", "INTEGER DEFAULT 1"),
("alert_top1", "INTEGER DEFAULT 1"),
("alert_quinte_only", "INTEGER DEFAULT 0"),
]
for col, definition in columns:
try:
c.execute(f"ALTER TABLE users ADD COLUMN {col} {definition}")
print(f"[auth_db] Colonne '{col}' ajoutée.")
except sqlite3.OperationalError:
# Column already exists — safe to ignore
pass
conn.commit()
conn.close()
print("[auth_db] Migration Telegram columns OK.")
if __name__ == "__main__":
init_auth_tables()

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,21 @@
[Unit]
Description=H3R7Tech LeadHunter API (Port 8775)
Documentation=https://portal-kolifee.duckdns.org
After=network.target
[Service]
Type=simple
User=h3r7
WorkingDirectory=/home/h3r7/turf_saas
# Charger les variables d'environnement depuis /home/h3r7/.env
# (notamment GOOGLE_PLACES_API_KEY)
EnvironmentFile=/home/h3r7/.env
ExecStart=/home/h3r7/turf_saas/venv/bin/python3 /home/h3r7/turf_saas/leadhunter_api.py
Restart=always
RestartSec=10
Environment=PYTHONPATH=/home/h3r7/turf_saas
[Install]
WantedBy=multi-user.target

303
leadhunter_api.py Normal file
View File

@@ -0,0 +1,303 @@
#!/usr/bin/env python3
"""
H3R7Tech — LeadHunter API
===========================
Service Flask sur port 8775 exposant les endpoints LeadHunter.
Endpoints :
GET /api/leads — Liste les leads (filtres: status, limit, offset)
POST /api/leads/scrape — Lance un job de scraping asynchrone
GET /api/leads/stats — Statistiques globales du CRM
GET /api/leads/export — Export CSV des leads
PATCH /api/leads/<id>/status — Met à jour le statut d'un lead
Port : 8775 (8769 occupé par depenses_trello/app.py, 8770 occupé par turf_scraper/crm_api.py — corrigé HRT-66)
Auteur: H3R7Tech Backend Engineer
Issue: HRT-66
"""
import os
import threading
import logging
from logging.handlers import RotatingFileHandler
from flask import Flask, jsonify, request, Response
from flask_cors import CORS
# Import des modules LeadHunter
from leadhunter_crm import (
init_db,
insert_leads,
get_leads,
get_lead_by_id,
update_lead_status,
get_stats,
export_csv,
VALID_STATUSES,
DB_PATH,
)
from leadhunter_scraper import run_scraping, GOOGLE_PLACES_API_KEY
from leadhunter_scorer import LeadScorer
# ─── Assertions au démarrage ─────────────────────────────────────────────────
# Vérification obligatoire : la clé API doit être présente au démarrage
assert os.environ.get("GOOGLE_PLACES_API_KEY"), (
"GOOGLE_PLACES_API_KEY manquante. "
"Ajouter dans /home/h3r7/.env : export GOOGLE_PLACES_API_KEY=xxx"
)
# ─── Logging ────────────────────────────────────────────────────────────────
logger = logging.getLogger("leadhunter.api")
_handler = RotatingFileHandler(
"/home/h3r7/leadhunter.log",
maxBytes=5 * 1024 * 1024,
backupCount=3,
)
_handler.setFormatter(
logging.Formatter("%(asctime)s %(levelname)-8s %(name)s%(message)s")
)
logger.setLevel(logging.INFO)
if not logger.handlers:
logger.addHandler(_handler)
logger.addHandler(logging.StreamHandler())
# ─── App Flask ───────────────────────────────────────────────────────────────
app = Flask(__name__)
CORS(app)
# Scorer singleton
scorer = LeadScorer()
# État global du job de scraping (simple flag — pas de celery nécessaire pour le POC)
_scrape_job = {
"running": False,
"last_run": None,
"last_count": 0,
"last_error": None,
}
_scrape_lock = threading.Lock()
# ─── Init DB ─────────────────────────────────────────────────────────────────
init_db(DB_PATH)
logger.info("LeadHunter API démarrée — DB initialisée.")
# ─── Helpers ─────────────────────────────────────────────────────────────────
def _run_scrape_job(max_leads: int, use_google: bool, use_osm: bool) -> None:
"""Job de scraping exécuté dans un thread séparé."""
with _scrape_lock:
_scrape_job["running"] = True
_scrape_job["last_error"] = None
try:
leads_raw = run_scraping(
max_leads=max_leads,
use_google=use_google,
use_osm=use_osm,
)
leads_scored = scorer.score_leads(leads_raw)
inserted_ids = insert_leads(leads_scored)
with _scrape_lock:
_scrape_job["last_count"] = len(inserted_ids)
from datetime import datetime
_scrape_job["last_run"] = datetime.utcnow().isoformat() + "Z"
logger.info(f"Scrape job terminé : {len(inserted_ids)} leads insérés.")
except Exception as e:
logger.warning(f"Scrape job erreur : {e}")
with _scrape_lock:
_scrape_job["last_error"] = str(e)
finally:
with _scrape_lock:
_scrape_job["running"] = False
# ─── Routes ──────────────────────────────────────────────────────────────────
@app.route("/api/leads", methods=["GET"])
def api_get_leads():
"""
Liste les leads du CRM.
Query params :
- status (str, optional) : filtre sur new/contacted/closed/rejected
- limit (int, default=50) : pagination
- offset (int, default=0) : pagination
"""
status = request.args.get("status")
try:
limit = int(request.args.get("limit", 50))
offset = int(request.args.get("offset", 0))
except ValueError:
return jsonify({"error": "limit et offset doivent être des entiers"}), 400
if status and status not in VALID_STATUSES:
return jsonify(
{"error": f"status invalide. Valeurs acceptées : {VALID_STATUSES}"}
), 400
leads = get_leads(status=status, limit=limit, offset=offset)
return jsonify(
{
"leads": leads,
"count": len(leads),
"limit": limit,
"offset": offset,
"status_filter": status,
}
)
@app.route("/api/leads/scrape", methods=["POST"])
def api_scrape():
"""
Lance un job de scraping asynchrone.
Body JSON (optionnel) :
- max_leads (int, default=100)
- use_google (bool, default=true)
- use_osm (bool, default=true)
Retourne immédiatement avec le statut du job.
"""
with _scrape_lock:
if _scrape_job["running"]:
return jsonify(
{
"status": "already_running",
"message": "Un job de scraping est déjà en cours.",
}
), 409
body = request.get_json(silent=True) or {}
max_leads = int(body.get("max_leads", 100))
use_google = bool(body.get("use_google", True))
use_osm = bool(body.get("use_osm", True))
thread = threading.Thread(
target=_run_scrape_job,
args=(max_leads, use_google, use_osm),
daemon=True,
)
thread.start()
logger.info(
f"Job de scraping lancé (max_leads={max_leads}, "
f"use_google={use_google}, use_osm={use_osm})"
)
return jsonify(
{
"status": "started",
"message": "Job de scraping démarré en arrière-plan.",
"params": {
"max_leads": max_leads,
"use_google": use_google,
"use_osm": use_osm,
},
}
), 202
@app.route("/api/leads/scrape/status", methods=["GET"])
def api_scrape_status():
"""Retourne l'état courant du job de scraping."""
with _scrape_lock:
return jsonify(dict(_scrape_job))
@app.route("/api/leads/stats", methods=["GET"])
def api_stats():
"""
Statistiques globales du CRM LeadHunter.
Retourne : total, by_status, by_source, avg_score, top_leads_count
"""
stats = get_stats()
if not stats:
return jsonify({"error": "Impossible de calculer les statistiques"}), 500
return jsonify(stats)
@app.route("/api/leads/export", methods=["GET"])
def api_export():
"""
Export CSV de tous les leads (ou filtrés par status).
Query params :
- status (str, optional)
"""
status = request.args.get("status")
if status and status not in VALID_STATUSES:
return jsonify({"error": f"status invalide : {VALID_STATUSES}"}), 400
csv_content = export_csv(status=status)
filename = f"leadhunter_leads{'_' + status if status else ''}.csv"
return Response(
csv_content,
mimetype="text/csv",
headers={
"Content-Disposition": f"attachment; filename={filename}",
"Content-Type": "text/csv; charset=utf-8",
},
)
@app.route("/api/leads/<int:lead_id>/status", methods=["PATCH"])
def api_update_status(lead_id: int):
"""
Met à jour le statut d'un lead.
Body JSON :
- status (str) : new | contacted | closed | rejected
"""
body = request.get_json(silent=True)
if not body or "status" not in body:
return jsonify({"error": "Body JSON requis avec le champ 'status'"}), 400
new_status = body["status"]
if new_status not in VALID_STATUSES:
return jsonify({"error": f"status invalide. Valeurs : {VALID_STATUSES}"}), 400
lead = get_lead_by_id(lead_id)
if not lead:
return jsonify({"error": f"Lead id={lead_id} introuvable"}), 404
success = update_lead_status(lead_id, new_status)
if not success:
return jsonify({"error": "Mise à jour échouée"}), 500
return jsonify(
{
"success": True,
"lead_id": lead_id,
"new_status": new_status,
}
)
@app.route("/health", methods=["GET"])
def health():
"""Healthcheck pour systemd / monitoring."""
return jsonify(
{
"status": "ok",
"service": "leadhunter-api",
"port": 8775,
}
)
# ─── Entrypoint ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8775, debug=False)

349
leadhunter_crm.py Normal file
View File

@@ -0,0 +1,349 @@
#!/usr/bin/env python3
"""
H3R7Tech — LeadHunter CRM (SQLite)
=====================================
Couche de persistance SQLite pour les leads LeadHunter.
Schéma validé CTO (HRT-66) :
CREATE TABLE leads (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL, -- 'google_places' ou 'osm'
name TEXT NOT NULL,
address TEXT,
phone TEXT,
rating REAL,
reviews_count INTEGER,
website TEXT,
score INTEGER,
rgpd_ok BOOLEAN DEFAULT 1,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status TEXT DEFAULT 'new' -- new, contacted, closed, rejected
);
Auteur: H3R7Tech Backend Engineer
Issue: HRT-66
"""
import sqlite3
import logging
import csv
import io
from contextlib import contextmanager
from datetime import datetime
from logging.handlers import RotatingFileHandler
from typing import Optional
# ─── Logging ────────────────────────────────────────────────────────────────
logger = logging.getLogger("leadhunter.crm")
_handler = RotatingFileHandler(
"/home/h3r7/leadhunter.log",
maxBytes=5 * 1024 * 1024,
backupCount=3,
)
_handler.setFormatter(
logging.Formatter("%(asctime)s %(levelname)-8s %(name)s%(message)s")
)
logger.setLevel(logging.INFO)
if not logger.handlers:
logger.addHandler(_handler)
logger.addHandler(logging.StreamHandler())
# ─── Chemin DB ───────────────────────────────────────────────────────────────
DB_PATH = "/home/h3r7/leadhunter.db"
# Statuts valides pour un lead
VALID_STATUSES = {"new", "contacted", "closed", "rejected"}
# ─── Initialisation ──────────────────────────────────────────────────────────
def init_db(db_path: str = DB_PATH) -> None:
"""
Crée la base SQLite et la table leads si elle n'existe pas.
Idempotent — peut être appelé au démarrage de l'API.
"""
with sqlite3.connect(db_path) as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS leads (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL,
name TEXT NOT NULL,
address TEXT,
phone TEXT,
rating REAL,
reviews_count INTEGER,
website TEXT,
score INTEGER,
rgpd_ok BOOLEAN DEFAULT 1,
scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status TEXT DEFAULT 'new'
)
""")
conn.commit()
logger.info(f"DB initialisée : {db_path}")
# ─── Context manager ─────────────────────────────────────────────────────────
@contextmanager
def _get_conn(db_path: str = DB_PATH):
"""Fournit une connexion SQLite avec row_factory."""
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
try:
yield conn
conn.commit()
except Exception as e:
conn.rollback()
logger.warning(f"DB transaction rollback : {e}")
raise
finally:
conn.close()
# ─── CRUD ────────────────────────────────────────────────────────────────────
def insert_lead(lead: dict, db_path: str = DB_PATH) -> Optional[int]:
"""
Insère un lead normalisé dans la DB.
Args:
lead: dict avec les champs normalisés (source, name, address, ...)
db_path: chemin vers la DB SQLite.
Returns:
L'id SQLite du lead inséré, ou None en cas d'erreur.
"""
try:
with _get_conn(db_path) as conn:
cursor = conn.execute(
"""
INSERT INTO leads
(source, name, address, phone, rating, reviews_count,
website, score, rgpd_ok, status)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
lead.get("source", "unknown"),
lead.get("name", ""),
lead.get("address", ""),
lead.get("phone", ""),
lead.get("rating"),
lead.get("reviews_count"),
lead.get("website", ""),
lead.get("score"),
1 if lead.get("rgpd_ok", True) else 0,
lead.get("status", "new"),
),
)
lead_id = cursor.lastrowid
logger.info(f"Lead inséré id={lead_id} : {lead.get('name')}")
return lead_id
except Exception as e:
logger.warning(f"insert_lead error : {e}")
return None
def insert_leads(leads: list[dict], db_path: str = DB_PATH) -> list[int]:
"""
Insère une liste de leads en batch.
Returns:
Liste des ids insérés.
"""
ids = []
for lead in leads:
lead_id = insert_lead(lead, db_path)
if lead_id is not None:
ids.append(lead_id)
logger.info(f"insert_leads : {len(ids)}/{len(leads)} insérés.")
return ids
def get_leads(
status: Optional[str] = None,
limit: int = 100,
offset: int = 0,
db_path: str = DB_PATH,
) -> list[dict]:
"""
Récupère les leads avec filtre optionnel sur le statut.
Args:
status: filtre sur le champ 'status' (new, contacted, closed, rejected).
limit: pagination — nombre de résultats max.
offset: pagination — décalage.
Returns:
Liste de dicts (tous les champs de la table leads).
"""
try:
with _get_conn(db_path) as conn:
if status:
rows = conn.execute(
"SELECT * FROM leads WHERE status = ? ORDER BY score DESC, scraped_at DESC LIMIT ? OFFSET ?",
(status, limit, offset),
).fetchall()
else:
rows = conn.execute(
"SELECT * FROM leads ORDER BY score DESC, scraped_at DESC LIMIT ? OFFSET ?",
(limit, offset),
).fetchall()
return [dict(r) for r in rows]
except Exception as e:
logger.warning(f"get_leads error : {e}")
return []
def get_lead_by_id(lead_id: int, db_path: str = DB_PATH) -> Optional[dict]:
"""Récupère un lead par son id."""
try:
with _get_conn(db_path) as conn:
row = conn.execute(
"SELECT * FROM leads WHERE id = ?", (lead_id,)
).fetchone()
return dict(row) if row else None
except Exception as e:
logger.warning(f"get_lead_by_id error : {e}")
return None
def update_lead_status(lead_id: int, status: str, db_path: str = DB_PATH) -> bool:
"""
Met à jour le statut d'un lead.
Args:
lead_id: id du lead.
status: nouveau statut ('new', 'contacted', 'closed', 'rejected').
Returns:
True si mise à jour réussie, False sinon.
"""
if status not in VALID_STATUSES:
logger.warning(f"update_lead_status : statut invalide '{status}'")
return False
try:
with _get_conn(db_path) as conn:
conn.execute(
"UPDATE leads SET status = ? WHERE id = ?",
(status, lead_id),
)
logger.info(f"Lead id={lead_id} statut → {status}")
return True
except Exception as e:
logger.warning(f"update_lead_status error : {e}")
return False
def get_stats(db_path: str = DB_PATH) -> dict:
"""
Retourne les statistiques globales du CRM.
Returns:
Dict avec total, by_status, by_source, avg_score, top_leads_count
"""
try:
with _get_conn(db_path) as conn:
total = conn.execute("SELECT COUNT(*) FROM leads").fetchone()[0]
by_status_rows = conn.execute(
"SELECT status, COUNT(*) as cnt FROM leads GROUP BY status"
).fetchall()
by_status = {r["status"]: r["cnt"] for r in by_status_rows}
by_source_rows = conn.execute(
"SELECT source, COUNT(*) as cnt FROM leads GROUP BY source"
).fetchall()
by_source = {r["source"]: r["cnt"] for r in by_source_rows}
avg_score_row = conn.execute(
"SELECT AVG(score) FROM leads WHERE score IS NOT NULL"
).fetchone()
avg_score = round(avg_score_row[0] or 0, 2)
# Leads "chauds" = score ≥ 5
top_count = conn.execute(
"SELECT COUNT(*) FROM leads WHERE score >= 5"
).fetchone()[0]
return {
"total": total,
"by_status": by_status,
"by_source": by_source,
"avg_score": avg_score,
"top_leads_count": top_count,
"generated_at": datetime.utcnow().isoformat() + "Z",
}
except Exception as e:
logger.warning(f"get_stats error : {e}")
return {}
def export_csv(
status: Optional[str] = None,
db_path: str = DB_PATH,
) -> str:
"""
Exporte les leads en CSV (string).
Args:
status: filtre optionnel sur le statut.
Returns:
Contenu CSV en string UTF-8.
"""
leads = get_leads(status=status, limit=10000, db_path=db_path)
output = io.StringIO()
fieldnames = [
"id",
"source",
"name",
"address",
"phone",
"rating",
"reviews_count",
"website",
"score",
"rgpd_ok",
"scraped_at",
"status",
]
writer = csv.DictWriter(output, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerows(leads)
logger.info(f"export_csv : {len(leads)} leads exportés.")
return output.getvalue()
# ─── CLI (debug) ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
init_db()
# Test insertion
test_lead = {
"source": "google_places",
"name": "Restaurant Test",
"address": "10 rue de la Paix, 59000 Lille",
"phone": "+33 3 20 00 00 01",
"rating": 4.5,
"reviews_count": 120,
"website": "",
"score": 8,
"rgpd_ok": True,
"status": "new",
}
lead_id = insert_lead(test_lead)
print(f"Lead inséré : id={lead_id}")
leads = get_leads()
print(f"Leads en DB : {len(leads)}")
stats = get_stats()
print(f"Stats : {stats}")

193
leadhunter_scorer.py Normal file
View File

@@ -0,0 +1,193 @@
#!/usr/bin/env python3
"""
H3R7Tech — LeadHunter Scorer
================================
Moteur de scoring des leads restaurants MEL.
Critères (ordre de priorité métier) :
1. [+3] Site web absent ← CRITIQUE : raison d'être du produit
2. [+2] Nombre d'avis élevé (≥ 50) : forte activité = bon prospect de vente
3. [+2] Note Google élevée (≥ 4.0) : établissement sérieux
4. [+1] Téléphone présent : facilite la prise de contact
5. [-1] Note faible (< 3.0) : risque reputationnel pour la prestation web
Score maximum théorique : 8
Score minimum : 0 (leads avec site web ne doivent pas passer ici)
Auteur: H3R7Tech Backend Engineer
Issue: HRT-66
"""
import logging
from logging.handlers import RotatingFileHandler
# ─── Logging ────────────────────────────────────────────────────────────────
logger = logging.getLogger("leadhunter.scorer")
_handler = RotatingFileHandler(
"/home/h3r7/leadhunter.log",
maxBytes=5 * 1024 * 1024,
backupCount=3,
)
_handler.setFormatter(
logging.Formatter("%(asctime)s %(levelname)-8s %(name)s%(message)s")
)
logger.setLevel(logging.INFO)
if not logger.handlers:
logger.addHandler(_handler)
logger.addHandler(logging.StreamHandler())
# ─── Scorer ──────────────────────────────────────────────────────────────────
class LeadScorer:
"""
Calcule le score de priorité d'un lead.
Le score sert à trier les leads dans le CRM :
- Score élevé = prospect chaud (sans site + actif + bien noté)
- Score faible = prospect froid (peut être ignoré ou traité en dernier)
"""
def _calculate_score(self, lead: dict) -> int:
"""
Calcule le score d'un lead.
Args:
lead: dict avec les champs normalisés du scraper
(name, website, rating, reviews_count, phone, ...)
Returns:
Score entier (08)
"""
score = 0
# ── Critère 1 : site web absent [CRITIQUE — logique métier centrale] ──
# C'est le critère n°1 : on cherche des restaurants SANS site web
# pour leur proposer une création de site à 8001500€.
website = lead.get("website", "")
if not website or not website.strip():
score += 3
logger.debug(f"{lead.get('name')}: +3 (site web absent)")
else:
# Si le lead a un site web, score = 0 immédiatement.
# Ce cas ne devrait pas se produire (filtre scraper),
# mais on reste défensif.
logger.warning(
f"{lead.get('name')}: site web présent ({website}), "
"lead ignoré pour scoring."
)
return 0
# ── Critère 2 : nombre d'avis élevé (≥ 50) ──────────────────────────
reviews = lead.get("reviews_count")
if reviews is not None:
try:
reviews = int(reviews)
if reviews >= 50:
score += 2
logger.debug(f"{lead.get('name')}: +2 (avis ≥ 50 : {reviews})")
except (TypeError, ValueError) as e:
logger.warning(f"reviews_count invalide pour {lead.get('name')}: {e}")
# ── Critère 3 : bonne note Google (≥ 4.0) ───────────────────────────
rating = lead.get("rating")
if rating is not None:
try:
rating = float(rating)
if rating >= 4.0:
score += 2
logger.debug(f"{lead.get('name')}: +2 (note ≥ 4.0 : {rating})")
elif rating < 3.0:
score -= 1
logger.debug(f"{lead.get('name')}: -1 (note < 3.0 : {rating})")
except (TypeError, ValueError) as e:
logger.warning(f"rating invalide pour {lead.get('name')}: {e}")
# ── Critère 4 : téléphone présent ────────────────────────────────────
phone = lead.get("phone", "")
if phone and phone.strip():
score += 1
logger.debug(f"{lead.get('name')}: +1 (téléphone présent)")
# Plancher à 0
score = max(0, score)
logger.info(f"Score calculé pour '{lead.get('name')}' : {score}/8")
return score
def score_lead(self, lead: dict) -> dict:
"""
Enrichit un lead avec son score.
Args:
lead: dict normalisé du scraper.
Returns:
Même dict avec le champ 'score' ajouté/mis à jour.
"""
lead = dict(lead) # copie défensive
lead["score"] = self._calculate_score(lead)
return lead
def score_leads(self, leads: list[dict]) -> list[dict]:
"""
Score et trie une liste de leads (score décroissant).
Args:
leads: liste de dicts normalisés.
Returns:
Liste triée par score décroissant.
"""
scored = [self.score_lead(lead) for lead in leads]
scored.sort(key=lambda l: l.get("score", 0), reverse=True)
logger.info(
f"score_leads terminé : {len(scored)} leads scorés. "
f"Score max = {scored[0]['score'] if scored else 0}, "
f"Score min = {scored[-1]['score'] if scored else 0}"
)
return scored
# ─── CLI (debug) ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
# Exemple de test rapide sans appel API
test_leads = [
{
"name": "Restaurant A",
"website": "",
"rating": 4.5,
"reviews_count": 120,
"phone": "+33 3 20 00 00 01",
},
{
"name": "Restaurant B",
"website": "",
"rating": 3.8,
"reviews_count": 30,
"phone": "",
},
{
"name": "Café C",
"website": "",
"rating": 2.5,
"reviews_count": 5,
"phone": "+33 3 20 00 00 03",
},
{
"name": "Bar D avec site",
"website": "https://bar-d.fr",
"rating": 4.2,
"reviews_count": 80,
"phone": "+33 3 20 00 00 04",
},
]
scorer = LeadScorer()
results = scorer.score_leads(test_leads)
print("\n=== Résultats scoring ===")
for r in results:
print(f" [{r['score']:2d}/8] {r['name']}")

397
leadhunter_scraper.py Normal file
View File

@@ -0,0 +1,397 @@
#!/usr/bin/env python3
"""
H3R7Tech — LeadHunter Scraper
================================
Agent de scraping pour la détection de restaurants sans site web
dans la MEL (Métropole Européenne de Lille).
Sources :
- Google Places API (primary)
- OpenStreetMap / Overpass API (fallback)
Quota Google Places Free Tier :
- 28 500 requêtes/mois ≈ 950/jour
- Compteur persistent dans /home/h3r7/leadhunter_quota.json
Auteur: H3R7Tech Backend Engineer
Issue: HRT-66
"""
import os
import json
import time
import logging
import requests
from datetime import date, datetime
from logging.handlers import RotatingFileHandler
# ─── Logging ────────────────────────────────────────────────────────────────
logger = logging.getLogger("leadhunter.scraper")
_handler = RotatingFileHandler(
"/home/h3r7/leadhunter.log",
maxBytes=5 * 1024 * 1024, # 5 MB
backupCount=3,
)
_handler.setFormatter(
logging.Formatter("%(asctime)s %(levelname)-8s %(name)s%(message)s")
)
logger.setLevel(logging.INFO)
if not logger.handlers:
logger.addHandler(_handler)
logger.addHandler(logging.StreamHandler())
# ─── Configuration ───────────────────────────────────────────────────────────
GOOGLE_PLACES_API_KEY = os.environ.get("GOOGLE_PLACES_API_KEY")
# Quota journalier Google Places Free Tier
DAILY_QUOTA_FILE = "/home/h3r7/leadhunter_quota.json"
DAILY_QUOTA_LIMIT = 900 # marge de sécurité vs les 950 théoriques
# Délai entre requêtes Places pour éviter rate-limiting
PLACES_SLEEP_S = 0.5
# Bounding box MEL (Métropole Européenne de Lille)
MEL_CENTER_LAT = 50.6292
MEL_CENTER_LNG = 3.0573
MEL_RADIUS_M = 20000 # 20 km autour de Lille
# Types de lieux ciblés
TARGET_TYPES = ["restaurant", "cafe", "bar", "bakery", "food"]
# Overpass API endpoint
OVERPASS_URL = "https://overpass-api.de/api/interpreter"
# Requête Overpass MEL — bounding box directe (50.4,2.8,50.8,3.3) couvrant la MEL
# Fix HRT-72 : la résolution area["name"=...] échoue silencieusement sur l'API Overpass publique
OVERPASS_MEL_QUERY = """
[out:json][timeout:60];
(
node["amenity"~"^(restaurant|cafe|bar|fast_food|bakery)$"][!"website"](50.4,2.8,50.8,3.3);
way["amenity"~"^(restaurant|cafe|bar|fast_food|bakery)$"][!"website"](50.4,2.8,50.8,3.3);
);
out center 200;
"""
# ─── Quota Manager ───────────────────────────────────────────────────────────
def _load_quota() -> dict:
"""Charge le compteur quotidien depuis le fichier JSON."""
today = str(date.today())
if os.path.exists(DAILY_QUOTA_FILE):
try:
with open(DAILY_QUOTA_FILE, "r") as f:
data = json.load(f)
if data.get("date") == today:
return data
except Exception as e:
logger.warning(f"Impossible de lire le fichier quota : {e}")
return {"date": today, "count": 0}
def _save_quota(data: dict) -> None:
"""Persiste le compteur quotidien."""
try:
with open(DAILY_QUOTA_FILE, "w") as f:
json.dump(data, f)
except Exception as e:
logger.warning(f"Impossible d'écrire le fichier quota : {e}")
def _increment_quota(n: int = 1) -> int:
"""Incrémente le compteur et retourne le total du jour."""
quota = _load_quota()
quota["count"] += n
_save_quota(quota)
return quota["count"]
def _quota_remaining() -> int:
"""Retourne le nombre de requêtes restantes pour aujourd'hui."""
quota = _load_quota()
return max(0, DAILY_QUOTA_LIMIT - quota["count"])
# ─── Google Places Scraper ────────────────────────────────────────────────────
class GooglePlacesScraper:
"""
Scraping via Google Places API (Nearby Search + Place Details).
Filtre les lieux sans site web côté API.
"""
BASE_URL = "https://maps.googleapis.com/maps/api/place"
def __init__(self):
if not GOOGLE_PLACES_API_KEY:
raise EnvironmentError(
"GOOGLE_PLACES_API_KEY non définie. "
"Ajouter dans /home/h3r7/.env et relancer."
)
self.api_key = GOOGLE_PLACES_API_KEY
def _nearby_search(self, place_type: str, page_token: str = None) -> dict:
"""Appel Nearby Search — 1 requête comptabilisée."""
params = {
"key": self.api_key,
"location": f"{MEL_CENTER_LAT},{MEL_CENTER_LNG}",
"radius": MEL_RADIUS_M,
"type": place_type,
}
if page_token:
params["pagetoken"] = page_token
_increment_quota()
time.sleep(PLACES_SLEEP_S)
try:
resp = requests.get(
f"{self.BASE_URL}/nearbysearch/json",
params=params,
timeout=10,
)
resp.raise_for_status()
return resp.json()
except Exception as e:
logger.warning(f"NearbySearch error (type={place_type}): {e}")
return {}
def _place_details(self, place_id: str) -> dict:
"""Place Details pour récupérer website, phone, rating, etc. — 1 requête."""
params = {
"key": self.api_key,
"place_id": place_id,
"fields": "name,formatted_address,formatted_phone_number,website,rating,user_ratings_total",
}
_increment_quota()
time.sleep(PLACES_SLEEP_S)
try:
resp = requests.get(
f"{self.BASE_URL}/details/json",
params=params,
timeout=10,
)
resp.raise_for_status()
return resp.json().get("result", {})
except Exception as e:
logger.warning(f"PlaceDetails error (place_id={place_id}): {e}")
return {}
def scrape(self, max_leads: int = 50) -> list[dict]:
"""
Scrape les restaurants/cafés/bars MEL sans site web.
Retourne une liste de dicts normalisés compatibles LeadHunter CRM :
source, name, address, phone, rating, reviews_count, website, rgpd_ok
"""
leads = []
seen_ids = set()
for place_type in TARGET_TYPES:
if _quota_remaining() < 10:
logger.warning(
"Quota journalier presque épuisé — arrêt scraping Google Places."
)
break
logger.info(f"Scraping Google Places — type={place_type}")
page_token = None
while True:
if _quota_remaining() < 5:
logger.warning("Quota insuffisant pour continuer la pagination.")
break
data = self._nearby_search(place_type, page_token)
results = data.get("results", [])
for place in results:
if len(leads) >= max_leads:
break
place_id = place.get("place_id", "")
if not place_id or place_id in seen_ids:
continue
seen_ids.add(place_id)
if _quota_remaining() < 2:
logger.warning("Quota épuisé avant details.")
break
details = self._place_details(place_id)
# Filtre : on ne garde que les lieux SANS site web
if details.get("website"):
continue
lead = {
"source": "google_places",
"name": details.get("name") or place.get("name", ""),
"address": details.get("formatted_address")
or place.get("vicinity", ""),
"phone": details.get("formatted_phone_number", ""),
"rating": details.get("rating") or place.get("rating"),
"reviews_count": details.get("user_ratings_total")
or place.get("user_ratings_total"),
"website": "",
"rgpd_ok": True, # Données publiques Google Places uniquement
}
leads.append(lead)
logger.info(f"Lead trouvé (Google Places) : {lead['name']}")
if len(leads) >= max_leads:
break
page_token = data.get("next_page_token")
if not page_token:
break
# L'API Google Places nécessite un délai avant d'utiliser next_page_token
time.sleep(2)
logger.info(f"Google Places : {len(leads)} leads collectés.")
return leads
# ─── Overpass / OSM Fallback ──────────────────────────────────────────────────
class OverpassScraper:
"""
Fallback OSM via Overpass API.
Cible les nœuds/ways dans la boundary MEL sans attribut 'website'.
Données publiques ODbL — RGPD OK.
"""
def scrape(self, max_leads: int = 100) -> list[dict]:
"""
Scrape via Overpass API — retourne des leads normalisés.
"""
logger.info("Scraping Overpass OSM — boundary MEL")
leads = []
try:
resp = requests.post(
OVERPASS_URL,
data={"data": OVERPASS_MEL_QUERY},
headers={
"Content-Type": "application/x-www-form-urlencoded", # Fix HRT-72 Bug2
"User-Agent": "H3R7Tech-LeadHunter/1.0 (contact@h3r7tech.fr)", # Fix HRT-72 Bug3: overpass-api.de blocks python-requests UA
},
timeout=90,
)
resp.raise_for_status()
data = resp.json()
except Exception as e:
logger.warning(f"Overpass API error : {e}")
return []
elements = data.get("elements", [])
logger.info(f"Overpass : {len(elements)} éléments bruts reçus.")
for el in elements[:max_leads]:
tags = el.get("tags", {})
# Coordonnées (pour les ways, Overpass retourne 'center')
lat = el.get("lat") or (el.get("center") or {}).get("lat")
lon = el.get("lon") or (el.get("center") or {}).get("lon")
name = tags.get("name", "")
if not name:
continue # Ignorer les lieux sans nom
addr_parts = [
tags.get("addr:housenumber", ""),
tags.get("addr:street", ""),
tags.get("addr:city", ""),
tags.get("addr:postcode", ""),
]
address = " ".join(p for p in addr_parts if p).strip()
if not address and lat and lon:
address = f"{lat:.4f},{lon:.4f}"
lead = {
"source": "osm",
"name": name,
"address": address,
"phone": tags.get("phone", tags.get("contact:phone", "")),
"rating": None,
"reviews_count": None,
"website": "",
"rgpd_ok": True, # Données publiques ODbL
}
leads.append(lead)
logger.info(f"Lead trouvé (OSM) : {lead['name']}")
logger.info(f"Overpass : {len(leads)} leads collectés.")
return leads
# ─── Orchestrateur ────────────────────────────────────────────────────────────
def run_scraping(
max_leads: int = 100, use_google: bool = True, use_osm: bool = True
) -> list[dict]:
"""
Lance le scraping Google Places + fallback OSM.
Args:
max_leads: nombre maximum de leads à collecter au total.
use_google: activer Google Places (nécessite GOOGLE_PLACES_API_KEY).
use_osm: activer le fallback Overpass OSM.
Returns:
Liste de leads normalisés (dédupliqués par nom + adresse).
"""
all_leads = []
seen_keys = set()
def _dedup_key(lead: dict) -> str:
return f"{lead['name'].lower().strip()}|{lead['address'].lower().strip()[:40]}"
if use_google:
try:
scraper = GooglePlacesScraper()
google_leads = scraper.scrape(max_leads=max_leads)
for lead in google_leads:
k = _dedup_key(lead)
if k not in seen_keys:
seen_keys.add(k)
all_leads.append(lead)
except EnvironmentError as e:
logger.warning(f"Google Places désactivé : {e}")
use_google = False
remaining = max_leads - len(all_leads)
if use_osm and remaining > 0:
osm_leads = OverpassScraper().scrape(max_leads=remaining)
for lead in osm_leads:
k = _dedup_key(lead)
if k not in seen_keys:
seen_keys.add(k)
all_leads.append(lead)
logger.info(
f"run_scraping terminé — {len(all_leads)} leads uniques "
f"(Google={use_google}, OSM={use_osm}). "
f"Quota restant aujourd'hui : {_quota_remaining()}"
)
return all_leads
# ─── CLI (debug) ─────────────────────────────────────────────────────────────
if __name__ == "__main__":
assert GOOGLE_PLACES_API_KEY, (
"GOOGLE_PLACES_API_KEY manquante — "
"ajouter 'export GOOGLE_PLACES_API_KEY=xxx' dans /home/h3r7/.env"
)
leads = run_scraping(max_leads=10)
for i, l in enumerate(leads, 1):
print(f"{i:02d}. [{l['source']}] {l['name']}{l['address']}")

View File

@@ -5,8 +5,11 @@ import json
import requests
import subprocess
import db
from middleware import rate_limit_middleware, access_log_middleware
app = Flask(__name__)
rate_limit_middleware(app)
access_log_middleware(app)
DASHBOARD_API_URL = "http://localhost:8791"
COMBINED_API_URL = "http://localhost:8790"
@@ -740,19 +743,29 @@ def pod_static(filename=""):
@app.route("/turf/api/")
@app.route("/turf/api/<path:api_path>")
def api_proxy(api_path=""):
if api_path.startswith("vitesse"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("n8n-proxy"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("backtest"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("stats"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("predictions_analysis"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("parisroi"):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("paris"):
# Routes servies par combined_api.py (port 8790) :
# backtest, stats, paris, parisroi, races, scores, report, ask, brave-search,
# execute-sql, send-email, vitesse, n8n-proxy, predictions_analysis, ideas
# Fix HRT-73 : alignement complet avec turf_scraper fix #23
COMBINED_ROUTES = (
"backtest",
"stats",
"parisroi",
"paris",
"predictions_analysis",
"vitesse",
"n8n-proxy",
"races",
"race/",
"scores",
"ask",
"brave-search",
"execute-sql",
"send-email",
"report",
"ideas",
)
if any(api_path.startswith(r) for r in COMBINED_ROUTES):
url = f"{COMBINED_API_URL}/turf/api/{api_path}"
elif api_path.startswith("scoring"):
url = f"{DASHBOARD_API_URL}/turf/api/{api_path}"
@@ -767,11 +780,17 @@ def api_proxy(api_path=""):
if fwd_method in ("POST", "PUT", "PATCH")
else None
)
# Forwarder Authorization header (combined_api.py exige Basic h3r7:h3r7 pour parisroi/paris)
fwd_headers = {"Content-Type": "application/json"}
if request.headers.get("Authorization"):
fwd_headers["Authorization"] = request.headers.get("Authorization")
incoming_auth = request.headers.get("Authorization")
if incoming_auth:
fwd_headers["Authorization"] = incoming_auth
resp = requests.request(
method=fwd_method, url=url, json=fwd_json, timeout=30, headers=fwd_headers
method=fwd_method,
url=url,
json=fwd_json,
timeout=30,
headers=fwd_headers,
)
return resp.content, resp.status_code, {"Content-Type": "application/json"}
except Exception as e:

View File

@@ -10,3 +10,4 @@ markers =
load: Tests de charge Locust
security: Tests de sécurité
smoke: Tests rapides de smoke
integration: Tests d'intégration DB et pipeline ML

View File

@@ -14,6 +14,18 @@ import time
import json
from functools import wraps
from datetime import datetime
from collections import defaultdict
from threading import Lock
# ─── Rate limiting login ───────────────────────────────────────────────────────
_login_attempts: dict = defaultdict(
lambda: {"count": 0, "window_start": 0.0, "blocked_until": 0.0}
)
_login_lock = Lock()
LOGIN_RATE_MAX = 5 # max tentatives par fenêtre
LOGIN_RATE_WINDOW = 300 # 5 minutes (en secondes)
LOGIN_BLOCK_DURATION = 900 # 15 min de blocage après dépassement
# ─── Blacklist mots de passe faibles ─────────────────────────────────────────
# HRT-63 — Validation mots de passe faibles
@@ -300,6 +312,37 @@ def login():
if not email or not password:
return jsonify({"error": "Email et mot de passe requis."}), 400
# ── Rate limit par IP ────────────────────────────────────────
ip = request.remote_addr or "unknown"
now = time.time()
with _login_lock:
bucket = _login_attempts[ip]
# Lever le blocage si la durée est écoulée
if now >= bucket["blocked_until"]:
if now - bucket["window_start"] >= LOGIN_RATE_WINDOW:
bucket["count"] = 0
bucket["window_start"] = now
bucket["count"] += 1
count = bucket["count"]
if count > LOGIN_RATE_MAX:
bucket["blocked_until"] = now + LOGIN_BLOCK_DURATION
retry_after = LOGIN_BLOCK_DURATION
blocked = True
else:
retry_after = int(LOGIN_RATE_WINDOW - (now - bucket["window_start"]))
blocked = False
else:
blocked = True
retry_after = int(bucket["blocked_until"] - now)
if blocked:
resp = jsonify({"error": "Trop de tentatives. Réessayez plus tard."})
resp.status_code = 429
resp.headers["Retry-After"] = str(retry_after)
return resp
# ─────────────────────────────────────────────────────────────
pw_hash = hash_password(password)
conn = get_db()
user = conn.execute(

284
telegram_alerts.py Normal file
View File

@@ -0,0 +1,284 @@
#!/usr/bin/env python3
"""
Telegram Alerts — Service d'alertes pré-course pour les utilisateurs Premium/Pro
HRT-79: Alertes Telegram configurables (Premium)
Fonctionnement :
- 30 minutes avant chaque course détectée, envoie un message Telegram
aux utilisateurs Premium/Pro ayant configuré leur chat_id.
- Les préférences individuelles (value_bets, top1, quinte_only) sont respectées.
- Requiert la variable d'environnement TELEGRAM_BOT_TOKEN.
"""
import os
import logging
import sqlite3
from datetime import datetime
from typing import Optional
import requests
logger = logging.getLogger(__name__)
DB_PATH = os.environ.get("TURF_SAAS_DB", "/home/h3r7/turf_saas/turf_saas.db")
BOT_TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "")
TELEGRAM_API_BASE = "https://api.telegram.org/bot{token}/sendMessage"
# ── Helpers ───────────────────────────────────────────────────────────────────
def _get_db():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
return conn
def send_telegram_message(chat_id: str, text: str) -> bool:
"""
Envoie un message Telegram à un chat_id donné.
Returns True si succès, False sinon.
Ne lève pas d'exception pour ne pas crasher le scheduler.
"""
if not BOT_TOKEN:
logger.warning("[TELEGRAM] TELEGRAM_BOT_TOKEN non configuré — envoi ignoré")
return False
url = TELEGRAM_API_BASE.format(token=BOT_TOKEN)
payload = {
"chat_id": chat_id,
"text": text,
"parse_mode": "Markdown",
"disable_web_page_preview": True,
}
try:
resp = requests.post(url, json=payload, timeout=10)
if resp.status_code == 200:
return True
logger.warning(
"[TELEGRAM] Echec envoi chat_id=%s status=%d body=%s",
chat_id,
resp.status_code,
resp.text[:200],
)
return False
except requests.RequestException as exc:
logger.error("[TELEGRAM] Exception HTTP chat_id=%s: %s", chat_id, exc)
return False
# ── Alert builder ─────────────────────────────────────────────────────────────
def build_race_alert(race_data: dict, predictions: list) -> str:
"""
Construit le message Markdown de l'alerte pré-course.
Args:
race_data: dict avec les clés 'hippo', 'num_course', 'heure', 'type_course'
predictions: liste de dicts {'num_cheval', 'nom_cheval', 'prob_top3', 'is_value_bet', 'ml_score'}
Returns: texte Markdown formaté
"""
hippo = race_data.get("hippo", "?")
num_course = race_data.get("num_course", "?")
heure = race_data.get("heure", "?")
type_course = race_data.get("type_course", "")
lines = [
f"🏇 *Alerte course — {hippo} R{num_course}*",
f"⏰ Départ prévu : *{heure}*",
]
if type_course:
lines.append(f"📋 Type : {type_course}")
lines.append("")
top3 = [p for p in predictions if p.get("prob_top3", 0) > 0][:3]
value_bets = [p for p in predictions if p.get("is_value_bet")]
if top3:
lines.append("📊 *Top-3 ML :*")
for i, p in enumerate(top3, 1):
nom = p.get("nom_cheval", f"#{p.get('num_cheval', '?')}")
prob = p.get("prob_top3", 0)
lines.append(f" {i}. {nom}{prob:.0%} prob top-3")
lines.append("")
if value_bets:
lines.append("💡 *Value bets :*")
for p in value_bets[:3]:
nom = p.get("nom_cheval", f"#{p.get('num_cheval', '?')}")
score = p.get("ml_score", 0)
lines.append(f"{nom} (score {score:.2f})")
lines.append("")
lines.append("_Alerte automatique Turf SaaS — 30min avant départ_")
return "\n".join(lines)
# ── Main send function ────────────────────────────────────────────────────────
def send_pre_race_alerts(minutes_before: int = 30) -> dict:
"""
Interroge la DB pour récupérer les courses du jour, puis envoie
des alertes Telegram aux utilisateurs Premium/Pro éligibles.
Args:
minutes_before: non utilisé directement (la planification est gérée
par le scheduler), présent pour documentation.
Returns: dict {'sent': int, 'skipped': int, 'errors': int}
"""
if not BOT_TOKEN:
logger.warning(
"[TELEGRAM] TELEGRAM_BOT_TOKEN absent — send_pre_race_alerts ignoré"
)
return {"sent": 0, "skipped": 0, "errors": 0}
stats = {"sent": 0, "skipped": 0, "errors": 0}
try:
conn = _get_db()
today = datetime.now().strftime("%Y-%m-%d")
# Récupère les courses du jour
try:
courses_rows = conn.execute(
"""
SELECT DISTINCT
hippo, num_course, heure_depart, type_course
FROM pmu_courses
WHERE date_programme = ?
AND heure_depart IS NOT NULL
ORDER BY heure_depart ASC
LIMIT 20
""",
(today,),
).fetchall()
except sqlite3.OperationalError as exc:
logger.warning("[TELEGRAM] Table pmu_courses introuvable: %s", exc)
conn.close()
return stats
if not courses_rows:
logger.info("[TELEGRAM] Aucune course aujourd'hui — pas d'alerte")
conn.close()
return stats
# Récupère les utilisateurs Premium/Pro avec chat_id configuré
try:
users = conn.execute(
"""
SELECT id, telegram_chat_id,
alert_value_bets, alert_top1, alert_quinte_only
FROM users
WHERE plan IN ('premium', 'pro')
AND is_active = 1
AND telegram_chat_id IS NOT NULL
AND telegram_chat_id != ''
""",
).fetchall()
except sqlite3.OperationalError as exc:
logger.warning(
"[TELEGRAM] Colonnes Telegram absentes (migration non appliquée?): %s",
exc,
)
conn.close()
return stats
if not users:
logger.info("[TELEGRAM] Aucun utilisateur avec chat_id configuré")
conn.close()
return stats
for course_row in courses_rows:
hippo = course_row["hippo"] or "?"
num_course = course_row["num_course"] or "?"
heure_ts = course_row["heure_depart"]
type_course = course_row["type_course"] or ""
try:
dt = datetime.fromtimestamp(heure_ts / 1000)
heure_str = dt.strftime("%H:%M")
except Exception:
heure_str = str(heure_ts)
race_data = {
"hippo": hippo,
"num_course": num_course,
"heure": heure_str,
"type_course": type_course,
}
# Récupère les prédictions ML pour cette course
predictions = []
try:
pred_rows = conn.execute(
"""
SELECT num_cheval, nom_cheval, prob_top3, is_value_bet, ml_score
FROM ml_predictions_cache
WHERE date = ?
AND hippo = ?
AND num_course = ?
ORDER BY prob_top3 DESC
LIMIT 10
""",
(today, hippo, num_course),
).fetchall()
predictions = [dict(r) for r in pred_rows]
except sqlite3.OperationalError:
pass # table absente, on envoie quand même avec données minimales
is_quinte = (
"quinté" in type_course.lower() or "quinte" in type_course.lower()
)
for user in users:
chat_id = user["telegram_chat_id"]
alert_quinte_only = bool(user["alert_quinte_only"])
alert_top1 = bool(user["alert_top1"])
alert_value_bets = bool(user["alert_value_bets"])
# Filtre quinte_only
if alert_quinte_only and not is_quinte:
stats["skipped"] += 1
continue
# Construit le message selon préférences
filtered_preds = []
if predictions:
for p in predictions:
include = False
if alert_top1 and p.get("prob_top3", 0) > 0:
include = True
if alert_value_bets and p.get("is_value_bet"):
include = True
if include:
filtered_preds.append(p)
text = build_race_alert(race_data, filtered_preds)
ok = send_telegram_message(chat_id, text)
if ok:
stats["sent"] += 1
else:
stats["errors"] += 1
conn.close()
except Exception as exc:
logger.error("[TELEGRAM] Erreur inattendue dans send_pre_race_alerts: %s", exc)
import traceback
traceback.print_exc()
stats["errors"] += 1
logger.info(
"[TELEGRAM] Alertes pré-course: %d envoyées, %d ignorées, %d erreurs",
stats["sent"],
stats["skipped"],
stats["errors"],
)
return stats

View File

@@ -141,7 +141,7 @@ class TestJWTAuthentication:
"invalid_signature_here"
)
resp = requests.get(
f"{BASE_URL}/api/races",
f"{BASE_URL}/api/v1/predictions/today",
headers={"Authorization": f"Bearer {expired_token}"},
timeout=5,
)
@@ -153,7 +153,7 @@ class TestJWTAuthentication:
"""Un token JWT malformé doit être rejeté."""
for bad_token in ["not.a.jwt", "Bearer", "null", "undefined", ""]:
resp = requests.get(
f"{BASE_URL}/api/races",
f"{BASE_URL}/api/v1/predictions/today",
headers={"Authorization": f"Bearer {bad_token}"},
timeout=5,
)
@@ -163,7 +163,7 @@ class TestJWTAuthentication:
def test_jwt_sans_token(self):
"""Sans token, les routes protégées doivent retourner 401."""
resp = requests.get(f"{BASE_URL}/api/export/csv", timeout=5)
resp = requests.get(f"{BASE_URL}/api/v1/export/csv", timeout=5)
assert resp.status_code in (401, 403), (
f"Route protégée accessible sans token: status={resp.status_code}"
)
@@ -386,6 +386,53 @@ class TestWeakPasswordRejection:
assert resp.status_code == 400, (
f"Mot de passe sans lettre accepté: status={resp.status_code}"
)
# === Tests rate limiting login ===
class TestLoginRateLimit:
"""Tests rate limiting sur /api/v1/auth/login."""
TARGET_URL = (
os.environ.get("APP_URL", "http://localhost:8792") + "/api/v1/auth/login"
)
def test_login_brute_force_blocked_after_5_attempts(self):
"""Après 5 tentatives, le 6ème appel doit retourner 429."""
# Utiliser un email unique pour isoler le test
email = f"ratelimit_test_{int(time.time())}@h3r7.tech"
for i in range(5):
resp = requests.post(
self.TARGET_URL,
json={"email": email, "password": "wrong_password"},
timeout=5,
)
assert resp.status_code in (400, 401), (
f"Tentative {i + 1}: status inattendu {resp.status_code}"
)
# La 6ème tentative doit être bloquée
resp = requests.post(
self.TARGET_URL,
json={"email": email, "password": "wrong_password"},
timeout=5,
)
assert resp.status_code == 429, (
f"Rate limit non appliqué après 5 tentatives: got {resp.status_code}"
)
assert "Retry-After" in resp.headers, "Header Retry-After manquant sur 429"
def test_login_429_has_retry_after_header(self):
"""La réponse 429 doit inclure Retry-After."""
email = f"ratelimit_test2_{int(time.time())}@h3r7.tech"
for _ in range(6):
requests.post(
self.TARGET_URL, json={"email": email, "password": "x"}, timeout=5
)
resp = requests.post(
self.TARGET_URL, json={"email": email, "password": "x"}, timeout=5
)
if resp.status_code == 429:
assert "Retry-After" in resp.headers
assert int(resp.headers["Retry-After"]) > 0
if __name__ == "__main__":

View File

@@ -0,0 +1,300 @@
"""
test_ml_cache_integrity.py — Test d'intégration : zéro NULL dans ml_predictions_cache
SaaS Turf Prédictions IA
Ticket: HRT-43 (suite au fix HRT-41 — métadonnées manquantes dans le cache ML)
Ces tests vérifient que la table ml_predictions_cache ne contient aucune ligne
avec des métadonnées NULL (hippodrome, race_label, heure) pour la date courante,
après le job ML de 19h30.
Usage:
pytest tests/test_ml_cache_integrity.py -v -m integration
pytest tests/test_ml_cache_integrity.py -v -m integration --date 2026-04-26
Variables d'environnement:
TURF_DB_PATH : chemin vers turf.db (défaut: /home/h3r7/turf_scraper/turf.db)
TEST_DATE : date cible au format YYYY-MM-DD (défaut: date du jour)
"""
import sqlite3
import os
import pytest
from datetime import date, datetime
from pathlib import Path
# ============================================================
# Configuration
# ============================================================
DEFAULT_DB_PATH = "/home/h3r7/turf_scraper/turf.db"
DB_PATH = os.environ.get("TURF_DB_PATH", DEFAULT_DB_PATH)
def _get_test_date() -> str:
"""Retourne la date cible pour les tests (env TEST_DATE ou date du jour)."""
env_date = os.environ.get("TEST_DATE", "")
if env_date:
try:
datetime.strptime(env_date, "%Y-%m-%d")
return env_date
except ValueError:
raise ValueError(
f"TEST_DATE invalide : '{env_date}'. Format attendu : YYYY-MM-DD"
)
return date.today().isoformat()
# ============================================================
# Fixture : connexion DB en lecture seule
# ============================================================
@pytest.fixture(scope="module")
def db_connection():
"""
Connexion SQLite en mode lecture seule (uri=True + ?mode=ro).
Garantit qu'aucune modification accidentelle de la DB de prod n'est possible.
"""
db_path = Path(DB_PATH)
if not db_path.exists():
pytest.skip(
f"Base de données introuvable : {DB_PATH}. "
"Définir TURF_DB_PATH ou vérifier le chemin."
)
uri = f"file:{db_path.as_posix()}?mode=ro"
conn = sqlite3.connect(uri, uri=True)
conn.row_factory = sqlite3.Row
yield conn
conn.close()
@pytest.fixture(scope="module")
def target_date():
"""Date cible pour les tests (date du jour ou TEST_DATE)."""
return _get_test_date()
# ============================================================
# Tests d'intégration
# ============================================================
@pytest.mark.integration
class TestMlCacheNullIntegrity:
"""
Vérifie qu'après le job ML de 19h30, la table ml_predictions_cache
ne contient aucune métadonnée NULL pour la date courante.
Régression testée : HRT-41 (Fix #17 — métadonnées manquantes dans le cache ML)
"""
def test_table_exists(self, db_connection):
"""Vérifie que la table ml_predictions_cache existe dans la DB."""
cursor = db_connection.execute(
"SELECT name FROM sqlite_master "
"WHERE type='table' AND name='ml_predictions_cache'"
)
row = cursor.fetchone()
assert row is not None, (
"La table ml_predictions_cache est introuvable dans la base de données. "
"Vérifier que le job ML a bien créé la table."
)
def test_rows_exist_for_today(self, db_connection, target_date):
"""
Vérifie que des prédictions existent pour la date cible.
Ce test passe en skip si aucune ligne n'existe (ex: avant le job 19h30).
Il échoue uniquement si le job a manifestement tourné mais a laissé 0 lignes.
"""
cursor = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache WHERE date = ?",
(target_date,),
)
count = cursor.fetchone()["cnt"]
if count == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Ce test doit être exécuté après le job ML de 19h30."
)
def test_zero_null_hippodrome_today(self, db_connection, target_date):
"""
CRITÈRE D'ACCEPTATION PRINCIPAL (HRT-43) :
Vérifie que COUNT(*) WHERE date = today AND hippodrome IS NULL = 0.
Régression directe du bug HRT-41 : le champ hippodrome était NULL
pour toutes les prédictions du cache ML.
"""
# Vérifier si des données existent avant de tester les NULLs
cursor_total = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache WHERE date = ?",
(target_date,),
)
total = cursor_total.fetchone()["cnt"]
if total == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Lancer ce test après le job ML de 19h30."
)
cursor = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache "
"WHERE date = ? AND hippodrome IS NULL",
(target_date,),
)
null_count = cursor.fetchone()["cnt"]
assert null_count == 0, (
f"RÉGRESSION HRT-41 DÉTECTÉE : {null_count} ligne(s) avec hippodrome IS NULL "
f"dans ml_predictions_cache pour le {target_date}. "
"Le patch de métadonnées n'a pas été appliqué correctement."
)
def test_zero_null_race_label_today(self, db_connection, target_date):
"""
Vérifie que COUNT(*) WHERE date = today AND race_label IS NULL = 0.
Complément du test hippodrome : vérifie que le libellé de course
est bien renseigné pour toutes les prédictions.
"""
cursor_total = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache WHERE date = ?",
(target_date,),
)
total = cursor_total.fetchone()["cnt"]
if total == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Lancer ce test après le job ML de 19h30."
)
cursor = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache "
"WHERE date = ? AND race_label IS NULL",
(target_date,),
)
null_count = cursor.fetchone()["cnt"]
assert null_count == 0, (
f"ANOMALIE : {null_count} ligne(s) avec race_label IS NULL "
f"dans ml_predictions_cache pour le {target_date}. "
"Vérifier le pipeline de patch de métadonnées."
)
def test_zero_null_heure_today(self, db_connection, target_date):
"""
Vérifie que COUNT(*) WHERE date = today AND heure IS NULL = 0.
Vérifie que l'heure de course est bien renseignée pour toutes les prédictions.
"""
cursor_total = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache WHERE date = ?",
(target_date,),
)
total = cursor_total.fetchone()["cnt"]
if total == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Lancer ce test après le job ML de 19h30."
)
cursor = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache "
"WHERE date = ? AND heure IS NULL",
(target_date,),
)
null_count = cursor.fetchone()["cnt"]
assert null_count == 0, (
f"ANOMALIE : {null_count} ligne(s) avec heure IS NULL "
f"dans ml_predictions_cache pour le {target_date}. "
"Vérifier le pipeline de patch de métadonnées."
)
def test_full_metadata_coverage_today(self, db_connection, target_date):
"""
Test de couverture globale : aucune des trois colonnes critiques
(hippodrome, race_label, heure) n'est NULL pour une même ligne.
Retourne les 5 premières lignes problématiques pour faciliter le débogage.
"""
cursor_total = db_connection.execute(
"SELECT COUNT(*) as cnt FROM ml_predictions_cache WHERE date = ?",
(target_date,),
)
total = cursor_total.fetchone()["cnt"]
if total == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Lancer ce test après le job ML de 19h30."
)
cursor = db_connection.execute(
"SELECT id, num_reunion, num_course, horse_name, hippodrome, race_label, heure "
"FROM ml_predictions_cache "
"WHERE date = ? "
"AND (hippodrome IS NULL OR race_label IS NULL OR heure IS NULL) "
"LIMIT 5",
(target_date,),
)
bad_rows = cursor.fetchall()
assert len(bad_rows) == 0, (
f"ANOMALIE : {len(bad_rows)} ligne(s) avec au moins une métadonnée NULL "
f"(hippodrome, race_label ou heure) pour le {target_date}.\n"
"Exemples de lignes affectées :\n"
+ "\n".join(
f" - id={r['id']} R{r['num_reunion']}C{r['num_course']} "
f"{r['horse_name']} | hippodrome={r['hippodrome']!r} "
f"race_label={r['race_label']!r} heure={r['heure']!r}"
for r in bad_rows
)
)
def test_metadata_completeness_summary(self, db_connection, target_date):
"""
Résumé diagnostique : affiche les statistiques de complétude des métadonnées
pour la date cible. Toujours en mode informatif (pas de assertion stricte),
utile pour le monitoring et les logs CI.
"""
cursor = db_connection.execute(
"""
SELECT
COUNT(*) as total,
SUM(CASE WHEN hippodrome IS NULL THEN 1 ELSE 0 END) as null_hippodrome,
SUM(CASE WHEN race_label IS NULL THEN 1 ELSE 0 END) as null_race_label,
SUM(CASE WHEN heure IS NULL THEN 1 ELSE 0 END) as null_heure,
COUNT(DISTINCT hippodrome) as distinct_hippodromes,
COUNT(DISTINCT race_label) as distinct_race_labels
FROM ml_predictions_cache
WHERE date = ?
""",
(target_date,),
)
row = cursor.fetchone()
total = row["total"]
if total == 0:
pytest.skip(
f"Aucune prédiction en cache pour le {target_date}. "
"Lancer ce test après le job ML de 19h30."
)
# Afficher les statistiques (visibles avec pytest -v -s)
print(f"\n=== Statistiques ml_predictions_cache pour le {target_date} ===")
print(f" Total lignes : {total}")
print(f" NULL hippodrome : {row['null_hippodrome']}")
print(f" NULL race_label : {row['null_race_label']}")
print(f" NULL heure : {row['null_heure']}")
print(f" Hippodromes distincts: {row['distinct_hippodromes']}")
print(f" Race labels distincts: {row['distinct_race_labels']}")
# L'assertion ici reste stricte pour hippodrome (bug HRT-41 critique)
assert row["null_hippodrome"] == 0, (
f"RÉGRESSION HRT-41 : {row['null_hippodrome']}/{total} lignes "
f"avec hippodrome IS NULL pour le {target_date}."
)

View File

@@ -193,6 +193,65 @@ def schedule_dynamic_scoring():
logger.info(" [SCHEDULER] Pas de course aujourd'hui, pas de scoring dynamique")
def run_telegram_alerts():
"""Envoie les alertes Telegram pré-course aux utilisateurs Premium/Pro"""
logger.info("📨 [SCHEDULER] Envoi alertes Telegram pré-course...")
try:
os.chdir("/home/h3r7/turf_saas")
import telegram_alerts
stats = telegram_alerts.send_pre_race_alerts(minutes_before=30)
logger.info(
"✅ [SCHEDULER] Alertes Telegram: %d envoyées, %d ignorées, %d erreurs",
stats.get("sent", 0),
stats.get("skipped", 0),
stats.get("errors", 0),
)
except Exception as e:
logger.error(f"❌ [SCHEDULER] Erreur alertes Telegram: {e}")
import traceback
traceback.print_exc()
def schedule_dynamic_telegram_alerts():
"""Planifie les alertes Telegram 30min avant la course (même pattern que schedule_dynamic_scoring)"""
race_time = get_todays_race_time()
if race_time:
try:
# Convertir timestamp ms en datetime
dt = datetime.fromtimestamp(race_time / 1000)
race_hour = dt.hour
race_min = dt.minute
logger.info(
f"📅 [SCHEDULER] Alertes Telegram — course à {race_hour:02d}:{race_min:02d}"
)
# Alertes 30min avant la course
pre_min = race_min - 30
pre_hour = race_hour
if pre_min < 0:
pre_min += 60
pre_hour -= 1
alert_time = f"{pre_hour:02d}:{pre_min:02d}"
schedule.every().day.at(alert_time).do(run_telegram_alerts).tag(
"telegram", "dynamic"
)
logger.info(
f"📅 [SCHEDULER] Alertes Telegram planifiées à {alert_time} (30min avant la course)"
)
except Exception as e:
logger.warning(f"⚠️ Impossible de planifier les alertes Telegram: {e}")
else:
logger.info(
" [SCHEDULER] Pas de course aujourd'hui, pas d'alertes Telegram dynamiques"
)
def schedule_dynamic_results():
"""Planifie le scraping des résultats à H+1 (1h après la course)"""
race_time = get_todays_race_time()
@@ -245,6 +304,9 @@ def main():
# Scoring dynamique (15min avant course)
schedule_dynamic_scoring()
# Alertes Telegram dynamiques (30min avant course)
schedule_dynamic_telegram_alerts()
# Résultats dynamiques (H+1)
schedule_dynamic_results()