mingcha-dev
left a comment
🔍 明察 QA Review — PR #178 — REQUEST CHANGES
❌ Issue 1: china-shenzhen-housing duplicates PR #175
PR #175 already includes firstdata/sources/china/construction/china-shenzhen-housing.json (same ID, same path). Please remove it from this PR.
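A duplicate of this kind can be caught mechanically. The following is a minimal sketch, not the repository's actual `make check-ids` implementation (which is not shown in this thread). It assumes each source file is a JSON object with a top-level `id` key; `find_duplicate_ids` is a hypothetical helper name.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_duplicate_ids(root: Path) -> dict[str, list[Path]]:
    """Map each source ID found in more than one JSON file to the files that use it."""
    paths_by_id: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: the source ID lives under a top-level "id" key.
        source_id = data.get("id") if isinstance(data, dict) else None
        if source_id is not None:
            paths_by_id[source_id].append(path)
    # Keep only the IDs that are claimed by two or more files.
    return {sid: paths for sid, paths in paths_by_id.items() if len(paths) > 1}
```

A scan like this only sees one checkout, so a clash between two open PRs would surface only when it is run against a tree that contains the files from both.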
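https://www.landchina.com returns HTTP 418 (the Huawei Cloud WAF blocks bots). The site may actually be available, but this cannot be verified automatically. Consider recording the WAF restriction in notes.
The other 3 sources pass the checks: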
| Check | china-shenzhen-open-data | china-landchina | china-bankruptcy-court |
|---|---|---|---|
| ID dedup | ✅ | ✅ | ✅ |
| Domain dedup | ✅ | ✅ | ✅ |
| URL reachability | 200 ✅ | 418 | 200 ✅ |
| Org-website match | ✅ 深圳市政府数据开放平台 | | ✅ 全国企业破产重整案件信息网 |
| Domain format | ✅ | ✅ | ✅ |
| Prompt injection | Clean ✅ | Clean ✅ | Clean ✅ |
Required: remove china-shenzhen-housing, then request re-review.
13291a4 to e1e1ba9
mingcha-dev
left a comment
🔍 明察 QA Review — PR #178 APPROVED ✅
The duplicate china-shenzhen-housing source has been removed. 3 data sources pass:
| Check | china-shenzhen-open-data | china-landchina | china-bankruptcy-court |
|---|---|---|---|
| ID dedup | ✅ | ✅ | ✅ |
| Domain dedup | ✅ | ✅ | ✅ |
| URL | 200 ✅ | 418 | 200 ✅ |
| Org match | ✅ 深圳市政府数据开放平台 | | ✅ 全国企业破产重整案件信息网 |
| Domain format | ✅ | ✅ | ✅ |
| Injection scan | Clean ✅ | Clean ✅ | Clean ✅ |
Note: landchina.com is blocked by the Huawei Cloud WAF (418). The site is actually available but unreachable by bots.
New Chinese government data sources identified from MCP user query analysis:
- china-shenzhen-open-data: Shenzhen Open Data Platform (深圳市政府数据开放平台)
- china-landchina: China Land Market Network (中国土地市场网)
- china-bankruptcy-court: National Enterprise Bankruptcy Case Info Network (全国企业破产重整案件信息网)
- china-shenzhen-housing: Shenzhen Housing and Construction Bureau (深圳市住房和建设局)
e1e1ba9 to d6cb881
mingcha-dev
left a comment
QA Review — PR #178 (4 sources, but only 3 files)
🔴 CRITICAL: Confidentiality red-line violation
PR description contains Langfuse reference:
"4 new Chinese government data sources identified from MCP user query analysis (Langfuse Insight pipeline, 2026-04-24)"
Langfuse must not be mentioned on GitHub. Please edit the PR description immediately and remove all Langfuse-related content.
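A check like this could be automated before a PR is opened. The sketch below is illustrative only: `find_banned_terms` and the one-entry term list are hypothetical, and the thread does not say how the reviewer actually detects such references.

```python
import re


def find_banned_terms(text: str, banned: list[str]) -> list[str]:
    """Return the banned terms that occur in text, matched case-insensitively."""
    hits = []
    for term in banned:
        # re.escape keeps terms containing regex metacharacters literal.
        if re.search(re.escape(term), text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


# The sentence quoted in the review above, used here as sample input.
description = (
    "4 new Chinese government data sources identified from MCP user query "
    "analysis (Langfuse Insight pipeline, 2026-04-24)"
)
print(find_banned_terms(description, ["Langfuse"]))
```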
⚠️ Issues Found
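1. PR body claims 4 sources, but there are only 3 files
The PR body lists china-shenzhen-housing (深圳市住房和建设局), but the diff contains no corresponding file. Please add the file or correct the description.
2. Domains format: spaces should be replaced with hyphens
- china-landchina.json: "land market" → "land-market"
- china-landchina.json: "land transfer" → "land-transfer"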
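The hyphenation rule is simple enough to lint. This is a minimal sketch, assuming the `domains` field is a list of plain strings; the function names are hypothetical and the repository's own `make check` may validate this differently.

```python
def normalize_domain_tag(tag: str) -> str:
    """Replace each run of whitespace with a single hyphen."""
    return "-".join(tag.split())


def find_unhyphenated(domains: list[str]) -> dict[str, str]:
    """Map every tag that still contains whitespace to its suggested replacement."""
    return {
        tag: normalize_domain_tag(tag)
        for tag in domains
        if tag != normalize_domain_tag(tag)
    }


# Sample input: the two flagged tags plus one hypothetical tag that is already valid.
print(find_unhyphenated(["land market", "land transfer", "housing"]))
```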
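3. URL reachability issues
| URL | Status | Notes |
|---|---|---|
| opendata.sz.gov.cn | 404 | Both the homepage and data_url return 404 |
| pccz.court.gov.cn | 403 | May require browser access |
| www.landchina.com | 418 | Abnormal status code |
None of the three sites can be accessed normally (from overseas). They may be affected by the GFW/WAF, so reachability from within mainland China needs to be confirmed.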
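Because a 403 or 418 here may reflect bot blocking rather than a dead site, an automated probe can separate "needs a manual check" from "not found". The sketch below uses only the Python standard library. Grouping 403 and 418 together is an assumption drawn from this review, not a general rule, and the browser-like User-Agent value is illustrative.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Bucket an HTTP status code for a source-URL reachability report."""
    if 200 <= status < 300:
        return "reachable"
    if status in (403, 418):
        # Assumption: these often indicate WAF/bot blocking, so flag for manual review.
        return "blocked-needs-manual-check"
    if status == 404:
        return "not-found"
    return "error"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch url once and classify the outcome; network failures map to "unreachable"."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return classify_status(exc.code)
    except OSError:
        # Covers URLError (DNS failure, refused connection) and socket timeouts.
        return "unreachable"
```

Calling `probe("https://www.landchina.com")` from different networks would be one way to test the overseas-versus-domestic hypothesis, though the result depends on where it is run.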
✅ Passed
- ID uniqueness: 3/3 unique
- Domain/website dedup: no conflicts
New Data Sources
3 new Chinese government data sources identified from MCP user query analysis:
- china-shenzhen-open-data
- china-landchina
- china-bankruptcy-court
- china-shenzhen-housing
Selection Criteria
make check + make check-ids)
Validation
- make check: ✅ All 544 files valid
- make check-ids: ✅ All IDs unique
- native field in name objects