Commit b6b169c

include likely country of origin as abuse signal
1 parent cd2716c commit b6b169c

2 files changed

Lines changed: 34 additions & 0 deletions

web/src/server/free-session/abuse-detection.ts

Lines changed: 32 additions & 0 deletions
@@ -271,6 +271,38 @@ export async function identifyBotSuspects(params: {
       score += 15
     }

+    // --- Region signal (corroborating, scored only when stacked with usage) ---
+    // The free tier is intended for users in approved regions: English-speaking
+    // (US, UK, Canada, Australia, NZ, Ireland) and western-European markets.
+    // We have no IP data, so region is inferred from email provider and the
+    // unicode characters in the display name. CJK indicators (Chinese/Japanese/
+    // Korean Unicode in name, Chinese-provider emails, .edu.cn domains) are
+    // the only signal we can detect reliably, and empirically our abuse
+    // clusters are overwhelmingly from these provider pools. Diaspora users
+    // from approved regions may trip this flag, so it only contributes to the
+    // score when combined with heavy usage (the combination, not the region
+    // alone, is what justifies the score bump).
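+    const hasCjkName =
+      !!s.name &&
+      // Assumed reconstruction: the literal ranges are garbled in this copy (only
+      // the U+9FFF bound is legible). These escapes cover kana through CJK Unified
+      // Ideographs, Hangul syllables, and CJK compatibility ideographs.
+      /[\u3040-\u9fff\uac00-\ud7af\uf900-\ufaff]/.test(s.name)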
+    const hasChineseDomain =
+      !!s.email &&
+      /@(qq|163|126|sina|sina\.cn|foxmail|aliyun|139|yeah|tom)\.(com|cn|net)$/i.test(
+        s.email,
+      )
+    const hasCnEduDomain = !!s.email && /\.edu\.cn$/i.test(s.email)
+    const nonApprovedRegion =
+      hasCjkName || hasChineseDomain || hasCnEduDomain
+    if (nonApprovedRegion) {
+      const reasons: string[] = []
+      if (hasCjkName) reasons.push('cjk-name')
+      if (hasChineseDomain) reasons.push('cn-provider')
+      if (hasCnEduDomain) reasons.push('cn-edu')
+      flags.push(`non-approved-region[${reasons.join(',')}]`)
+      if (msgs24h >= 500) score += 40
+      else if (msgs24h >= 300) score += 25
+    }
+
     // --- Email/handle pattern flags (purely informational) ---
     // These are too noisy in isolation (many real users have digits in their
     // email, use plus-aliases for privacy, or sign up via duck.com). They're
web/src/server/free-session/abuse-review.ts

Lines changed: 2 additions & 0 deletions
@@ -50,6 +50,8 @@ A very young GitHub account (gh_age < 7d, especially < 1d) combined with heavy u
Conversely, a GitHub account older than ~30 days is meaningful counter-evidence. The "day-1 of coding = day-1 of GitHub" pattern that makes fresh-GH such a strong bot signal doesn't apply once the GH predates the codebuff account by a month or more. gh_age ≥ 30d + a moderate quiet gap (≥4h) + any agent diversity reads like an excited power user, not a bot. Don't tier these as HIGH unless there's a genuinely unambiguous per-account signal (true near-continuous activity, see below).

+The free tier is intended for users in approved regions: English-speaking (US, UK, Canada, Australia, NZ, Ireland) and western-European markets. We have no IP geolocation, so region is inferred heuristically: the \`non-approved-region[...]\` flag fires when the account has a CJK-character display name (\`cjk-name\`), a Chinese email provider (\`cn-provider\`: qq.com, 163.com, 126.com, sina.com, foxmail.com, aliyun.com, 139.com, yeah.net, tom.com), or a \`.edu.cn\` domain (\`cn-edu\`). Empirically our abuse clusters are overwhelmingly from these provider pools, and heavy free-tier usage from them strongly correlates with VPN-based farming. BUT real diaspora developers from approved regions exist and trip this flag too. So: region alone is NEVER grounds for a ban. Treat it as corroborating evidence that RAISES confidence when stacked with heavy usage (msgs_24h ≥ 300) or other bot signals: a \`non-approved-region\` user with \`very-heavy\` usage on a young account is TIER 1; the same user with established-GH + low usage + diverse-agents stays in TIER 2.
+
Creation-cluster membership is a WEAK signal on its own. The detector is purely temporal — accounts created within 30 minutes of each other. At normal signup volume, unrelated real users routinely land in the same window (product launches, HN/Reddit posts, timezone-aligned bursts). A cluster is only actionable when its members share a concrete cross-account pattern: matching email-local stems or digit siblings (\`v6apiworker\` / \`v8apiworker\`), a shared uncommon domain (\`@mail.hnust.edu.cn\`), sequential-number naming, or near-identical msgs_24h / distinct_hours footprints across multiple members. Absent such a shared pattern, treat a cluster list as background noise and tier members purely on their per-account signals. When you do use a cluster as evidence, name the shared pattern explicitly — "cluster sharing the \`vNNapiworker\` stem", not "member of 5-account creation cluster".
Produce a markdown report with two sections:
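
The creation-cluster guidance in the hunk above treats a cluster as actionable only when its members share a concrete pattern such as digit siblings (`v6apiworker` / `v8apiworker`). One way to surface that pattern is sketched below under stated assumptions: `emailStem` and `sharedStems` are hypothetical helpers that are not part of the commit, and collapsing each digit run to a single `N` is an illustrative normalization, not the reviewer's documented method.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.

// Collapse every digit run in the email local part so digit siblings share a key.
function emailStem(email: string): string {
  const at = email.lastIndexOf('@')
  const local = (at === -1 ? email : email.slice(0, at)).toLowerCase()
  return local.replace(/\d+/g, 'N')
}

// Group addresses by stem and keep only stems shared by at least minMembers accounts.
function sharedStems(emails: string[], minMembers = 2): Map<string, string[]> {
  const groups = new Map<string, string[]>()
  for (const email of emails) {
    const stem = emailStem(email)
    const members = groups.get(stem) ?? []
    members.push(email)
    groups.set(stem, members)
  }

  const shared = new Map<string, string[]>()
  groups.forEach((members, stem) => {
    if (members.length >= minMembers) shared.set(stem, members)
  })
  return shared
}

const cluster = [
  'v6apiworker@example.com',
  'v8apiworker@example.com',
  'alice@example.com',
]
// Only the two worker addresses share the stem "vNapiworker".
console.log(sharedStems(cluster))
```

A shared stem is a lead for the reviewer to name explicitly, not a verdict: purely numeric local parts all collapse to `N`, so matches on very short stems need the same skepticism the guidance applies to clusters in general.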
