DNS System — Learn by Building in Java

A complete, runnable Java simulation of the Domain Name System.
Every class = one real concept. Every method = one real behaviour.
Read the code → run it → read the README → ace the interview.


How to Run

# 1. Compile  (requires JDK 17+)
compile.bat

# 2. Run
run.bat

No Maven. No Gradle. No dependencies. Pure Java.


Project Map

src/dns/
├── records/          ← All 10 DNS record types (A, AAAA, CNAME, ALIAS, MX, TXT, NS, SOA, PTR, SRV)
├── hierarchy/        ← Root → TLD → Authoritative server chain
├── resolver/         ← Recursive + Iterative resolution engines
├── cache/            ← TTL-based cache (why DNS changes are slow)
├── roundrobin/       ← DNS round-robin load balancing
├── anycast/          ← BGP-nearest-PoP routing + failover
├── dnssec/           ← Real RSA-2048 chain-of-trust validation
├── splithorizon/     ← Internal VPC vs external internet views
└── Main.java         ← Demo runner — runs all 8 sections

Day 1 Curriculum — Concept → Java Class Map

| Curriculum Topic | Java Class(es) |
| --- | --- |
| Root / TLD / Authoritative hierarchy | RootNameServer, TLDNameServer, AuthoritativeNameServer |
| Zone (collection of records) | Zone |
| A record | ARecord |
| AAAA record | AAAARecord |
| CNAME + apex restriction | CNAMERecord |
| ALIAS / ANAME (apex workaround) | ALIASRecord |
| MX record | MXRecord |
| TXT / SPF / DKIM | TXTRecord |
| NS record | NSRecord |
| SOA + negative TTL | SOARecord |
| PTR / reverse DNS | PTRRecord |
| SRV / service discovery | SRVRecord |
| Recursive resolution (8.8.8.8 behaviour) | RecursiveResolver |
| Iterative resolution (referral chain) | IterativeResolver |
| Query / Response protocol types | DNSQuery, DNSResponse, RCode |
| TTL caching + propagation delay | TTLCache, CacheEntry |
| DNS round robin | RoundRobinDistributor |
| Anycast + BGP failover | AnycastRouter, PointOfPresence |
| DNSSEC chain of trust | ChainOfTrust, KeySigningKey, ZoneSigningKey |
| Split-horizon DNS | SplitHorizonResolver, ClientContext |

Concept Deep Dives


0 · The DNS Hierarchy — Root → TLD → Authoritative

The problem DNS solves: Every device communicates using IPs like 142.250.80.46.
Humans think in names. DNS translates names to IPs — the internet's phone book.

The three levels:

                    . (Root)
                   13 root server clusters (anycast)
                   Managed by IANA / ICANN
                        |
           ┌────────────┼────────────┐
          .com         .org         .io
      (Verisign)      (PIR)      (AFILIAS)
           |
    ┌──────┴──────┐
 google.com    myapp.com
(Authoritative) (Authoritative)
  Google's NS    Your NS (Route 53)

Root servers — 13 logical IPs (A through M), each replicated to hundreds of machines via anycast. They know only: which servers manage each TLD.

TLD servers — Verisign runs .com. When queried for google.com, they return Google's NS names. They don't know Google's IP.

Authoritative servers — the servers YOU control. They hold A, MX, TXT records. Route 53 is a hosted authoritative server.

Java classes:

  • RootNameServer.java — stores TLD delegations (the root zone file)
  • TLDNameServer.java — stores domain→NS delegations for one TLD
  • AuthoritativeNameServer.java — holds zone data, returns AA=true answers
  • Zone.java — the container of all DNS records for one domain

Interview questions:

  • "What are the 3 types of DNS servers?" → Root (delegates to TLD) / TLD (delegates to authoritative) / Authoritative (holds actual records)
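  • "Why only 13 root IPs?" → A response listing 13 root name servers and their addresses was the most that fit in the original 512-byte UDP limit. Each one uses anycast: same IP, hundreds of machines.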
  • "What is delegation?" → TLD pointing to your authoritative NS. The mechanism of DNS scale.

1 · DNS Record Types

A Record — ARecord.java

Maps hostname → IPv4. The most fundamental record.
Multiple A records for the same hostname = DNS round robin.

api.myapp.com.   300   IN   A   54.210.167.99
api.myapp.com.   300   IN   A   54.210.167.100

AAAA Record — AAAARecord.java

Same as A but for IPv6 (128-bit = 4× A = "AAAA").
Dual-stack: a hostname can have both A and AAAA. Clients prefer IPv6 (RFC 8305 Happy Eyeballs).

CNAME — CNAMERecord.java

Maps alias → canonical name. Causes a second DNS lookup.

www.myapp.com.   CNAME   myapp.com.       ← alias to apex
myapp.com.       A       54.210.167.99    ← final answer

⚠️ Apex restriction (RFC 1034 §3.6.2): CNAME cannot exist at the zone apex (myapp.com itself). The apex must have SOA + NS records — a CNAME conflicts with them.
The constructor in CNAMERecord.java enforces this at object-creation time and throws.
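
A minimal sketch of what such a constructor check could look like. The class name `CnameSketch`, the `zoneApex` parameter, and the exception type are assumptions made for illustration; the project's real CNAMERecord.java may be shaped differently.

```java
import java.util.Locale;

/** Hypothetical stand-in for CNAMERecord; names and fields are assumptions. */
final class CnameSketch {
    final String name;
    final String target;

    CnameSketch(String name, String target, String zoneApex) {
        // DNS names are case-insensitive and may carry a trailing root dot.
        if (normalize(name).equals(normalize(zoneApex))) {
            throw new IllegalArgumentException(
                "CNAME not allowed at zone apex: " + name);
        }
        this.name = name;
        this.target = target;
    }

    /** Lower-cases the name and strips one trailing dot, if present. */
    static String normalize(String n) {
        String s = n.toLowerCase(Locale.ROOT);
        return s.endsWith(".") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Failing in the constructor means an invalid zone can never be built, rather than being discovered at query time.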

ALIAS / ANAME — ALIASRecord.java

Proprietary extension (Route 53 / Cloudflare). Solves the apex problem.
Looks like CNAME semantically, but the provider resolves it server-side and returns a plain A record to the client. No extra round-trips. Works at apex.

The rule:

Static IP available?          → A record
Subdomain pointing to service? → CNAME
Apex pointing to ALB/CDN?    → ALIAS (Route 53 ALIAS / Cloudflare CNAME flatten)

MX Record — MXRecord.java

Routes email. Priority value: lower = higher preference.

myapp.com.  MX  1   aspmx.l.google.com.   ← try first
myapp.com.  MX  5   alt1.aspmx.l.google.com.  ← fallback
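
The "lower value is tried first" rule can be sketched as a sort. The `Mx` record shape and the `deliveryOrder` helper are hypothetical names for this example, not the project's MXRecord API.

```java
import java.util.Comparator;
import java.util.List;

final class MxOrderSketch {
    /** Hypothetical record shape: preference value plus mail host. */
    record Mx(int priority, String host) {}

    /** Hosts in the order a sending server would try them: lowest priority value first. */
    static List<String> deliveryOrder(List<Mx> records) {
        return records.stream()
                .sorted(Comparator.comparingInt(Mx::priority))
                .map(Mx::host)
                .toList();
    }
}
```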

Email requires 3 TXT records too: SPF, DKIM, DMARC. Without them → spam folder.

TXT — TXTRecord.java

Arbitrary text. Real uses:

  • SPF: v=spf1 include:_spf.google.com ~all (who can send email for you)
  • DKIM: v=DKIM1; k=rsa; p=... (public key for email signing)
  • DMARC: v=DMARC1; p=reject (what to do with SPF/DKIM failures)
  • Ownership: google-site-verification=... (prove you own the domain)

NS — NSRecord.java

Which name servers are authoritative for this zone. Must have ≥2 (redundancy).
When you "point your domain to Route 53", you set your registrar's NS records to Route 53's.

SOA — SOARecord.java

Zone metadata. Every zone has exactly one, at the apex. Fields:

| Field | Meaning |
| --- | --- |
| mname | Primary (master) name server |
| rname | Hostmaster email (@ → .) |
| serial | Zone version, conventionally YYYYMMDDNN; increment on every change |
| refresh | How often secondaries poll primary (seconds) |
| retry | Retry interval on failed refresh |
| expire | Secondary stops answering if primary unreachable this long |
| minimumTtl | Negative TTL: how long NXDOMAIN answers are cached |

The negative TTL is critical. If someone queries a non-existent name, that "doesn't exist" answer is cached for minimumTtl seconds. Accidentally delete a record → clients get NXDOMAIN cached for potentially hours.
Keep minimumTtl at 60–300 seconds.
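
The YYYYMMDDNN serial convention from the table can be sketched as follows. `SoaSerialSketch` and `nextSerial` are hypothetical names; the only requirement the sketch enforces is that the serial strictly increases on every change.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class SoaSerialSketch {
    /**
     * Next serial in YYYYMMDDNN form, always strictly greater than current.
     * More than 100 changes in one day simply spill into the next date's range.
     */
    static long nextSerial(long current, LocalDate today) {
        // BASIC_ISO_DATE renders a LocalDate as yyyyMMdd; "* 100" appends NN = 00.
        long base = Long.parseLong(today.format(DateTimeFormatter.BASIC_ISO_DATE)) * 100;
        // Already on today's sequence (or ahead of it): bump by one.
        // Otherwise start today's sequence at NN = 00.
        return current >= base ? current + 1 : base;
    }
}
```

Secondaries compare serials to decide whether to pull a new copy of the zone, so a serial that fails to increase means the change never reaches them.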

PTR — PTRRecord.java

Reverse DNS: IP → hostname. Lives in in-addr.arpa zone.
IP octets are reversed: 54.210.167.99 → 99.167.210.54.in-addr.arpa.

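Why reversed? DNS names are hierarchical from right to left (most general label last), while IP addresses are hierarchical from left to right (network portion first). Reversing the octets puts the network portion on the right, so reverse zones can be delegated along network boundaries.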

Critical for: email deliverability. Mail servers check PTR records. No PTR → spam score goes up → emails rejected.
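
A sketch of the octet reversal, in the spirit of the reverseIP() utility listed for PTRRecord.java. The class name, method signature, and validation here are assumptions, not the project's code.

```java
final class PtrSketch {
    /** Builds the in-addr.arpa name for an IPv4 address, e.g. for a PTR lookup. */
    static String reverseIp(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        StringBuilder sb = new StringBuilder();
        // Walk the octets from last to first so the network part ends up on the right.
        for (int i = 3; i >= 0; i--) {
            int value = Integer.parseInt(octets[i]);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octets[i]);
            }
            sb.append(value).append('.');
        }
        return sb.append("in-addr.arpa.").toString();
    }
}
```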

SRV — SRVRecord.java

Service discovery: hostname + port + priority + weight, all in DNS.
Name format: _service._protocol.domain

_https._tcp.myapp.com.  SRV  10  50  443  server1.myapp.com.
_https._tcp.myapp.com.  SRV  10  50  443  server2.myapp.com.   ← equal weight = equal load
_https._tcp.myapp.com.  SRV  20   0  443  backup.myapp.com.    ← only used if priority-10 all fail

Used by: VoIP (SIP), XMPP, Kubernetes internal discovery, game servers.
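
How a client might choose among the three SRV records above, as a simplified sketch: take the lowest priority value, then pick within that group in proportion to weight. The `Srv` record and `select` method are hypothetical, and `roll` stands in for a random number so the behaviour is deterministic. The full selection algorithm in RFC 2782 has extra rules for zero weights that this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

final class SrvSelectSketch {
    /** Hypothetical record shape mirroring the SRV fields shown above. */
    record Srv(int priority, int weight, int port, String target) {}

    /**
     * Picks one record from the lowest-priority group, proportionally to weight.
     * 'roll' is expected in [0, sum of weights in that group); pass a random value in practice.
     */
    static Srv select(List<Srv> records, int roll) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("no SRV records");
        }
        int best = Integer.MAX_VALUE;
        for (Srv r : records) best = Math.min(best, r.priority());

        List<Srv> group = new ArrayList<>();
        for (Srv r : records) if (r.priority() == best) group.add(r);

        int running = 0;
        for (Srv r : group) {
            running += r.weight();
            if (roll < running) return r;
        }
        // Reached when all weights are zero or roll is out of range.
        return group.get(group.size() - 1);
    }
}
```

With the records above, rolls 0 to 49 land on server1 and 50 to 99 on server2; backup is only reachable once no priority-10 record remains in the list.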


2 · TTL & DNS Caching — TTLCache.java + CacheEntry.java

TTL = Time-To-Live. Every DNS record has one (in seconds). It tells caching resolvers: "You may serve this answer for TTL seconds before re-querying me."

Browser cache (OS)  →  Recursive Resolver  →  Authoritative NS
    (TTL seconds)         (TTL seconds)          (source of truth)

Cold query: Browser → Resolver → Root → TLD → Auth → IP returned
Warm query: Browser → Resolver → [HIT] → IP returned instantly

The trade-off:

| Low TTL (60s) | High TTL (86400s = 1 day) |
| --- | --- |
| Fast failover, DNS changes propagate quickly | Fewer queries hit your auth server |
| Higher auth server load | Slow propagation: stale IPs persist hours/days |
| Good for deployments + incident response | Good for stable, rarely-changing records |

Production migration pattern:

T-48h:  Lower TTL to 60s  (wait for old high-TTL caches to expire)
T-0:    Change the A record to new IP
T+1m:   All resolvers re-query → get new IP within 60s
T+24h:  Raise TTL back to 3600 for performance

COMMON MISTAKE: Changing the IP then lowering the TTL.
Backwards! Lower TTL first, wait for propagation, then change IP.

Negative caching: NXDOMAIN responses are also cached (for SOA minimumTtl seconds). A deleted record means "doesn't exist" is cached everywhere. Keep negative TTL low.
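
A minimal sketch of a TTL cache with expiry checked on read. It is not the project's TTLCache.java: the class name, the single-string value, and the injectable clock are assumptions chosen to keep the example short and testable.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

final class TtlCacheSketch {
    private record Entry(String value, long expiresAtMillis) {}

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injectable so tests can control time

    TtlCacheSketch(LongSupplier clockMillis) {
        this.clock = clockMillis;
    }

    /** Stores an answer that may be served for ttlSeconds from now. */
    void put(String key, String value, long ttlSeconds) {
        map.put(key, new Entry(value, clock.getAsLong() + ttlSeconds * 1000));
    }

    /** Returns the cached answer, or empty on a miss or an expired entry. */
    Optional<String> get(String key) {
        Entry e = map.get(key);
        if (e == null) return Optional.empty();
        if (clock.getAsLong() >= e.expiresAtMillis()) {
            map.remove(key, e); // lazy eviction: remove only if it is still this entry
            return Optional.empty();
        }
        return Optional.of(e.value());
    }
}
```

In production code the clock would be `System::currentTimeMillis`. Until the expiry passes, the cache keeps serving the old answer no matter what the authoritative server now says, which is exactly why the TTL must be lowered before a migration rather than after.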


3 · Recursive vs Iterative Resolution

Recursive — RecursiveResolver.java
Client sends ONE question. The resolver does ALL the work. This is the kind of query your OS stub resolver sends on your behalf when a program calls getaddrinfo().

Client → Resolver: "What is google.com's IP?"
  Resolver → Root:       "Who manages .com?" → "Verisign"
  Resolver → Verisign:   "Who manages google.com?" → "ns1.google.com"
  Resolver → ns1.google: "What is google.com A?" → "142.250.80.46"
Resolver → Client: "142.250.80.46" (cached for TTL)

Iterative — IterativeResolver.java
Client does all the walking. Each server gives only a referral.

Me → Root:       "Who manages .com?" → "Ask Verisign: 192.5.6.30"
Me → Verisign:   "Who manages google.com?" → "Ask ns1.google.com"
Me → ns1.google: "What is google.com A?" → "142.250.80.46"

The nuance interviewers test:
Your device sends a recursive query (RD=1) to 8.8.8.8.
Internally, 8.8.8.8 walks the hierarchy using iterative queries (RD=0).
So "recursive resolution" = recursive from client's view + iterative internally.
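
The referral walk that a resolver performs internally can be sketched as a loop over an in-memory "network". The server names, the `Reply` shape, and the hop limit are assumptions for this toy model; real referrals carry NS names and glue addresses rather than a single string.

```java
import java.util.List;
import java.util.Map;

final class ReferralWalkSketch {
    /** A server either answers (ip set) or refers the caller onward (next set). */
    record Reply(String ip, String next) {}

    /** Hypothetical network: server name -> (question name -> reply). */
    static final Map<String, Map<String, Reply>> SERVERS = Map.of(
            "root", Map.of("google.com", new Reply(null, "com-tld")),
            "com-tld", Map.of("google.com", new Reply(null, "ns1.google.com")),
            "ns1.google.com", Map.of("google.com", new Reply("142.250.80.46", null)));

    /** Follows referrals from the root; records each server contacted in 'hops'. */
    static String resolve(String name, List<String> hops) {
        String server = "root";
        for (int i = 0; i < 10; i++) { // hop limit guards against referral loops
            hops.add(server);
            Reply reply = SERVERS.getOrDefault(server, Map.of()).get(name);
            if (reply == null) return null;       // treated as NXDOMAIN in this toy model
            if (reply.ip() != null) return reply.ip(); // authoritative answer
            server = reply.next();                // follow the referral
        }
        return null;
    }
}
```

A recursive resolver is this loop plus a cache in front of it: the client sees one question and one answer, while the hops list shows the iterative queries made on its behalf.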

DoH / DoT (SDE-3+ depth):
Traditional DNS is plaintext on port 53 (UDP, falling back to TCP). Your ISP, or anyone else on the path, sees every domain you query.

  • DoT (RFC 7858): DNS wrapped in TLS, port 853.
  • DoH (RFC 8484): DNS as HTTPS requests, port 443 — indistinguishable from web traffic.
    Both hide queries from the network path and are often pointed at large public resolvers (Cloudflare, Google) instead of the ISP's, which is the root of the privacy vs centralisation debate.

4 · DNS Round Robin — RoundRobinDistributor.java

Multiple A records for one hostname. Resolver rotates which IP is listed first. Client connects to the first IP → different clients hit different servers.

app.mysite.com.   60   IN   A   10.0.0.1   ← Server 1
app.mysite.com.   60   IN   A   10.0.0.2   ← Server 2
app.mysite.com.   60   IN   A   10.0.0.3   ← Server 3

Query 1: [10.0.0.1, 10.0.0.2, 10.0.0.3] → connects to .1
Query 2: [10.0.0.2, 10.0.0.3, 10.0.0.1] → connects to .2
Query 3: [10.0.0.3, 10.0.0.1, 10.0.0.2] → connects to .3
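
The rotation shown above can be sketched with an AtomicInteger counter. This is an illustrative stand-in, not the project's RoundRobinDistributor.java; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSketch {
    private final List<String> ips;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> ips) {
        if (ips.isEmpty()) throw new IllegalArgumentException("need at least one IP");
        this.ips = List.copyOf(ips);
    }

    /** Returns all IPs, rotated so each call starts one position later. */
    List<String> next() {
        // floorMod keeps the start index non-negative even after the counter overflows.
        int start = Math.floorMod(counter.getAndIncrement(), ips.size());
        List<String> rotated = new ArrayList<>(ips.size());
        for (int i = 0; i < ips.size(); i++) {
            rotated.add(ips.get((start + i) % ips.size()));
        }
        return rotated;
    }
}
```

getAndIncrement is atomic, so concurrent queries each get a distinct starting index without locking.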

Why production systems DON'T use DNS RR alone:

  1. No health checks — dead servers still get traffic
  2. Uneven distribution — resolver caching breaks uniformity
  3. No session stickiness — reconnecting client may hit different server
  4. Slow failover — DNS update + TTL propagation = minutes of downtime

Where it IS used: Distributing across a fleet of load balancers (not individual servers). dig google.com → multiple IPs → those are Google's front-end LBs, not backend servers.


5 · Anycast — AnycastRouter.java + PointOfPresence.java

One IP, many physical locations. BGP routes each client to the nearest PoP.

Unicast:
  Client in Tokyo  →  Server in Virginia  (200ms)

Anycast (same IP, multiple PoPs):
  Client in Tokyo   →  Tokyo PoP    (8ms)
  Client in London  →  London PoP   (5ms)
  Client in NYC     →  NYC PoP      (7ms)
  All use 1.1.1.1 — the IP is identical!

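How it works: Each PoP's router announces the same IP prefix to the internet via BGP. All else being equal, BGP path selection prefers the shortest AS-path, so each client's traffic lands at the topologically nearest PoP, which is usually (not always) the geographically nearest one.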

Benefits:

  • Low latency: users reach nearby PoP
  • Automatic failover: PoP withdraws BGP announcement → traffic re-routes in 30–60s
  • DDoS absorption: attack traffic distributed across all PoPs instead of hitting one datacenter
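
Nearest-PoP selection and failover can be modelled in a few lines. Real routing is decided by BGP in the network, not by application code; this sketch uses measured latency as a stand-in for path selection, and the `Pop` record and `route` method are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class AnycastSketch {
    /** A PoP is eligible only while it is announcing the shared prefix. */
    record Pop(String name, int latencyMs, boolean announcing) {}

    /** Picks the lowest-latency PoP that is still announcing; empty if none are. */
    static Optional<Pop> route(List<Pop> pops) {
        return pops.stream()
                .filter(Pop::announcing)
                .min(Comparator.comparingInt(Pop::latencyMs));
    }
}
```

Failover falls out of the filter: once a PoP stops announcing, the next-best one wins without any change on the client.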

Real example: Cloudflare's 1.1.1.1 is announced from 300+ PoPs. Cloudflare's anycast network has mitigated a DDoS attack peaking at 3.8 Tbps.

BGP Hijacking (2008 YouTube incident): Pakistan Telecom accidentally announced a more-specific route for YouTube's IP prefix → global YouTube traffic routed to Pakistan and dropped. Incidents like this motivated RPKI (Resource Public Key Infrastructure): prefix holders publish signed Route Origin Authorizations (ROAs), and routers can reject announcements whose origin AS does not match.


6 · DNSSEC — ChainOfTrust.java, KeySigningKey.java, ZoneSigningKey.java

Why: DNS was designed with zero security. Any server that responds to a UDP query with the right transaction ID can poison a resolver's cache. The Kaminsky attack (2008) demonstrated this could redirect entire domains within seconds.

DNSSEC solution: Cryptographic signatures on every DNS record. Resolver verifies signature using a public key → proves the answer came from the legitimate zone owner.

Two-key system:

KSK (Key Signing Key)              ZSK (Zone Signing Key)
─────────────────────────          ──────────────────────────────
Signs: DNSKEY RRSet only           Signs: All other records (A, MX, etc.)
Rotated: Every 1-2 years           Rotated: Every 90 days
Key size: 2048-bit RSA             Key size: 2048-bit RSA
Parent involvement: YES (DS rec)   Parent involvement: NO

Why two keys? ZSK rotates frequently. If ZSK signed itself, you'd need the parent to update its DS record every 90 days — complex and error-prone. KSK signs the ZSK (via DNSKEY RRSet), so parent DS only needs updating when KSK changes (rare).

Chain of trust (validated bottom-up):

Root KSK  (hardcoded "trust anchor" in every DNSSEC resolver)
    │
    └── Signs root DNSKEY → verifies .com DS record
    
.com KSK
    │
    └── Signs .com DNSKEY → verifies myapp.com DS record
    
myapp.com KSK   ← KeySigningKey.java
    │
    └── Signs DNSKEY RRSet → authenticates ZSK
    
myapp.com ZSK   ← ZoneSigningKey.java
    │
    └── Signs all records → RRSIG on every A, MX, TXT record

Validation results:

  • SECURE — full chain verified, record is authentic
  • INSECURE — zone is not DNSSEC-signed (most zones today)
  • BOGUS — chain is broken → resolver returns SERVFAIL (NOT NXDOMAIN)

The operational risk: DNSSEC key rollover can make your domain unreachable. If the parent's DS record points to a stale KSK after rotation → BOGUS → SERVFAIL for all validating resolvers. This risk is a big part of why only a minority of domains are signed, despite the security benefit.

This project uses real RSA-2048 via java.security — not mock/fake crypto. The ChainOfTrust.validate() method performs an actual cryptographic signature + verification.
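
A sketch of the two lowest links of the chain using the standard java.security API: the KSK signs the ZSK's public key, and the ZSK signs a record. The class and method names are assumptions and this is not the project's ChainOfTrust.validate(). It also skips the parent DS hash and the canonical wire format that real DNSSEC signs.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class DnssecSketch {
    static KeyPair newRsaKey() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] sign(PrivateKey key, byte[] data) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(data);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(PublicKey key, byte[] data, byte[] sig) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(data);
            return s.verify(sig);
        } catch (GeneralSecurityException e) {
            return false; // a malformed signature counts as invalid
        }
    }

    /** True only if the KSK vouches for the ZSK and the ZSK vouches for the record. */
    static boolean validate(PublicKey ksk, PublicKey zsk, byte[] dnskeySig,
                            byte[] record, byte[] rrsig) {
        return verify(ksk, zsk.getEncoded(), dnskeySig) && verify(zsk, record, rrsig);
    }
}
```

If either signature fails, validate returns false, which corresponds to the BOGUS outcome above.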


7 · Split-Horizon DNS — SplitHorizonResolver.java + ClientContext.java

Same hostname → different answers depending on where the query comes from.

Query: api.myapp.com

Internal client (VPN / inside VPC):
  → Returns 10.0.1.50  (internal load balancer)
  → Traffic stays inside VPC — no NAT, lower latency, free egress

External client (public internet):
  → Returns 54.210.167.99  (public-facing ALB)
  → Traffic enters through public entry point
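
The two views can be sketched as two maps and a check on the client's address. The class name, the 10.0.0.0/8 "internal" test, and the fall-through to the public view are assumptions for illustration; a real provider may instead return NXDOMAIN when a private zone exists but lacks the record.

```java
import java.util.Map;

final class SplitHorizonSketch {
    static final Map<String, String> PRIVATE_VIEW = Map.of("api.myapp.com", "10.0.1.50");
    static final Map<String, String> PUBLIC_VIEW = Map.of("api.myapp.com", "54.210.167.99");

    /** Assumption for the sketch: anything in 10.0.0.0/8 is inside the VPC. */
    static boolean isInternal(String clientIp) {
        return clientIp.startsWith("10.");
    }

    /** Same name, different answer depending on where the query comes from. */
    static String resolve(String name, String clientIp) {
        if (isInternal(clientIp)) {
            String ip = PRIVATE_VIEW.get(name);
            if (ip != null) return ip; // private view wins for internal clients
        }
        return PUBLIC_VIEW.get(name);  // null stands for NXDOMAIN in this toy model
    }
}
```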

AWS Route 53 implementation:

  • Create a Private Hosted Zone for myapp.com, associated with your VPC
  • Add internal records: api.myapp.com → 10.0.1.50
  • Route 53 serves the private zone to queries from within the associated VPC
  • Public Zone api.myapp.com → 54.210.167.99 is served to everyone else

Benefits:

  • Eliminates NAT Gateway costs (internal→internal: free)
  • Better security (internal services unreachable from internet)
  • Consistent naming (same hostname everywhere, correct IP always)
  • Lower latency (VPC-internal path vs public internet round-trip)

Kubernetes CoreDNS:
CoreDNS implements split-horizon for all pods:

  • *.svc.cluster.local → resolved from Kubernetes service registry (internal ClusterIP)
  • Everything else → forwarded to upstream resolver (external DNS)

Interview Cheat Sheet

Easy (expect these in any interview)

| Question | Answer |
| --- | --- |
| What is DNS? | Translates hostnames to IPs. Hierarchical: root → TLD → authoritative |
| What is a TTL? | How long resolvers may cache a record. High TTL = fewer lookups and less load, but slow propagation. Low TTL = more lookups, but changes take effect quickly |
| CNAME vs A record? | A = hostname → IP. CNAME = alias name → another name (then resolver looks up that name) |
| Why can't CNAME be at apex? | RFC 1034: apex must have SOA+NS records; CNAME can't coexist with other record types |
| What is NXDOMAIN? | Response when a name doesn't exist in DNS. Also cached (negative TTL). |
| What is recursive resolution? | Client asks resolver, resolver does all the work, returns final answer |
| What is iterative resolution? | Each server returns only the next referral; client walks the chain itself |

Medium (system design rounds)

| Question | Key Points |
| --- | --- |
| How does DNS failover work? | Update A record → wait for TTL. Low TTL (60s) needed beforehand for fast failover |
| DNS round robin limitations? | No health checks, no stickiness, slow failover, uneven distribution |
| How does anycast work? | Same IP, multiple PoPs. BGP routes to nearest. Failover = BGP withdrawal |
| What is split-horizon? | Different answers based on client network. Internal=private IP, external=public IP |
| How to point apex domain to ALB? | Can't use A (ALB has no static IP). Can't use CNAME (apex restriction). Use Route 53 ALIAS record |

Hard (SDE-3+ rounds)

| Question | Key Points |
| --- | --- |
| DNS failover limitations? | Old answers stay cached until their TTL expires, even after the update. Some resolvers ignore low TTLs and cache for longer. Negative caching. Non-RFC-compliant resolvers |
| How does DNSSEC work? | RRSIG on every record, verified via ZSK. ZSK authenticated by KSK. KSK hash in parent DS record. Root trust anchor hardcoded |
| KSK vs ZSK why two keys? | ZSK rotates frequently (90d) with no parent coordination needed. KSK changes rarely and coordinates with the parent via the DS record |
| Production DNSSEC risk? | Failed key rollover → BOGUS → SERVFAIL for all validating resolvers. Domain unreachable until fixed |
| Kaminsky attack? | Flood resolver with forged UDP responses for a query. With right transaction ID, poison cache. DNSSEC prevents this |
| How does CoreDNS handle Kubernetes service discovery? | Authoritative for *.cluster.local (from k8s API). Forwards everything else to VPC DNS. Enables split-horizon without config on pods |

Low-Level Design (LLD) — What to Say When Asked

If asked "design the DNS system" in an LLD interview, walk through these components:

  1. Zone store — key-value store of hostname:type → list of records. Append-only with versioning (serial number in SOA).

  2. Recursive resolver — cache-first lookup, then iterative walk (root → TLD → auth). Uses a TTL-based eviction cache (similar to TTLCache.java).

  3. Cache: ConcurrentHashMap<String, List<CacheEntry>>. Key = hostname:type. Entries auto-expire on read (lazy eviction). Background thread for periodic sweep.

  4. High availability — multiple authoritative NS servers (ns1, ns2) with zone transfers (AXFR). Recursive resolvers are stateless + horizontally scalable.

  5. Split-horizon — two DNS views: private (VPC clients) + public. Implemented via resolver policy: check private zone first for internal IPs.

  6. Round-robin: AtomicInteger counter in the zone server. Each query gets the list starting from index counter % size. Thread-safe rotation.

  7. Security — DNSSEC for data integrity. DoH/DoT for transport privacy. RPKI for BGP route origin validation (prevents anycast hijacking).


File Structure

DNS system/
├── compile.bat              ← Windows: compiles all Java source files
├── run.bat                  ← Windows: runs dns.Main
├── README.md                ← You are here
└── src/
    └── dns/
        ├── Main.java                          ← Demo runner (all 8 sections)
        ├── records/
        │   ├── RecordType.java                ← Enum: A, AAAA, CNAME, ALIAS, MX, TXT, NS, SOA, PTR, SRV
        │   ├── DNSRecord.java                 ← Abstract base (name, type, ttl, createdAt)
        │   ├── ARecord.java                   ← IPv4 mapping + validation
        │   ├── AAAARecord.java                ← IPv6 mapping
        │   ├── CNAMERecord.java               ← Alias + apex restriction enforcement
        │   ├── ALIASRecord.java               ← Apex workaround, server-side resolution
        │   ├── MXRecord.java                  ← Mail routing with priority
        │   ├── TXTRecord.java                 ← SPF / DKIM / DMARC / verification
        │   ├── NSRecord.java                  ← Zone delegation
        │   ├── SOARecord.java                 ← Zone metadata + negative TTL
        │   ├── PTRRecord.java                 ← Reverse DNS + reverseIP() utility
        │   └── SRVRecord.java                 ← Service discovery with port+weight
        ├── hierarchy/
        │   ├── Zone.java                      ← Container for all records of one domain
        │   ├── AuthoritativeNameServer.java    ← Returns AA=true answers from zone data
        │   ├── TLDNameServer.java             ← Stores domain→NS delegations per TLD
        │   └── RootNameServer.java            ← Stores TLD delegations (root zone file)
        ├── resolver/
        │   ├── RCode.java                     ← NOERROR, NXDOMAIN, SERVFAIL, REFUSED
        │   ├── DNSQuery.java                  ← Encapsulates client question (name, type, RD bit)
        │   ├── DNSResponse.java               ← Answer, authority, additional sections + AA bit
        │   ├── RecursiveResolver.java         ← Full-service resolver with TTL cache
        │   └── IterativeResolver.java         ← Explicit hop-by-hop referral chain
        ├── cache/
        │   ├── CacheEntry.java                ← One cached record + expiry + remainingTtl()
        │   └── TTLCache.java                  ← ConcurrentHashMap cache, hit/miss stats
        ├── roundrobin/
        │   └── RoundRobinDistributor.java     ← AtomicInteger rotation across A records
        ├── anycast/
        │   ├── PointOfPresence.java           ← BGP announcement + offline simulation
        │   └── AnycastRouter.java             ← Routes to nearest online PoP by latency
        ├── dnssec/
        │   ├── ZoneSigningKey.java            ← Real RSA-2048, signs all zone records
        │   ├── KeySigningKey.java             ← Real RSA-2048, signs DNSKEY RRSet + DS record
        │   └── ChainOfTrust.java              ← Full KSK→ZSK→RRSIG validation chain
        └── splithorizon/
            ├── ClientContext.java             ← Internal vs external client detection
            └── SplitHorizonResolver.java      ← Private zone (VPC) vs public zone views

Expected Output (what run.bat prints)

╔══════════════════════════════════════════════════════════════════╗
║  SECTION 0: DNS HIERARCHY — Root → TLD → Authoritative          ║
╚══════════════════════════════════════════════════════════════════╝
[AuthNS:ns1.google.com] Loaded zone: google.com
[AuthNS:ns-1.awsdns-00.com] Loaded zone: myapp.com
...

╔══════════════════════════════════════════════════════════════════╗
║  SECTION 1: RECURSIVE RESOLUTION (mimics Google 8.8.8.8)        ║
╚══════════════════════════════════════════════════════════════════╝
┌─ [RECURSIVE RESOLVER] QUERY A   google.com  (RD=true, from=127.0.0.1)
│  ✗ Cache MISS — beginning iterative walk up the hierarchy
│  [1/3] → Root Server: 'Who manages .com?'
│       ✓ Root: '.com is managed by Verisign'
│  [2/3] → TLD (Verisign): 'Who manages google.com?'
│       ✓ TLD: 'Authoritative NS for google.com → [ns1.google.com]'
│  [3/3] → Auth NS (ns1.google.com): 'google.com A?'
│       ✓ Auth: Got 2 record(s) → cached for 300s
└─────────────────────────────────────────────────────
  ;; NOERROR(0) [from ns1.google.com, AA]
  ;; ANSWER SECTION:
  google.com                          300s  IN  A       142.250.80.46
  google.com                          300s  IN  A       142.250.80.68

┌─ [RECURSIVE RESOLVER] QUERY A   google.com  (RD=true, from=127.0.0.1)
│  ✓ Cache HIT — serving from cache (no network needed)
└─────────────────────────────────────────────────────
...

Built for learning. Every line of code is a concept, not just implementation.