[Week 10] Weekly/Monthly Ranking System with Batch - 최호석 by ghtjr410 · Pull Request #418 · Loopers-dev-lab/loop-pack-be-l2-vol3-java

ghtjr410 · 2026-04-17T08:07:51Z

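This assignment focused on taking apart the question "what is a batch?" and arriving at a definition.
Rather than starting at the implementation level (how to assemble Reader/Processor/Writer), I first worked out which boundaries a batch needs in order to run correctly and how those boundaries interlock.
With this mental model, any future batch can be judged by asking "Are all three boundary layers present? Does this decision break any of the five properties?"
Library usage is quick to look up, but unless you build the perspective on why it should be used this way once, you start from scratch every time.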

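What: Switch from a @Scheduled calendar-based ranking to a Spring Batch rolling ranking (7/30 days ending yesterday)
Why: Resolve calendar-boundary bias, the lack of restart/idempotency, and heavy aggregation sharing the streamer process
How: Separate commerce-batch process, 7-Step Job, atomic MV swap + previous-day fallback
Measured: The Job completed normally at four scales S/M/L/XL (150K to 1M rows); 1M rows took 9.8 seconds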

graph TB
    subgraph "commerce-api (publish + query)"
        사용자행동["User actions<br/>view/like/order"]
        아웃박스[("outbox_events")]
        카프카발행["Kafka publish"]

        사용자행동 --> 아웃박스
        아웃박스 -->|"async after commit"| 카프카발행

        랭킹API["GET /api/v1/rankings<br/>period=LAST_7D / LAST_30D"]
        로컬캐시["Caffeine 30s"]
        랭킹API --> 로컬캐시
        로컬캐시 -->|"miss"| 레디스ZSET
        로컬캐시 -->|"Redis miss"| MV테이블
    end

    subgraph "Kafka"
        뷰토픽["product-view-events"]
        인터랙션토픽["product-interaction-events"]
        주문토픽["order-events"]
    end

    카프카발행 --> 뷰토픽
    카프카발행 --> 인터랙션토픽
    카프카발행 --> 주문토픽

    subgraph "commerce-streamer (event collection)"
        뷰컨슈머["View consumer"]
        인터랙션컨슈머["Interaction consumer"]
        주문컨슈머["Order consumer"]
        원천DB[("MySQL source<br/>view/like/order<br/>bucket tables")]

        뷰컨슈머 -->|"buffer → flush → reconcile"| 원천DB
        인터랙션컨슈머 -->|"immediately, per event"| 원천DB
        주문컨슈머 -->|"immediately, per event"| 원천DB
    end

    뷰토픽 --> 뷰컨슈머
    인터랙션토픽 --> 인터랙션컨슈머
    주문토픽 --> 주문컨슈머

    subgraph "commerce-batch (new in Round 10)"
        배치Job["rollingRankingJob<br/>anchorDate=yyyyMMdd<br/>daily at 01:00 KST"]
        스테이징[("staging tables<br/>stage 1 sums + stage 2 scores")]
        가중치설정[("ranking_weight_config<br/>ExecutionContext snapshot")]
        MV테이블[("MV tables<br/>mv_product_rank_last_7d<br/>mv_product_rank_last_30d")]
        이력로그[("batch_audit_log<br/>run history")]

        가중치설정 --> 배치Job
        배치Job -->|"Step 1 ~ 3: cursor streaming aggregation"| 원천DB
        배치Job -->|"Step 1 ~ 3 → load into staging"| 스테이징
        스테이징 -->|"Step 4: score calculation"| 스테이징
        스테이징 -->|"Step 5: DELETE+INSERT atomic swap<br/>+ run history record"| MV테이블
        MV테이블 -.->|"same TX"| 이력로그
        MV테이블 -->|"Step 6: shadow RENAME"| 레디스ZSET[("Redis ZSET<br/>ranking:last7d/last30d")]
    end

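1. Why

The existing WeeklyRankingRefresher / MonthlyRankingRefresher use @Scheduled to re-aggregate the calendar week (Mon ~ Sun) and the calendar month (1st ~ last day) every 5 / 30 minutes. This structure had three problems.

Calendar-boundary bias - On Monday mornings and on the morning of the 1st of each month the sample is thin, so the ranking fluctuates.
No restart or idempotency - If the job dies midway it must be rerun from the beginning, and there is no guarantee that the same input yields the same result.
Mixed processes - Heavy aggregation runs in the same process as the streamer (consumer), which carries a risk of consumer lag.

Fundamentally, none of the three boundary layers (selection/processing/completion) is explicit in this structure. The trigger and the processing live in the same class, the same process, and the same thread, and nothing records "how far processing got". If all you have is a periodic @Scheduled call, that is a scheduler that has grown complicated, not a batch.

I chose the approach commonly used for e-commerce rankings in the industry, **"N-day rolling window ending yesterday + build once, serve all day"**, and decided to split out a commerce-batch process and implement restartability, idempotency, and observability on Spring Batch.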

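2. Design Philosophy

Every detailed decision in this PR starts from the view that "a batch sets a boundary, gathers what falls inside it, and then processes it in bulk".

The three boundary layers

Batch design ultimately comes down to setting boundaries at three layers. If any of the three is not explicitly defined, the batch becomes fragile.

Layer	Question	Implementation in this design
Selection boundary	What do we gather?	Rolling 7/30 days based on `anchorDate` + exclusive `bucket_time` range
Processing boundary	How much do we bite off at once?	Chunk 500 + JDBC batch + single thread
Completion boundary	How far have we processed?	JobRepository metadata + staging UPSERT + atomic MV swap

The three boundaries hold each other up - if the selection boundary is blurry, the completion boundary is blurry too (nobody knows where "done" is), and if the processing boundary collapses, the completion boundary loses its meaning (when one big lump fails, everything has to go back to the start).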

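The five properties - a framework for judging whether a batch is used correctly

Adopting a batch framework is not the goal in itself; we need to be able to verify that the job behaves like a batch. To that end I defined five properties and used them as the criterion for checking that no design decision violates them.

Property	Meaning	How this design verifies it
Unattended	Even if it fails at 3 a.m., the on-call engineer can recover using only the logs	Structured logs per Step + loaded row counts recorded in audit_log (same TX as the MV INSERT). On failure, rerunning with the same `anchorDate` completes the recovery
Bounded	The input range is fixed regardless of when the job runs	All external input is frozen via the `anchorDate` JobParameter + weight_config snapshot in the ExecutionContext
Bulk	Per-item I/O is zero (all work is done per chunk/batch)	Chunk 500 + JDBC batchUpdate + cursor streaming. Chunk 500 was tuned by measurement as the balance point between DB pool hold time and commit frequency
SLA-bound	There is an explicit time limit and evidence that the job finishes within it	Starts at 01:00 → deadline 04:00. App streaming aggregation avoids the risk of DB execution-plan changes. A single thread is sufficient at the current scale
Resilient	Running twice with the same parameters yields the same result. Dying midway and coming back still yields the same final result	Step 0 DELETE + Step 5 pre-DELETE + staging UPSERT → business idempotency. Double safety net of JobRepository restart + full rerun

These properties are not independent; each presupposes another - if Bounded breaks, Resilient breaks; if Bulk breaks, SLA-bound breaks; and if Resilient breaks, Unattended is meaningless. Every later decision (App streaming, lookahead serialization, atomic MV swap, and so on) exists to protect at least one of these five properties.

The three boundary layers are the means by which the five properties are realized:

Boundary	Property it produces
Selection boundary (anchorDate + frozen window)	→ Bounded
Processing boundary (Chunk 500 + single thread)	→ Bulk + SLA-bound
Completion boundary (JobRepository + atomic MV swap)	→ Resilient + Unattended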

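Trigger time ≠ data boundary

In a batch, "time" appears in two different roles.

Role of time	Meaning	Example in this design
Time as trigger	When does the run start?	Daily at 01:00 KST (Cron)
Time as data boundary	Which range of data is handled?	`anchorDate` = yesterday / `bucket_time` range

The two must be able to vary independently - running the same anchorDate at 3 a.m. or manually in the afternoon must produce the same result. The moment this independence is broken (for example, the Reader calls LocalDate.now()), the batch becomes a non-deterministic computation tied to the trigger time, and restart, backfill, and reproducibility all fall apart. This is why the design takes anchorDate as an external JobParameter and forbids the use of NOW().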
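As a minimal sketch of what "the data boundary depends only on anchorDate" can look like, the following plain-JDK example derives both windows from the parameter alone. The class and method names (`RollingWindowSketch`, `parseAnchor`, `windowFor`) are hypothetical and are not the PR's `RollingWindowResolver`. It also assumes the window covers N whole days ending with the anchor day, expressed as a half-open range; the actual boundary convention in the PR may differ.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

final class RollingWindowSketch {

    /** Half-open range [start, endExclusive) over bucket_time (assumed convention). */
    record Window(LocalDateTime start, LocalDateTime endExclusive) {}

    // "uuuu" rather than "yyyy": with ResolverStyle.STRICT, "yyyy" would also require an era field.
    private static final DateTimeFormatter ANCHOR_FORMAT =
            DateTimeFormatter.ofPattern("uuuuMMdd").withResolverStyle(ResolverStyle.STRICT);

    /** Parses yyyyMMdd strictly; impossible dates such as 20260230 are rejected. */
    static LocalDate parseAnchor(String anchorDate) {
        return LocalDate.parse(anchorDate, ANCHOR_FORMAT);
    }

    /** Window of `days` whole days ending with the anchor day. No call to now() anywhere. */
    static Window windowFor(LocalDate anchor, int days) {
        LocalDateTime endExclusive = anchor.plusDays(1).atStartOfDay();
        return new Window(endExclusive.minusDays(days), endExclusive);
    }

    public static void main(String[] args) {
        LocalDate anchor = parseAnchor("20260416");
        System.out.println("LAST_7D  = " + windowFor(anchor, 7));
        System.out.println("LAST_30D = " + windowFor(anchor, 30));
    }
}
```

Because the only input is the anchor string, calling this at 01:00 or at noon returns the same windows, which is the independence the paragraph above asks for.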

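Batch idempotency is a stronger condition than API idempotency

	Meaning of idempotency
API	Sending the same request N times leaves the same final state
Batch	Dying midway and coming back still yields the same final result

Satisfying this stronger condition requires that the completion boundary be recorded as persistent state (JobRepository, staging UPSERT), that every intermediate artifact be regenerable (Step 0 DELETE, Step 5 pre-DELETE), and that nothing depend on external time or external state (input fixed by anchorDate, NOW() forbidden). The double safety net in this design (JobRepository restart + business idempotency) defends this strong condition in two layers.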

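Immutable Input - the precondition for a valid restart

In a typical batch, Spring Batch restart is a feature that might as well not exist. For restart to be valid, the same input, the same ordering, and the same filter conditions must all hold, but once the source data is UPDATEd/INSERTed/DELETEd, the meaning of a cursor position changes. The input in this design is immutable - since anchorDate = yesterday, every bucket is already in the past, and under the overwrite-UPSERT rule of the reconciler (the component that persists Redis Hash → MySQL every 5 minutes) past buckets do not change. If this premise breaks, the first restart safety net (resuming from the failed chunk via JobRepository) becomes meaningless, and operations must rely solely on the second safety net (Step 0 DELETE + Step 5 pre-DELETE, so a full rerun still yields the same result).

Atomic swap - the scenario where a batch is unavoidable

When the score formula changes from clicks × 1.5 to clicks × 2.4, 999 of the 1,000 existing ranking entries are still computed with the old formula. The ranks only mean something if all entries move to the new formula at the same moment. With streaming, there is no way to know the cut-over point for "apply the new formula from here on", and ordering is not guaranteed under parallel partition processing. A batch freezes the input at a boundary (anchorDate), computes everything at once with the new formula, and lets only defined states through (the previous MV, then the confirmed TOP 100). Step 5 of this design (DELETE+INSERT atomic swap) is the direct implementation of this atomic swap.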

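3. The Whole Pipeline

Data tiers

Tier	Store	Role	Characteristics
Tier 0	Kafka events	Source (append-only)	Collected by the streamer
Tier 1	bucket tables (view/like/order)	Time-series aggregates in 5-minute buckets	Starting point of the batch input
Tier 2	staging tables (stage 1 sums + stage 2 scores)	Intermediate artifacts inside the batch	Reset on every run in Step 0
Tier 3	MV tables (last_7d / last_30d)	Confirmed TOP 100, persisted on disk	Atomically swapped in Step 5
Tier 4	Redis ZSET	Identity cache of Tier 3	Volatile, TTL 3 days

Golden rule: never derive from a derivative - every Tier is built only from the source (Tier 0 ~ 1) or from the Tier directly before it.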

Step structure

rollingRankingJob (anchorDate=yyyyMMdd)

 ─── Selection boundary ─────────────────────────────────
  Step 0: truncateStagingStep       Tasklet   DELETE staging rows for this anchor (ensures idempotency)

 ─── Processing boundary ────────────────────────────────
  Step 1: stageViewMetricsStep      Chunk     cursor + app streaming aggregator
  Step 2: stageLikeMetricsStep      Chunk     cursor → stage 1 staging
  Step 3: stageOrderMetricsStep     Chunk     cursor → stage 1 staging
  Step 4: scoreAggregationStep      Chunk     score for all products, weight_group fan-out → stage 2 staging

 ─── Completion boundary ────────────────────────────────
  Step 5: promoteTopToMvStep        Tasklet   DELETE + INSERT...SELECT TOP 100 + run history record (single TX)
  Step 6: redisRefreshStep          Tasklet   MV → Redis ZSET identity cache (shadow RENAME)

Batch sequence ① Selection boundary - freeze input + reset

sequenceDiagram
    autonumber
    participant Job as Batch Job start
    participant 가중치DB as Weight config DB
    participant 컨텍스트 as Execution context (snapshot store)
    participant 스테이징 as Staging tables

    Job->>Job: Parse anchorDate → compute rolling 7-day/30-day windows
    Job->>가중치DB: Query all active weight groups
    가중치DB-->>Job: [control, experiment, ...]
    Job->>컨텍스트: Store anchorDate + window ranges + all weights as JSON
    Note over Job,컨텍스트: From this point on, no Step re-queries the DB,<br/>each reads only the snapshot (external input fully frozen)

    Job->>스테이징: DELETE staging data for this anchor
    Note over 스테이징: A rerun also starts from a clean state (ensures idempotency)

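Batch sequence ② Processing boundary - metric aggregation + score calculation

sequenceDiagram
    autonumber
    participant 원천 as Source bucket tables<br/>(view/like/order)
    participant 집계기 as App streaming aggregator
    participant 스테이징1 as Stage 1 staging<br/>(per-product sums)
    participant 점수계산 as Score calculator
    participant 스테이징2 as Stage 2 staging<br/>(per-product scores)

    Note over 원천,스테이징1: Step 1 ~ 3: aggregate views/likes/orders separately
    원천->>집계기: cursor scan (ordered by product ID, time)
    Note over 집계기: Detects the boundary where the product changes,<br/>sums all buckets of one product, then emits<br/>(memory: only one product held at a time)
    집계기->>집계기: Split into 7-day/30-day ranges and sum each
    집계기->>스테이징1: Load per-product (view sum, like sum, order sum)
    Note over 집계기,스테이징1: At each 500-item chunk commit,<br/>the pre-read row of the next product is saved as a snapshot<br/>(processing can resume on restart)

    Note over 스테이징1,스테이징2: Step 4: weighted-sum score calculation
    스테이징1->>점수계산: Read per-product sums
    loop Each weight group (A/B test)
        점수계산->>점수계산: Compute weighted-sum score
        점수계산->>스테이징2: Load per-group product scores
    end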

Batch sequence ③ Completion boundary - MV swap + history record + cache propagation

sequenceDiagram
    autonumber
    participant 스테이징 as Stage 2 staging<br/>(per-product scores)
    participant MV as Ranking MV tables
    participant 이력 as Run history (audit_log)
    participant 레디스 as Redis ZSET

    Note over 스테이징,이력: Step 5: atomic MV swap + history record (single transaction)
    MV->>MV: Delete the existing MV rows for this date
    스테이징->>MV: Rank the top 100 by score and INSERT
    MV->>이력: Record loaded row count (same TX)
    Note over MV: DELETE + INSERT + history record form one transaction<br/>→ outside observers see only "previous data" or "new TOP 100"<br/>→ an empty MV is never exposed

    Note over MV,레디스: Step 6: Redis cache propagation
    MV->>레디스: ZADD TOP 100 to a temporary key
    레디스->>레디스: RENAME temporary key → live key (atomic swap)
    레디스->>레디스: Set TTL to 3 days

Query path - LAST_7D/LAST_30D (changed in Round 10)

The calendar-based WEEKLY/MONTHLY periods from week9 were replaced with rolling LAST_7D/LAST_30D. The fallback also changed from recomputing a bucket SUM to a direct MV lookup + previous-day anchor retry.

Normal case - Redis hit

sequenceDiagram
    autonumber
    participant 사용자 as User
    participant API as Ranking API
    participant 캐시 as Local cache (30s)
    participant 레디스 as Redis ZSET
    participant 상품DB as Product DB

    사용자->>API: GET /api/v1/rankings?period=LAST_7D&size=20
    API->>API: Resolve A/B group from userId → "control"
    API->>캐시: Cache lookup (period+date+page+group)
    alt Cache hit
        캐시-->>API: Return the finished response (0 Redis calls, 0 DB calls)
    else Cache miss
        API->>레디스: ZREVRANGE ranking:last7d:{yesterday}:{group} WITHSCORES
        레디스-->>API: Product IDs + scores
        API->>상품DB: Batch lookup of product info
        API->>API: Filter deleted products + apply size limit
        API->>캐시: Store the finished response
    end
    API-->>사용자: 200 OK

Fallback case - Redis miss → MV lookup → previous-day retry

sequenceDiagram
    autonumber
    participant 사용자 as User
    participant API as Ranking API
    participant 레디스 as Redis ZSET
    participant MV as Ranking MV tables
    participant 상품DB as Product DB

    사용자->>API: GET /api/v1/rankings?period=LAST_7D&size=20
    API->>레디스: ZREVRANGE ... WITHSCORES
    레디스-->>API: Empty result (miss or outage)

    Note over API,MV: MV fallback - read the confirmed TOP 100 as-is (no recomputation)
    API->>MV: SELECT WHERE anchor_date = 4/16 AND weight_group = 'control'
    MV-->>API: Empty result (batch has not run)

    Note over API,MV: Previous-day anchor retry - show at least yesterday's ranking (A/B group preserved)
    API->>MV: SELECT WHERE anchor_date = 4/15 AND weight_group = 'control'
    MV-->>API: Return TOP 100
    Note over API: control users are served from the previous day's control MV<br/>experiment users from the previous day's experiment MV<br/>A/B group consistency is preserved

    API->>상품DB: Batch lookup of product info
    API-->>사용자: 200 OK (previous day's ranking)

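Changes compared to week9:

Item	week9	week10
Period type	`WEEKLY` / `MONTHLY` (calendar)	`LAST_7D` / `LAST_30D` (rolling, today excluded)
Data source	Redis ZSET (built by scheduler)	Redis ZSET (built by batch) → MV fallback
fallback	Recompute SUM over bucket tables	Direct MV lookup (already-confirmed TOP 100, no recomputation)
When fallback fails	Empty result	Automatic previous-day anchor retry (up to 3 days, A/B group preserved)
ZSET builder	streamer scheduler (every 5 minutes)	commerce-batch (once a day)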

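4. Key Design Decisions

4-1. App streaming aggregation - avoiding DB GROUP BY

Decision: The DB performs only an ORDER BY product_id, bucket_time range scan, and aggregation is done in the App by StreamingMetricAggregator, which detects product boundaries and runs in O(1) memory.

Alternative: SELECT product_id, SUM(view_count) GROUP BY product_id - a single line of code, and intuitive.

Why App aggregation:

DB GROUP BY has good average performance but a dangerous tail. When the input spikes (a hot product owning thousands of buckets), it can fall back to tmp table → filesort and execution time blows up non-linearly.
App streaming offers predictability: N times the input means about N times the time. In a batch, worst-case predictability matters more than average performance - confidence that the job finishes within the SLA is what operational stability means.
Small-scale measurements (up to 1M rows) showed no non-linear blow-up. Verification at large scale (30M rows) is follow-up work.

StreamingMetricAggregator - product boundary detection + O(1)-memory aggregation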

public AggregatedMetric next() throws Exception {
    if (exhausted) return null;

    RawMetricRow current = (lookahead != null) ? lookahead : source.readOne();
    lookahead = null;
    if (current == null) { exhausted = true; return null; }

    Long productId = current.productId();
    long sum7d = 0L, sum30d = 0L;

    while (current != null && productId.equals(current.productId())) {
        sum30d += current.count();
        if (!current.bucketTime().isBefore(last7dStart)) {
            sum7d += current.count();
        }
        current = source.readOne();
    }

    // Keep the first row of the next product in lookahead
    lookahead = current;
    if (current == null) exhausted = true;
    return new AggregatedMetric(productId, sum7d, sum30d);
}

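4-2. Cursor + lookahead ItemStream serialization

Decision: Serialize the streaming aggregator's lookahead (the first row of the next product) into the ExecutionContext as three primitives (productId, bucketTime, count). It is saved in update() at every chunk commit and restored in open() on restart.

Alternative 1: saveState(false) - on restart, start over from the beginning. The code is simple, but this deletes the restart capability rather than solving the problem.

Alternative 2: OFFSET-based paging - restart-friendly, but at risk of O(N²) degradation.

Why this approach:

The streaming aggregator has to pre-read the first row of the next product from the cursor (lookahead) to detect a product boundary. If this lookahead is not serialized at chunk commit time, that row vanishes after a restart.
The open()/update() contract of ItemStream exists for exactly this problem. Decomposing the lookahead into serializable primitives follows the standard Spring Batch pattern.
Result: both chunk-mid restart and streaming aggregation correctness are secured.

ViewMetricStreamingReader - lookahead serialization (open/update)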

@Override
public void open(ExecutionContext executionContext) throws ItemStreamException {
    delegate.open(executionContext);
    this.aggregator = new StreamingMetricAggregator(delegate::read, last7dStart);

    // On restart, restore the lookahead saved at the end of the previous chunk
    if (executionContext.containsKey(CTX_LOOKAHEAD_PRODUCT_ID)) {
        RawMetricRow restored = new RawMetricRow(
                executionContext.getLong(CTX_LOOKAHEAD_PRODUCT_ID),
                LocalDateTime.parse(executionContext.getString(CTX_LOOKAHEAD_BUCKET_TIME)),
                executionContext.getLong(CTX_LOOKAHEAD_COUNT)
        );
        aggregator.setLookahead(restored);
    }
}

@Override
public void update(ExecutionContext executionContext) throws ItemStreamException {
    delegate.update(executionContext);

    // At chunk commit, serialize the lookahead as three primitives
    RawMetricRow lookahead = aggregator.getLookahead();
    if (lookahead != null) {
        executionContext.putLong(CTX_LOOKAHEAD_PRODUCT_ID, lookahead.productId());
        executionContext.putString(CTX_LOOKAHEAD_BUCKET_TIME, lookahead.bucketTime().toString());
        executionContext.putLong(CTX_LOOKAHEAD_COUNT, lookahead.count());
    } else {
        executionContext.remove(CTX_LOOKAHEAD_PRODUCT_ID);
        executionContext.remove(CTX_LOOKAHEAD_BUCKET_TIME);
        executionContext.remove(CTX_LOOKAHEAD_COUNT);
    }
}
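To make the "row vanishes after a restart" failure concrete, here is a self-contained simulation in plain Java with no Spring Batch dependency. It is only a sketch: `LookaheadRestartSketch`, `Checkpoint`, and `runWithRestart` are hypothetical names, the in-memory list stands in for the JDBC cursor, and the bucket-time and 7d/30d split from the excerpts above are omitted. The assumption it models is that the cursor position saved at commit already counts the lookahead row as read.

```java
import java.util.ArrayList;
import java.util.List;

final class LookaheadRestartSketch {

    record Row(long productId, long count) {}
    record Sum(long productId, long total) {}
    /** What a chunk commit persists: rows consumed from the cursor + the pending lookahead row. */
    record Checkpoint(int rowsRead, Row lookahead) {}

    /** Product-boundary aggregator over rows sorted by productId; holds one product at a time. */
    static final class Aggregator {
        private final List<Row> source;
        private int rowsRead;   // cursor position; the lookahead row is already counted here
        private Row lookahead;

        Aggregator(List<Row> source, Checkpoint restoreFrom) {
            this.source = source;
            this.rowsRead = restoreFrom.rowsRead();
            this.lookahead = restoreFrom.lookahead();
        }

        private Row readOne() {
            return rowsRead < source.size() ? source.get(rowsRead++) : null;
        }

        Sum next() {
            Row current = (lookahead != null) ? lookahead : readOne();
            lookahead = null;
            if (current == null) return null;
            long productId = current.productId();
            long total = 0;
            while (current != null && current.productId() == productId) {
                total += current.count();
                current = readOne();
            }
            lookahead = current; // first row of the next product, or null at end of input
            return new Sum(productId, total);
        }

        Checkpoint checkpoint() {
            return new Checkpoint(rowsRead, lookahead);
        }
    }

    /** Emits one product, simulates a crash, then resumes from the saved checkpoint. */
    static List<Sum> runWithRestart(List<Row> rows, boolean restoreLookahead) {
        List<Sum> out = new ArrayList<>();
        Aggregator first = new Aggregator(rows, new Checkpoint(0, null));
        out.add(first.next());
        Checkpoint saved = first.checkpoint();
        if (!restoreLookahead) saved = new Checkpoint(saved.rowsRead(), null); // cursor only
        Aggregator resumed = new Aggregator(rows, saved);
        for (Sum s = resumed.next(); s != null; s = resumed.next()) out.add(s);
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 5), new Row(1, 7), new Row(2, 3), new Row(2, 4), new Row(3, 9));
        System.out.println("lookahead restored: " + runWithRestart(rows, true));
        System.out.println("cursor only:        " + runWithRestart(rows, false));
    }
}
```

With the lookahead restored, product 2 sums both of its rows after the restart; with only the cursor position restored, its first row has already been consumed and is silently dropped from the sum.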

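4-3. Atomic MV swap with DELETE+INSERT in a single TX

Decision: In Step 5 (PromoteTopToMvTasklet), run DELETE FROM mv WHERE anchor_date = ? → INSERT INTO mv ... SELECT TOP 100 inside a single @Transactional TX.

Alternative: Split into separate Steps, purge (DELETE) → promote (INSERT). Each Step has a clear responsibility.

Why a single TX:

With separate Steps, the window from just after purge until just before promote completes exposes an empty MV to the API. Preventing that requires compensating logic in the API such as "if the MV is empty, serve from cache", which piles patch upon patch.
With a single TX, MVCC ensures other sessions see the previous MV until commit and only the new MV after commit. The states the MV passes through are limited to exactly two, "previous data" or "new TOP 100" - intermediate states are invisible.

PromoteTopToMvTasklet - DELETE+INSERT atomic swap + run history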

@Override
@Transactional
public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
    // 1. DELETE - remove the previous MV (inside the same TX, so not yet visible outside)
    int deleted7d  = jdbcTemplate.update(
            "DELETE FROM mv_product_rank_last_7d WHERE anchor_date = ?", sqlDate);
    int deleted30d = jdbcTemplate.update(
            "DELETE FROM mv_product_rank_last_30d WHERE anchor_date = ?", sqlDate);

    // 2. INSERT - load the TOP 100
    for (WeightConfig config : configs) {
        int inserted7d = promote(INSERT_SQL_LAST_7D, PERIOD_LAST_7D, anchorDate, config.getGroupName(), createdAt);
        int inserted30d = promote(INSERT_SQL_LAST_30D, PERIOD_LAST_30D, anchorDate, config.getGroupName(), createdAt);

        // 3. Record run history - committed in the same TX as the MV load
        auditLogRepository.save(BatchAuditLog.ok(jobExecutionId, anchorDate, PERIOD_LAST_7D, config.getGroupName(), inserted7d));
        auditLogRepository.save(BatchAuditLog.ok(jobExecutionId, anchorDate, PERIOD_LAST_30D, config.getGroupName(), inserted30d));
    }
    return RepeatStatus.FINISHED;
}

// TOP 100 INSERT SQL (CTE + ROW_NUMBER)
private static String insertSqlFor(String mvTable) {
    return """
        INSERT INTO %s (anchor_date, weight_group, product_id,
                        view_count, like_count, sales_amount, score, rank_position, created_at)
        WITH ranked AS (
            SELECT weight_group, product_id, view_count, like_count, sales_amount, score,
                   ROW_NUMBER() OVER (ORDER BY score DESC, product_id ASC) AS rn
              FROM staging_ranking_scored
             WHERE period_type = ? AND period_key = ? AND weight_group = ?
        )
        SELECT ?, weight_group, product_id, view_count, like_count, sales_amount, score, rn, ?
          FROM ranked WHERE rn <= ?
        """.formatted(mvTable);
}

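4-4. Freezing all external input - anchorDate + weightConfig snapshot

Decision: RankingJobParametersListener.beforeJob() parses anchorDate and serializes all active weight_config rows to the ExecutionContext as JSON. From then on, every Step uses only the snapshot without re-querying the DB.

Alternative: Freeze only anchorDate, and have each Step query weight_config via findAllByActiveTrue() every time.

Why freeze everything:

Bounding only anchorDate while leaving weight_config as external state breaks the consistency of the principle. If a DBA changes a weight value while the Job is running, Step 1 ~ 3 compute with the old weights and Step 4 with the new ones, producing inconsistent data.
"Freeze every external input at Job start" is a precondition for batch idempotency. Only when both the time boundary (anchorDate) and the configuration boundary (weightConfig) are frozen does rerunning the same Job yield the same result.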

RankingJobParametersListener.beforeJob()

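4-5. JdbcTemplate - why the Writer does not use JPA

Writing the batch Writer with JPA throws away most of the benefits of batching.

Approach	Problem
`JpaItemWriter` + `saveAll()`	A per-row SELECT fires first inside `merge()` → JDBC batching gains nothing
First-level cache (EntityManager)	Memory accumulates during bulk processing → OOM risk. `flush()`/`clear()` must be managed manually
Dirty checking	Only works if the whole chunk is kept in memory → undermines the streaming approach

I also reviewed JPA workarounds such as implementing Persistable, @SQLInsert, and StatelessSession, but all of them are either tricky or cannot do UPSERT. JdbcTemplate + batchUpdate is simple, explicit, and fast. The domain model stays in the domain/ package as is, while the Batch Writer is separated out as JDBC from an infrastructure perspective.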

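5. Design Evolution - what I fixed and what I learned

It was not perfect from the start. There were three design improvements, each a process of finding a problem and digging into its root cause.

Evolution 1: saveState(false) → lookahead ItemStream serialization


Before	Handling the streaming aggregator's lookahead was complicated, so I set `saveState(false)`. On restart, the job started over from the beginning.
Problem found	This sidesteps the problem instead of solving it. Removing the restart capability means accepting the cost of a full restart on a chunk-mid failure, and giving up the very reason for using Spring Batch (tracking the completion boundary).
After	The lookahead is serialized as three primitives using the `open()/update()` contract of ItemStream. Chunk-mid restart works correctly while streaming aggregation stays accurate.
Lesson	Turning off a framework feature should be a last resort. If you route around the hard part, that difficulty comes back elsewhere at a higher cost.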

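Evolution 2: freezing only anchorDate → snapshot of all external input


Before	`anchorDate` was fixed as a JobParameter, but four Steps each re-queried `weight_config` from the DB via `findAllByActiveTrue()`.
Problem found	Bounding only anchorDate while leaving weight_config as external state breaks the consistency of the principle. I had declared the principle "every external input of a batch is fixed at Job start" and then contradicted it by not applying it to weight_config.
After	`beforeJob()` serializes all active weight_config rows to the ExecutionContext as JSON. Every Step reads only the snapshot.
Lesson	Once you set a principle, apply it without exceptions. The assumption "this won't change" breaks in production.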

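Evolution 3: separate purge/promote → atomic swap in a single TX


Before	Step 4a (purge 7d) → Step 4b (purge 30d) → Step 5 (score) → Step 5b (promote). purge and promote were separate Steps in separate TXs.
Problem found	Between purge and promote there was a window in which the MV was empty. I added a cache fallback to the API to cover it, but that piles patch upon patch. It made no difference to users and only increased code complexity.
After	`PromoteTopToMvTasklet` alone does DELETE → INSERT in a single `@Transactional` TX. The physical moment in which the MV is empty is gone.
Lesson	If the essence of the problem is "the gap between two operations", the fix is "remove the gap", not "hide what shows through the gap".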

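6. Guardrails - atomic MV swap + previous-day fallback

Two layers of safety apply before the MV is served by the API.

API previous-day fallback

If the MV is empty (batch has not run), the API automatically retries with the previous day's anchor (up to 3 days, A/B group preserved). This is an operational judgment that "showing at least yesterday's ranking" is a better user experience than "showing a wrong ranking".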

RankingService:94-119
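The previous-day retry can be sketched roughly as follows. This is not the code at the reference above: `AnchorFallbackSketch`, `loadWithFallback`, and the `mvLookup` function are hypothetical stand-ins, and "up to 3 days" is interpreted here as the requested anchor plus at most three earlier days, which may not match the actual implementation.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

final class AnchorFallbackSketch {

    // Assumption: "up to 3 days" = the requested anchor plus at most 3 earlier anchors.
    static final int MAX_FALLBACK_DAYS = 3;

    /**
     * Tries the MV for `anchor`, then for earlier anchors one day at a time.
     * The weight group never changes, so a user's A/B assignment stays consistent.
     */
    static <T> List<T> loadWithFallback(LocalDate anchor, String weightGroup,
                                        BiFunction<LocalDate, String, List<T>> mvLookup) {
        for (int back = 0; back <= MAX_FALLBACK_DAYS; back++) {
            List<T> rows = mvLookup.apply(anchor.minusDays(back), weightGroup);
            if (!rows.isEmpty()) return rows;
        }
        return List.of(); // nothing within the fallback range
    }

    public static void main(String[] args) {
        // Stand-in for the MV: only the 4/15 anchor has been built.
        Map<LocalDate, List<String>> mv = Map.of(LocalDate.of(2026, 4, 15), List.of("p1", "p2"));
        List<String> served = loadWithFallback(LocalDate.of(2026, 4, 16), "control",
                (date, group) -> mv.getOrDefault(date, List.of()));
        System.out.println(served);
    }
}
```

Passing the lookup as a function keeps the retry policy testable without a database; in the real service the lookup would presumably be the MV query repository.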

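Recovery cost per failed Step

Wherever it fails, the MV is not corrupted and recovery is cheap.

Failure point	Rerun scope	Recovery cost	Impact on MV
Step 0 (reset)	From Step 0	Very low	Not touched yet
Step 1 ~ 3 (aggregation)	Resume from the failed chunk	Low	Not touched yet
Step 4 (score)	From Step 4	Medium	Not touched yet - only staging is being worked on
Step 5 (promote)	Step 5 only	Very low	Single TX rolls back → previous MV kept
Step 6 (Redis)	Step 6 only	Very low	MV already persisted - only the cache is refreshed again

Steps 0 ~ 4 do not touch the MV, so a failure there has no effect on the API. Step 5 is a single TX, so a failure rolls back and the previous MV is kept. Step 6 only refreshes the cache after the MV is already confirmed, so if it fails, requests are served through the MV fallback path.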

7. Measurement Results

Environment: local development machine (TestContainers MySQL 8.0 / Redis 7), first JVM run (no warm-up)
Reproduce: ./scripts/measure-ranking-batch.sh

Raw data

label	products	seedRows	jobMs	tps
S	1,000	156,758	1,727	90,769/s
M	5,000	193,236	3,329	58,046/s
L	20,000	328,236	6,517	50,366/s
XL_SPIKE	100,000	1,048,236	9,754	107,467/s

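Seed distribution: Zipf α=1.2 (5 tiers: Hot/Warm/Normal/Cold/Sleeping). At every scale mv7dCount = 100 and the Job completed normally.

Analysis by interval

Interval	seedRows multiple	jobMs multiple	Time/input ratio
S → M	1.23×	1.93×	1.56 (time grows faster than input)
M → L	1.70×	1.96×	1.15
L → XL	3.19×	1.50×	0.47 (time grows slower than input)

From S→M time grows faster than the input, while from L→XL it grows more slowly. Rather than "linear", this looks like a pattern in which the fixed Job start-up cost (Step metadata writes, bean initialization, and so on) weighs heavily at small scale and is diluted at large scale. The variation between intervals is too large to claim "linearity" from this data.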
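For reference, the time/input ratios can be recomputed from the raw seedRows and jobMs values with a few lines of Java. The class and method names are hypothetical. The ratio is computed from the raw values rather than from the rounded multiples, so the last digit can differ slightly from a ratio derived from rounded figures.

```java
final class ScalingRatioSketch {

    /** (jobMs growth) / (seedRows growth); 1.0 would mean time grows exactly in step with input. */
    static double timePerInputRatio(long rowsFrom, long msFrom, long rowsTo, long msTo) {
        double rowGrowth = (double) rowsTo / rowsFrom;
        double timeGrowth = (double) msTo / msFrom;
        return timeGrowth / rowGrowth;
    }

    public static void main(String[] args) {
        // label, seedRows, jobMs taken from the raw data table
        String[] labels = {"S", "M", "L", "XL_SPIKE"};
        long[] seedRows = {156_758, 193_236, 328_236, 1_048_236};
        long[] jobMs = {1_727, 3_329, 6_517, 9_754};
        for (int i = 1; i < labels.length; i++) {
            double ratio = timePerInputRatio(seedRows[i - 1], jobMs[i - 1], seedRows[i], jobMs[i]);
            System.out.printf("%s -> %s: %.2f%n", labels[i - 1], labels[i], ratio);
        }
    }
}
```

A ratio above 1 means time grew faster than input over that interval, below 1 means it grew more slowly.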

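What was confirmed and what was not

Confirmed:

The Job completes normally at four scales S/M/L/XL (150K ~ 1M rows)
Up to 1M rows, no non-linear blow-up (sharp degradation such as a switch to temporary tables or on-disk sorting) was observed
The MV exposes only defined states (count=100, ranks 1 ~ 100 contiguous, no duplicate product_id)

Not confirmed - follow-up work:

Insufficient input scale is the biggest limitation. The L scale assumed in the design is about 30M rows, roughly 30× this measurement (1M rows). No blow-up at 1M rows does not guarantee none at 30M rows. This measurement most likely never reached thresholds such as buffer pool overflow, a switch to on-disk sorting, or GC pressure.
Single machine / TestContainers - disk IO and network latency of a production environment are not reflected
Single weight_group - the effect of group fan-out was not measured
Single anchor - the backfill scenario (multiple anchors in sequence) was not measured

Proposed follow-up measurements

Reproduce the design L scale → check whether non-linear blow-up occurs at 30M rows (most important; requires production-grade infrastructure)
Repeat the L stage 5 times → check stability via the standard deviation of jobMs
Enable 2 weight_groups → measure the fan-out cost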

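8. Test Strategy

Classification by layer

Layer	Tests	Files	What is verified
Unit	`RollingWindowResolverTest`, `ScoreFormulaTest`, `StreamingMetricAggregatorTest`, etc.	6	Pure logic (window calculation, score formula, aggregation boundaries)
Step integration	`StageViewMetricsStepIntegrationTest`, `ScoreAggregationStepIntegrationTest`, etc.	6	DB input/output consistency of individual Steps
E2E	`RollingRankingJobE2ETest`, `RankingApiE2ETest`	2	Full Job pipeline + API MV fallback
Restart verification	`RollingRankingJobRestartTest`	1 (5 scenarios)	chunk-mid restart, MV protection, anchor isolation, no splitting of hot products
Benchmark	`RollingRankingJobBenchmark`	1	S/M/L/XL complete normally

What the restart scenarios verify

Scenario	Verification
1: chunk-mid failure → restart	Writer fails partway through 1,200 products → same result after restart (verifies the Immutable Input premise)
2a: Step 4 score failure	No incomplete data enters the MV (MV protection)
2b: Step 5 promote failure	Previous MV is kept when the atomic MV swap fails
3: Isolation from other anchors	A backfill (rerun for a different date) does not intrude on the results of an existing anchor
4: Hot product long chain	A product with 1,000 buckets is not split in the middle at a chunk boundary

Seed distribution

BaselineSeeder generates seeds in 5 tiers (Hot/Warm/Normal/Cold/Sleeping) following a Zipf α=1.2 distribution. It reproduces the popularity skew of real e-commerce so that the worst case, where hot products dominate the buckets, is reflected in the benchmark.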
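The seeder commit notes further down give the per-product volume as floor(C / (rank+1)^1.2) with C=2000. A minimal sketch of that formula follows; `ZipfSeedSketch` and `dailyEvents` are hypothetical names, and treating `rank` as a 0-based popularity rank is an assumption read off the `rank+1` term. The sleeping-product cut-off and the view/like/order split are left out.

```java
final class ZipfSeedSketch {

    static final double ALPHA = 1.2;   // Zipf exponent from the seeder notes
    static final double C = 2000.0;    // scale constant from the seeder notes

    /** Daily event volume for the product at the given popularity rank (assumed 0-based). */
    static long dailyEvents(int rank) {
        return (long) Math.floor(C / Math.pow(rank + 1, ALPHA));
    }

    public static void main(String[] args) {
        for (int rank : new int[] {0, 1, 9, 99, 999}) {
            System.out.println("rank " + rank + " -> " + dailyEvents(rank) + " events/day");
        }
    }
}
```

The steep drop between the top rank and the hundredth is what produces a hot head that owns most of the buckets.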

Review Point

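RP01. The realistic threshold for adopting batch infrastructure in a small team

In this design, the basis for switching from @Scheduled to Spring Batch was that the three boundary layers (selection/processing/completion) were not explicit. A scheduler has no failure recovery, idempotency, or progress visibility.

Realistically, though, for a small startup building a single weekly/monthly ranking, at what point does it pay off to set up 10 JobRepository metadata tables, a separate module, and a separate Cron trigger?

Conversely, if there is an upper bound on the scale at which "a scheduler + one idempotent SQL statement" is enough, roughly where is it? I would like to know what signals indicate that, beyond that bound, batch infrastructure is "a necessity rather than over-engineering".

Also, when a résumé contains a keyword like "built a Spring Batch based batch system", what scale and context do you expect as a reviewer? Or, for a small service, does that keyword instead raise the question "why go as far as batch?"

RP02. Could the scheduler's limits have been addressed without batch?

I saw three core limits of the scheduler: (1) on failure, no knowledge of "how far it got", (2) no guarantee of the same result for the same input, (3) backfill is impossible.

There are probably ways to solve these three without Spring Batch. For example, a scheduler + a "processed flag column" + "idempotent UPSERT" also solves (1) and (2), and (3) can be handled with a manual parameter.

Where do you draw the line between such a "hand-built batch without a framework" and "adopting Spring Batch"? And have you run a hand-built batch in production and found it sufficient?

RP03. How much does a failure of this batch actually affect users?

The part of the design that cost the most was "it is safe even when it fails" (double safety net, atomic MV swap, previous-day fallback). Honestly, though, I wonder how much it really hurts users if this batch fails for a day and yesterday's ranking is shown again today.

If a settlement batch fails, money does not arrive, so it needs an immediate response, but if a ranking batch slips by a day, users may not even notice. I would like your opinion on whether this level of safeguards is justified even for such a "low business impact batch", or whether it is excessive defense.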

- RollingWindow (VO): holds the LAST_7D/LAST_30D boundaries relative to anchor_date
- RollingWindowResolver: yyyyMMdd parsing + STRICT validation
- RankingJobParametersListener: in beforeJob, injects 5 boundary values from JobParameter.anchorDate into the ExecutionContext (no overwrite on restart)
- 9 unit tests: boundary calculation, today excluded, month-boundary crossing, rejection of invalid input
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- StagingRankingAggregation (stage 1): PK of period_type + period_key + product_id; the space where Step 1~3 UPSERT the raw sum per metric
- StagingRankingScored (stage 2): weight_group included in the PK, index sorted by score DESC. An isolated space where Step 5 stores the score of every product (intermediate state is kept here so that only the TOP 100 enter the MV)
- TruncateStagingTasklet: DELETEs only the rows of the current anchor via ExecutionContext.anchorDateKey (not a full TRUNCATE; isolated from runs for other anchors)
- RollingRankingJobConfig: Job scaffold with only Step 0 wired (Step 1~7 are added in order in later commits)
- 4 integration tests: only the target anchor deleted / idempotent on empty state / repeated runs for the same anchor / Job FAIL when anchorDate is missing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- ProductViewMetric source read model (mirror of the commerce-streamer schema)
- AggregatedMetric / RawMetricRow DTO
- StreamingMetricAggregator: product_id boundary detection + conditional 7d/30d accumulation in O(1) memory
- ViewMetricStreamingReader: JdbcCursorItemReader (fetchSize 2000) + aggregator wrapper. The DB only does a bucket_time range scan, no GROUP BY. Aggregation is the App's responsibility
- StagingAggregationProcessor: fan-out of 1 Aggregated → 2 StagingRow (LAST_7D + LAST_30D)
- StagingViewMetricsWriter: JdbcTemplate batch UPSERT (ON DUPLICATE KEY UPDATE view_count)
- Avoids the JPA saveAll() merge trap and prevents first-level cache OOM
- StageViewMetricsStepConfig: chunkSize 500, separate transaction boundaries
- RollingRankingJobConfig: Step 0 → Step 1 chain
Tests:
- StreamingMetricAggregatorTest (5 unit): boundary detection, exclusive 7d boundary, empty source, stability on repeated calls, O(1) check on a long chain of 8640 rows
- StageViewMetricsStepIntegrationTest (3 integration): in/out-of-range filtering, idempotency, empty source
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Applies the View staging pattern to Like and Order as well. The DB only does a per-table range scan, and the App streaming aggregator aggregates at product boundaries.
- ProductLikeMetric / ProductOrderMetric source read models
- LikeMetricStreamingReader / OrderMetricStreamingReader: cursor + aggregator
- StagingLikeAggregationProcessor: AggregatedMetric → fan-out into the like_count slot
- StagingOrderAggregationProcessor: AggregatedMetric → fan-out into the sales_amount slot
- StagingLikeMetricsWriter / StagingOrderMetricsWriter: ON DUPLICATE KEY UPDATE updates only like_count / sales_amount
- Rows already created by Step 1 take the UPDATE branch of the UPSERT; products with only Like/Order get a new row via the INSERT of the same query
- StageLikeMetricsStepConfig / StageOrderMetricsStepConfig (chunk=500)
- RollingRankingJobConfig: Step 0 → 1 → 2 → 3 chain finalized
Tests (StageMetricsPipelineIntegrationTest, 4 integration):
- Merge into a single row for a product that has all 3 metrics
- INSERT-then-UPDATE path for Like-only / Order-only products
- Simultaneous loading of independent product sets per metric
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Guarantees the "MV is empty" state before Step 5b INSERTs, limiting the states the MV passes through to just two: "empty → confirmed TOP 100". Even if the query API reads the MV mid-batch, it sees only defined states (intermediate states are invisible).
- MvProductRankId common PK (anchor_date, weight_group, product_id)
- MvProductRankLast7d / MvProductRankLast30d entities + rank_position index
- Domain Repository interface + RepositoryImpl + package-private JpaRepository
- PurgeLast7dMvTasklet / PurgeLast30dMvTasklet (@StepScope, anchorDateKey → LocalDate)
- PurgeMvStepConfig: registers the Step 4a/4b beans
- RollingRankingJobConfig: Step 0 → 1 → 2 → 3 → 4a → 4b chain
Integration tests (PurgeMvStepIntegrationTest, 3):
- Only the target anchor is deleted from both MVs; other anchors are kept
- All weight_groups of the same anchor are deleted
- Succeeds on an empty MV as well (first-run scenario)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Step 5 computes a score per weight_group from the raw sums in stage 1 staging and loads every product into stage 2 staging (staging_ranking_scored). It does not touch the MV; Step 5b promotes only the TOP 100 from this stage 2 staging to the MV.
- WeightConfig entity mirror + Repository (ranking_weight_config). If there is no active group, falls back to the "control" default (consistent with the streamer's RankingAggregator)
- ScoreFormula: score = w_v×v + w_l×l + w_o×log10(sales+1), a pure function. sales is a monetary amount, so log10 normalization keeps it from overwhelming the other metrics
- ScoreProcessor (@StepScope): loads the active WeightConfig in @BeforeStep, then fans out activeConfigs.size() rows per input row, with no DB query inside the Step (Bulk)
- StagingScoredWriter: JdbcTemplate batch UPSERT - overwrites the same PK on restart
- ScoreAggregationStepConfig: stage 1 staging cursor (fetchSize 2000) + chunk 500
- RollingRankingJobConfig: Step 0 → 1 → 2 → 3 → 4a → 4b → 5 chain
Tests:
- ScoreFormulaTest (4 unit): weighted-sum formula, safety of log10(0+1)=0, log normalization of sales, score divergence across weights
- ScoreAggregationStepIntegrationTest (3 integration): single-group fan-out + formula check, independent scores for multiple groups + inactive excluded, success on empty source
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
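The score formula in this commit note, score = w_v×v + w_l×l + w_o×log10(sales+1), can be written as a pure function along these lines. This is an illustrative sketch rather than the PR's `ScoreFormula`: the class name, the `Weights` record, and the weight values in `main` are hypothetical.

```java
final class ScoreFormulaSketch {

    /** One weight group's coefficients (hypothetical shape, not the WeightConfig entity). */
    record Weights(double view, double like, double order) {}

    /** Weighted sum; sales is log10-scaled so a monetary amount does not swamp the counts. */
    static double score(Weights w, long views, long likes, long salesAmount) {
        return w.view() * views
                + w.like() * likes
                + w.order() * Math.log10(salesAmount + 1.0); // +1 keeps zero sales at log10(1) = 0
    }

    public static void main(String[] args) {
        Weights control = new Weights(0.1, 0.2, 0.7);      // example values only
        Weights experiment = new Weights(0.3, 0.2, 0.5);   // example values only
        System.out.println("control:    " + score(control, 1_200, 80, 350_000));
        System.out.println("experiment: " + score(experiment, 1_200, 80, 350_000));
    }
}
```

Scoring the same sums under two weight groups, as in `main`, mirrors the per-group fan-out described for ScoreProcessor.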

Adds the last three Steps of the rolling 7-day/30-day ranking batch to complete the Job.
Step 5b (PromoteTopToMvTasklet):
- INSERTs the TOP 100 from stage 2 staging into the MV with a single SQL statement (CTE + ROW_NUMBER)
- One query per (period_type × weight_group) combination
- Together with the pre-DELETE of Step 4a/4b, the MV passes through only two states, "empty → confirmed TOP 100" (intermediate states are invisible)
Step 7 (AuditTasklet, batch_audit_log):
- Invariant checks: ranks 1..count contiguous, no duplicate product_id, count ≤ TOP_N
- On failure, Job FAIL → blocks Step 6 (Redis propagation)
- Records OK/FAILED in BatchAuditLog - detects "Job COMPLETED but the data is wrong"
- CHECKSUM intentionally not introduced (ranking is not finance, so it would be overkill)
Step 6 (RedisRefreshTasklet):
- MV → Redis ZSET identity cache copy (score and order unchanged)
- Atomic swap via shadow key + RENAME, TTL 3 days
- ResourcelessTransactionManager (Redis operations need no DB transaction)
RollingRankingJobConfig:
- Final chain 0 → 1 → 2 → 3 → 4a → 4b → 5 → 5b → 7 → 6
Tests:
- RollingRankingJobE2ETest (4): full pipeline + identity cache, independent Redis keys for multiple weight_groups, idempotency when rerunning the same anchor, empty source
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Schema-mirror entities and a Query Repository that let commerce-api read the rolling 7-day/30-day MV confirmed by commerce-batch.
- MvProductRankId (anchor_date, weight_group, product_id) common PK
- MvProductRankLast7d / MvProductRankLast30d read-only entities (commerce-batch owns writes; commerce-api only reads)
- MvRankEntry: minimal projection DTO (productId, score, rankPosition)
- MvRankingQueryRepository + JdbcTemplate-based Impl
Why not read through JPA entities: no persistence context is needed, and for a simple projection JDBC is clearer (in line with "the Writer-side JPA trap" in 설계.md)
On a Redis identity cache miss, this Repository is the fallback path. It is used in the next commit, which switches RankingService/KeyResolver to LAST_7D/LAST_30D.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Switches the API contract to expose the rolling 7-day/30-day MV produced by the batch as-is. Users say "weekly/monthly", but the internal meaning is "rolling N days excluding today", so the enum names are made explicit as LAST_7D / LAST_30D (설계.md prologue + mentoring conclusion).
- RankingPeriod: REALTIME, DAILY, LAST_7D, LAST_30D (WEEKLY/MONTHLY removed)
- RankingKeyResolver.resolve: calendar keys removed; with anchor_date = date-1, resolves to ranking:last7d:{yyyyMMdd}:{group} / ranking:last30d:{yyyyMMdd}:{group}, the same format as the Redis keys written by the batch
- RankingService.loadRankEntries: two-way fallback
  · REALTIME/DAILY → existing bucket aggregation recomputation (RankingFallbackAggregator)
  · LAST_7D/LAST_30D → direct lookup via MvRankingQueryRepository (projection of the confirmed TOP N)
- totalCount also falls back to the MV count on a Redis miss
Tests:
- RankingKeyResolverTest: rolling key generation, month-boundary safety, anchor calculation
- RankingServiceMvFallbackTest (5 Mockito): Redis miss → MV path, Redis hit → MV not called, empty MV handling, totalCount fallback
- RankingApiE2ETest: 200 responses for LAST_7D/LAST_30D + 2 MV fallback scenarios (end-to-end check by seeding the real MV tables and then calling the API)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Implements the restart verification scenarios from 설계.md. Because the streaming aggregator's lookahead fundamentally conflicts with Spring Batch chunk-mid restart, the chunk Readers of Step 1~3/5 are set to saveState=false so that they start over from the beginning. Instead, UPSERT idempotency guarantees "the same result on rerun" (the second of the two safety nets).
- .saveState(false) applied to the cursor readers of ViewMetricStreamingReader / LikeMetricStreamingReader / OrderMetricStreamingReader / ScoreAggregationStep
Tests (RollingRankingJobRestartTest, 4 scenarios):
- Scenario 1: Step 1 chunk-mid failure → restart → staging result for 1200 products equals that of a single uninterrupted run (verifies UPSERT idempotency)
- Scenario 2a: Step 5 (Score) failure → MV guaranteed empty (first call throws via a WeightConfigRepository SpyBean)
- Scenario 2b: Step 5b (Promote) failure → MV empty with stage 2 staging loaded (only the second call throws, by call count → ScoreProcessor passes, Promote fails)
- Scenario 3: different anchorDates (20260413, 20260414) are isolated and do not overwrite each other (backfill safety)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Automatically seeds the 5-tier (Hot/Warm/Normal/Cold/Sleeping) distribution from the "트래픽 전제" (traffic assumptions) section of 설계.md using Zipf α=1.2, and verifies the distribution invariants.
- Tier enum, SeedSpec (record): totalProducts presets per S/M/L stage
- BaselineSeeder: generates daily events per product from a Zipf distribution → JDBC batch INSERT into the view/like/order metric tables
  · view : like : order = 10 : 1 : 0.1
  · event volume = floor(C / (rank+1)^1.2), C=2000
  · 30% active / 70% Sleeping (0 events)
  · deterministic reproduction with a fixed seed
  · under rewriteBatchedStatements, batchUpdate returns SUCCESS_NO_INFO, so row counts are accumulated from chunk.size()
- BaselineSeederIntegrationTest (4):
  · Sleeping 70% has 0 events (active products ≤ totalProducts × 30%)
  · Hot tier (top 0.1%) accounts for at least 30% of all views (Zipf head check)
  · two runs with the same seed yield the same row count (determinism)
  · view : like : order ≈ 10 : 1 : 0.1 ratio check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
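The volume formula above can be sketched as follows. Only `floor(C / (rank+1)^1.2)` with C=2000, the 10 : 1 : 0.1 ratio and the 30% / 70% split come from the commit message. The class `ZipfSeedSketch`, the integer-division derivation of like/order counts, and the assumption that the active products are the top-ranked 30% are illustrative guesses, not the real BaselineSeeder.

```java
/** Hypothetical illustration of the Zipf seeding arithmetic; not the PR's BaselineSeeder. */
final class ZipfSeedSketch {
    static final double ALPHA = 1.2;
    static final double C = 2000.0;

    /** Daily view events for a 0-based popularity rank: floor(C / (rank+1)^1.2). */
    static long viewsForRank(int rank) {
        return (long) Math.floor(C / Math.pow(rank + 1, ALPHA));
    }

    /** Assumption: likes are derived from views with the 10 : 1 ratio. */
    static long likesFor(long views) {
        return views / 10;
    }

    /** Assumption: orders are derived from views with the 10 : 0.1 ratio. */
    static long ordersFor(long views) {
        return views / 100;
    }

    /** Assumption: the top 30% of ranks are active, the remaining 70% are Sleeping (0 events). */
    static boolean isActive(int rank, int totalProducts) {
        return rank < (long) totalProducts * 3 / 10;
    }

    public static void main(String[] args) {
        for (int rank = 0; rank < 5; rank++) {
            long views = viewsForRank(rank);
            System.out.println(rank + " " + views + " " + likesFor(views) + " " + ordersFor(views));
        }
    }
}
```

The formula has no random term, so a given rank always yields the same volume. Any seed-dependent step in the real seeder sits outside this sketch.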

@Tag

Automates the Phase 5 measurements (Loopers-dev-lab#26~29) of 설계.md.
- RollingRankingJobBenchmark (@Tag("benchmark")): seeds in 4 stages, S(1k) / M(5k) / L(20k) / XL_SPIKE(100k) → runs the Job → appends KV lines (label, products, seedRows, jobMs, tps, mv7dCount) to build/benchmark-results.txt, which sidesteps Gradle test's stdout capture.
- build.gradle.kts:
  · the regular ./gradlew test skips measurements via excludeTags("benchmark")
  · ./gradlew benchmarkTest runs measurements only, via includeTags
- scripts/measure-ranking-batch.sh: calls benchmarkTest + parses the result file → prints a table to stdout
Results are recorded by hand in week10/측정결과.md (not auto-generated).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…mid restart restored
Removes the saveState(false) + full-restart workaround and replaces it with the standard Spring Batch ItemStream pattern. At chunk commit, the aggregator's lookahead (the first row of the next product) is stored in the ExecutionContext as 3 primitives (productId, bucketTime, count) and restored in open() on restart. As a result:
- the Job can resume precisely at the chunk that failed (no rows skipped)
- saveState=true (the default) is restored → the cursor position is saved normally
- UPSERT idempotency is still kept as the second safety net
Changed files:
- StreamingMetricAggregator: getLookahead() / setLookahead() added
- ViewMetricStreamingReader: lookahead serialization/restoration in open/update + saveState(false) removed
- LikeMetricStreamingReader: same pattern
- OrderMetricStreamingReader: same pattern
- ScoreAggregationStepConfig: saveState(false) removed (Step 5 is a 1:1 transformation without streaming aggregation, so the default saveState=true works as-is)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
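The lookahead checkpoint described above can be sketched in plain Java, with a `Map` standing in for Spring Batch's ExecutionContext. All names here (`LookaheadAggregatorSketch`, the `lookahead.*` keys, `saveTo` / `restoreFrom`) are hypothetical; the PR's StreamingMetricAggregator and readers may be structured differently.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical illustration of lookahead save/restore; not the PR's StreamingMetricAggregator. */
final class LookaheadAggregatorSketch {
    record Row(long productId, String bucketTime, long count) {}
    record Aggregated(long productId, long total) {}

    private static final String KEY_PRODUCT = "lookahead.productId";
    private static final String KEY_BUCKET = "lookahead.bucketTime";
    private static final String KEY_COUNT = "lookahead.count";

    private final Iterator<Row> cursor; // rows sorted by productId
    private Row lookahead;              // first row of the next product, already taken from the cursor

    LookaheadAggregatorSketch(Iterator<Row> cursor) {
        this.cursor = cursor;
    }

    /** Consumes every row of one product and emits its total; returns null at end of input. */
    Aggregated next() {
        Row current = (lookahead != null) ? lookahead : (cursor.hasNext() ? cursor.next() : null);
        lookahead = null;
        if (current == null) {
            return null;
        }
        long total = current.count();
        while (cursor.hasNext()) {
            Row row = cursor.next();
            if (row.productId() != current.productId()) {
                lookahead = row; // belongs to the next product: keep it for the next call
                break;
            }
            total += row.count();
        }
        return new Aggregated(current.productId(), total);
    }

    /** Chunk commit: store the pending lookahead as three primitives. */
    void saveTo(Map<String, Object> ctx) {
        ctx.remove(KEY_PRODUCT);
        ctx.remove(KEY_BUCKET);
        ctx.remove(KEY_COUNT);
        if (lookahead != null) {
            ctx.put(KEY_PRODUCT, lookahead.productId());
            ctx.put(KEY_BUCKET, lookahead.bucketTime());
            ctx.put(KEY_COUNT, lookahead.count());
        }
    }

    /** Restart: the cursor has already moved past the lookahead row, so re-inject it. */
    void restoreFrom(Map<String, Object> ctx) {
        if (ctx.containsKey(KEY_PRODUCT)) {
            lookahead = new Row((Long) ctx.get(KEY_PRODUCT), (String) ctx.get(KEY_BUCKET), (Long) ctx.get(KEY_COUNT));
        }
    }
}
```

Without the restore step, the row that was read ahead would be lost on restart, because the saved cursor position already points past it.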

Because the chunk size (500) counts products rather than raw rows, aggregator.next() atomically consumes all buckets of one product before emitting. A product therefore cannot be split mid-way, by construction. Scenario 4 is added to demonstrate this:
- product 1: 1,000 buckets (count=1,2,...,1000)
- product 2, 3: 5 buckets each (count=10)
- chunk size=500, but with only 3 products everything is processed in chunk 1
- check: staging view_count of product 1 = 500,500 (1+2+...+1000); a mid-product split would change this sum, so the test would detect it
- check: total staging = 3 products × 2 periods = 6 rows
This confirms that a CompletionPolicy is unnecessary: the emit unit is already the product boundary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes the half-Bounded design in which only anchorDate was bounded while weight_group was queried live from the DB at every Step. All external inputs the Job depends on (time window + weight config) are now frozen once in beforeJob().
Problem:
- if an operator activates experiment_a between restarts, the earlier chunks of Step 5 fan out to [control] and the later chunks to [control, experiment_a] → secondary staging PK mismatch → incomplete MV
Fix:
- RankingJobParametersListener.beforeJob(): serializes the active weight_group names + weights (wView/wLike/wOrder) into the ExecutionContext as primitives. On restart they are restored from BATCH_JOB_EXECUTION_CONTEXT → pinned to the values at first start
- restoreWeightConfigs() static helper shared by 4 Steps
- ScoreProcessor / PromoteTopToMvTasklet / AuditTasklet / RedisRefreshTasklet: direct weightConfigRepository.findAllByActiveTrue() calls removed → replaced by restoring from the ExecutionContext snapshot
- Restart tests (2a/2b): failure injection target changed from WeightConfigRepository → StagingScoredWriter (Steps no longer query weight_group from the DB)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
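A rough sketch of the freeze-once idea, again with a `Map` in place of the ExecutionContext. The class `WeightSnapshotSketch`, the `weight.*` key layout and the example weights are hypothetical; only the idea of storing group names and wView/wLike/wOrder as primitives, and not overwriting them on restart, comes from the commit message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the weight snapshot; not the PR's RankingJobParametersListener. */
final class WeightSnapshotSketch {
    record WeightConfig(String group, double wView, double wLike, double wOrder) {}

    private static final String KEY_COUNT = "weight.count";

    /** First start: serialize as primitives. Restart: the snapshot already exists, so keep it. */
    static void freezeIfAbsent(List<WeightConfig> activeNow, Map<String, Object> ctx) {
        if (ctx.containsKey(KEY_COUNT)) {
            return; // restart: stay pinned to the values captured at first start
        }
        ctx.put(KEY_COUNT, activeNow.size());
        for (int i = 0; i < activeNow.size(); i++) {
            WeightConfig w = activeNow.get(i);
            ctx.put("weight." + i + ".group", w.group());
            ctx.put("weight." + i + ".wView", w.wView());
            ctx.put("weight." + i + ".wLike", w.wLike());
            ctx.put("weight." + i + ".wOrder", w.wOrder());
        }
    }

    /** Shared helper for every Step that needs the weights: no DB access involved. */
    static List<WeightConfig> restore(Map<String, Object> ctx) {
        int count = (Integer) ctx.get(KEY_COUNT);
        List<WeightConfig> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(new WeightConfig(
                    (String) ctx.get("weight." + i + ".group"),
                    (Double) ctx.get("weight." + i + ".wView"),
                    (Double) ctx.get("weight." + i + ".wLike"),
                    (Double) ctx.get("weight." + i + ".wOrder")));
        }
        return result;
    }
}
```

With this shape, activating experiment_a between a failure and its restart does not change the fan-out of the restarted execution.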

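When Step 7 (audit) detects an invariant violation:
1. it immediately DELETEs the MV for that anchor_date (quarantine)
2. it records FAILED in batch_audit_log
3. the Job ends as FAILED (blocks the Step 6 Redis propagation)
This removes the risk of "serving a wrong ranking".
API side (RankingService.fallbackFromMv):
- if the current anchor's MV is empty, automatically retry with the previous day's anchor (up to 3 days)
- "showing at least yesterday's ranking" is a better user experience than an empty screen
- once a batch completes normally, serving returns to the current anchor on its own
Tests:
- RankingServiceMvFallbackTest: verifies the automatic previous-day anchor fallback + an empty list when all 3 days are empty
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>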
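The API-side fallback can be sketched as a small loop. `MvFallbackSketch` and the `query` function are hypothetical placeholders for RankingService.fallbackFromMv and the MV repository. The sketch also assumes that "up to 3 days" means three anchors in total, counting the current one.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.function.Function;

/** Hypothetical illustration of the previous-day anchor fallback; not the PR's RankingService. */
final class MvFallbackSketch {
    // Assumption: three anchors are tried in total (current, -1 day, -2 days).
    static final int MAX_DAYS = 3;

    /** Returns the rows of the newest non-empty anchor, or an empty list if all attempts are empty. */
    static <T> List<T> fallbackFromMv(LocalDate anchorDate, Function<LocalDate, List<T>> query) {
        LocalDate anchor = anchorDate;
        for (int attempt = 0; attempt < MAX_DAYS; attempt++) {
            List<T> rows = query.apply(anchor);
            if (!rows.isEmpty()) {
                return rows;
            }
            anchor = anchor.minusDays(1); // step back one day and try again
        }
        return List.of();
    }
}
```

The bound matters: without it, a long batch outage would turn each request into an unbounded series of MV queries.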

After the switch to rolling windows both MVs are keyed by anchor_date, so there is no longer a reason to split 7d/30d into separate Steps.
- PurgeLast7dMvTasklet + PurgeLast30dMvTasklet → merged into PurgeMvTasklet; the two DELETEs run sequentially in one Tasklet (each idempotent)
- PurgeMvStepConfig: STEP_LAST_7D/STEP_LAST_30D → a single STEP_NAME
- RollingRankingJobConfig: .next(purge7d).next(purge30d) → .next(purgeMvStep)
Job chain: 0 → 1 → 2 → 3 → 4 → 5 → 5b → 7 → 6
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@DisplayName

The same rules are applied to every test file:
- all @DisplayName removed → replaced with @DisplayNameGeneration(ReplaceUnderscores)
- English method names → Korean condition_result descriptive names (e.g. idempotentOnRerun → 같은_anchor_로_두번_돌려도_결과가_동일하다)
- related tests grouped with @Nested classes (Korean class names)
- test logic (assert, given/when/then) unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Transactional

Removes Step 4 (PurgeMvTasklet); Step 5b (PromoteTopToMvTasklet) now performs DELETE → INSERT inside the same @Transactional. Under READ COMMITTED, other sessions see the previous MV until the commit and only the new MV after it, so no "moment when the MV is empty" is exposed. The dependency on the API's previous-day fallback, which was needed while purge was a separate Step, is reduced, and if Step 5 fails the previous MV stays intact, which is safer.
- PurgeMvTasklet, PurgeMvStepConfig, PurgeMvStepIntegrationTest deleted
- PromoteTopToMvTasklet: runs DELETE 7d/30d before the INSERT in the same TX
- RollingRankingJobConfig: purgeMvStep removed
Job chain: 0 → 1 → 2 → 3 → 5 → 5b → 7 → 6
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

After the purge Step was deleted, this cleans up the empty number (4) and the out-of-order sequence (7→6). No code logic changes; only the Step numbers in comments/Javadoc are replaced.
Before: 0 → 1 → 2 → 3 → (4 deleted) → 5 → 5b → 7 → 6
After: 0 → 1 → 2 → 3 → 4(score) → 5(promote) → 6(audit) → 7(redis)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

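The count/rank/distinct checks in audit are invariants that the SQL guarantees structurally, so re-verifying them at runtime is unnecessary. The verification + contaminated-MV deletion logic is removed, and the execution history (audit_log) write moves into the same TX as Step 5 (promote), so that the MV load and the history record commit atomically.
- AuditTasklet, AuditStepConfig deleted
- audit_log write added to PromoteTopToMvTasklet (same TX)
- BatchAuditLog.failed() removed (the FAILED path no longer exists)
- Job Step chain 8→7 (Step 0~6)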

coderabbitai · 2026-04-17T08:08:09Z

Walkthrough

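Changes the ranking periods from weekly/monthly to 7-day/30-day rolling windows and adds ranking lookups backed by MV tables that a batch pipeline populates. On a Redis miss the API automatically falls back to the MV. Also introduces independent score calculation per weight group and streaming aggregation through staging tables.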

Changes

Cohort / File(s)	Summary
RankingPeriod and key generation logic `apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingKeyResolver.java`, `apps/commerce-api/src/main/java/com/loopers/domain/ranking/RankingPeriod.java`	Replaces WEEKLY/MONTHLY with LAST_7D/LAST_30D and switches to an anchor-date-based key format (`ranking:last7d:{yyyyMMdd}:{group}`). Adds anchorDateOf() to anchor on yesterday.
RankingService MV fallback `apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java`	Injects MvRankingQueryRepository; for LAST_7D/LAST_30D, falls back to a direct MV query on a Redis miss and automatically retries with the previous date when the anchor date has no data (up to 3 days).
MV entities and query repository `apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/*`, `apps/commerce-api/src/main/java/com/loopers/infrastructure/ranking/MvRankingQueryRepositoryImpl.java`	Adds the MvProductRankLast7d/Last30d entities, the MvProductRankId composite key, the MvRankEntry record, the MvRankingQueryRepository interface and its JDBC-based implementation.
commerce-api tests `apps/commerce-api/src/test/java/com/loopers/application/ranking/*`, `apps/commerce-api/src/test/java/com/loopers/interfaces/api/ranking/RankingApiE2ETest.java`	Changes the WEEKLY/MONTHLY tests to LAST_7D/LAST_30D; adds MV fallback behavior and E2E verification.
Batch test configuration `apps/commerce-batch/build.gradle.kts`	Separates the benchmark test tag: the default test task excludes benchmarks, and a new benchmarkTest task is added.
Rolling window parameter handling `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/*`	Adds RollingRankingJobConfig, RankingJobParametersListener, RollingWindow and RollingWindowResolver to pin batch parameters and manage the weight-config snapshot.
Batch staging tables and entities `apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/`, `apps/commerce-batch/src/main/java/com/loopers/domain/metrics/`	Adds the StagingRankingAggregation, StagingRankingScored, ProductViewMetric, ProductLikeMetric and ProductOrderMetric entities and repositories.
Batch Step 1-3: metric staging `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/*`	Streaming aggregation through ViewMetricStreamingReader, LikeMetricStreamingReader, OrderMetricStreamingReader and StreamingMetricAggregator, loaded in parallel by StagingViewMetricsWriter, StagingLikeMetricsWriter and StagingOrderMetricsWriter.
Batch Step 0: staging reset `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java`	Deletes all staging-table rows for the anchorDateKey.
Batch Step 4: score calculation `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/*`	Weighted score calculation with ScoreFormula, fan-out across multiple weight groups with ScoreProcessor, UPSERT loading with StagingScoredWriter.
Batch Step 5: MV promotion `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java`	Per anchorDate, deletes the existing MV rows, inserts the top 100 products and records the batch audit log.
Batch Step 6: Redis refresh `apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java`	Reads the MV by anchorDate, rebuilds the Redis ZSET under a shadow key, atomically swaps it onto the original key and sets a 3-day TTL.
Batch audit and weight configuration `apps/commerce-batch/src/main/java/com/loopers/domain/ranking/audit/`, `apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/`	Adds the BatchAuditLog and WeightConfig entities and repositories for batch execution tracking and dynamic weight management.
Batch integration and performance tests `apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/*`	E2E pipeline verification (RollingRankingJobE2ETest), restart scenarios (RollingRankingJobRestartTest), performance benchmark (RollingRankingJobBenchmark), and unit/integration tests per Step.
Benchmark support `apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/*`	BaselineSeeder generates preloaded data from a Zipf distribution; SeedSpec and Tier support test scenarios at various scales.
Benchmark run script `scripts/measure-ranking-batch.sh`	Adds a script that automates running `benchmarkTest` and parsing the results.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as RankingService<br/>(commerce-api)
    participant Redis
    participant MV as MvRankingQuery<br/>Repository
    participant Batch as RollingRanking<br/>Job
    participant Staging as Staging<br/>Tables
    participant MVTable as MV<br/>Tables

    rect rgba(0, 100, 200, 0.5)
        Note over Client,API: Query request (LAST_7D)
        Client->>API: getRankEntries(date, LAST_7D)
        API->>Redis: lookup (key: last7d:YYYYMMDD)
        alt Redis Hit
            Redis-->>API: return ranking data
        else Redis Miss
            API->>MV: direct query by anchorDate
            MV-->>API: map MV rows
            API->>Redis: store in cache (3-day TTL)
        end
        API-->>Client: ranking response
    end

    rect rgba(100, 0, 200, 0.5)
        Note over Batch,MVTable: Batch pipeline (runs daily)
        Batch->>Staging: Step 0: reset staging for anchorDateKey
        Batch->>Staging: Step 1-3: streaming metric collection<br/>(view/like/order)
        Staging-->>Batch: AggregatedMetric parallel processing
        Batch->>Staging: Step 4: compute and store scores<br/>per weight group
        Staging-->>Batch: scored data
        Batch->>MVTable: Step 5: refresh MV tables<br/>per anchorDate<br/>(top 100)
        Batch->>Redis: Step 6: rebuild ZSET shadow<br/>and swap atomically
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

[Volume 9] 상품 랭킹 기능 구현 #360: Basic implementation of the ranking system built on RankingService and RankingPeriod; this PR additionally integrates the LAST_7D/LAST_30D MV fallback.
feat: 10주차 - Spring Batch 기반 주간/월간 랭킹 시스템 구현 #397: Introduces the MV-table-based ranking store and the batch infrastructure; this PR fully replaces WEEKLY/MONTHLY with rolling-window semantics and completes the batch pipeline.
[9주차] 랭킹 시스템 - 최호석 #372: Initial design of RankingKeyResolver and RankingPeriod; this PR applies a backward-incompatible change through anchorDate-based key generation and the new period enum.

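Key operational points

1. Batch pipeline complexity and failure isolation

Problem: In the sequential pipeline of Steps 0~6, a failure in a middle Step requires a full rerun. In particular, Steps 5-6 (MV promotion / Redis refresh) run one after the other rather than atomically, so a failure between them can leave Redis inconsistent with the MV.
Recommendations:
- Guarantee idempotency per Step (the current anchor-date keying is good, but transaction boundaries should be made explicit)
- Prevent partial data exposure when the INSERT fails after the MV DELETE: wrap DELETE and INSERT in one atomic transaction, or add pre-INSERT validation
- If the Redis refresh fails, only the MV data is valid and Redis is inconsistent: this is safe because the fallback prefers the MV, but it needs monitoring

2. Performance and scalability

Problem: Streaming aggregation (Step 1-3) is configured with FETCH_SIZE=2000 and CHUNK_SIZE=500, but with 20,000 products × 30 days of history there is a risk of memory pressure and a longer batch window
Recommendations:
- Tune the chunk size based on the benchmark results (RollingRankingJobBenchmark)
- The StreamingMetricAggregator lookahead restart logic is complex, so force checkpoint-restart scenarios in the integration tests (RollingRankingJobRestartTest exists, but coverage of mid-Step failure and restart for every Step should be confirmed)

3. Data consistency and auditing

Problem: The batch audit log (BatchAuditLog) is written only in Step 5 (PromoteTopToMvTasklet), so a failure in Step 1-4 leaves no record and makes root-cause analysis hard
Recommendations:
- Record an audit log entry (or metrics) after each Step completes: rowCount, duration, failure cause
- Periodically monitor SELECT anchorDate, COUNT(*) FROM batch_audit_log GROUP BY anchorDate to detect missing dates

4. Redis TTL and cache consistency

Problem: The TTL is 3 days after the MV → Redis refresh, but LAST_30D data can be queried for up to 30 days. If the batch fails or is delayed, consistency breaks
Recommendations:
- Consider adding creation-time metadata to the Redis keys
- The fallback (direct MV query) is triggered automatically, so this is functionally safe; to prevent performance degradation, alert when the Redis refresh fails

5. Managing dynamic weight group changes

Problem: RankingJobParametersListener captures a snapshot of the active WeightConfig at batch start, so a group added while the batch is running is excluded from that run
Recommendations:
- Add audit logging that makes the batch history explicit when weight groups change (weightGroup is already recorded in BatchAuditLog, which is good)
- Periodically track the number of currently active groups to detect anomalies

6. MV table indexes and query performance

Problem: The MvProductRankLast7d/Last30d index is (anchor_date, weight_group, rank_position), and MvRankingQueryRepositoryImpl filters by anchor_date and weight_group and then sorts by rank_position. Index coverage needs to be confirmed
Recommendations:
- Check the EXPLAIN output: whether an index scan is used and whether additional filtering is required
- The query uses LIMIT/OFFSET pagination, so performance degrades as the offset grows: consider cursor-based pagination

Additional recommended tests

Failure scenario: verify the consistency of query results when the MV tables are partially updated after a batch is interrupted and restarted
Concurrency: confirm that API reads of the MV tables during a batch run are safe without locks
Data volume: measure the memory/time impact of an event burst on a single very large product (1M+ events per day) during streaming aggregation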

Copilot

Pull request overview

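This PR introduces a Spring Batch based rolling ranking pipeline (LAST_7D/LAST_30D anchored on yesterday) and switches the API read path to Redis → MV fallback (+ previous-day anchor retry), aiming to remove the bias and restart fragility of the scheduler-based calendar aggregation.

Changes:

Adds rollingRankingJob (staging aggregation → score fan-out → MV promote → Redis refresh) and the related domain/infrastructure layers to commerce-batch
Switches commerce-api ranking queries to LAST_7D/LAST_30D and adds the MV query + previous-day anchor retry fallback on a Redis miss
Adds benchmark execution (task + script) and strengthens the restart/integration/E2E tests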

Reviewed changes

Copilot reviewed 88 out of 90 changed files in this pull request and generated 2 comments.

File	Description
week10/blog-final.md	Adds/updates the final design/results document (blog)
week10/blog-final-final.md	Adds/updates the final design/results document (blog)
scripts/measure-ranking-batch.sh	Adds a script that runs benchmarkTest and prints a result summary
apps/commerce-batch/build.gradle.kts	Separates benchmark-tagged tests (excluded from the default test) and adds the benchmarkTest task
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/RollingRankingJobConfig.java	Configures rollingRankingJob (Step 0~6 chain)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RollingWindow.java	Adds the anchorDate-based rolling window boundary model
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RollingWindowResolver.java	Parses/validates anchorDateKey (yyyyMMdd) and builds the window
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RankingJobParametersListener.java	Freezes anchor/window + weight snapshot into the ExecutionContext at Job start
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java	Adds the per-anchor staging reset (Tasklet)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/AggregatedMetric.java	Adds the streaming aggregation result DTO
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/RawMetricRow.java	Adds the cursor-based raw row DTO
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StreamingMetricAggregator.java	Adds the application-side streaming aggregator (with lookahead)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/ViewMetricStreamingReader.java	view metrics cursor + streaming aggregation reader (+lookahead state)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/LikeMetricStreamingReader.java	like metrics cursor + streaming aggregation reader (+lookahead state)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/OrderMetricStreamingReader.java	order metrics cursor + streaming aggregation reader (+lookahead state)
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingAggregationProcessor.java	AggregatedMetric → LAST_7D/LAST_30D fan-out processor
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingLikeAggregationProcessor.java	fan-out processor for like
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingOrderAggregationProcessor.java	fan-out processor for order
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingViewMetricsWriter.java	staging aggregation(view) JDBC UPSERT writer
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingLikeMetricsWriter.java	staging aggregation(like) JDBC UPSERT writer
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingOrderMetricsWriter.java	staging aggregation(order) JDBC UPSERT writer
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageViewMetricsStepConfig.java	Step1(view) chunk step configuration
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageLikeMetricsStepConfig.java	Step2(like) chunk step configuration
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageOrderMetricsStepConfig.java	Step3(order) chunk step configuration
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreFormula.java	Adds the score formula as a pure function
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreProcessor.java	score fan-out processor based on the weight snapshot
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/StagingScoredWriter.java	scored staging JDBC UPSERT writer
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepConfig.java	Step4(score) cursor reader + chunk step configuration
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java	Step5: MV DELETE+INSERT + audit log promote in a single TX
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvStepConfig.java	promoteTopToMvStep configuration
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java	Step6: MV → Redis ZSET shadow rebuild + RENAME refresh
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshStepConfig.java	redisRefreshStep configuration
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductMetricId.java	Adds the composite key for the source metrics tables
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductViewMetric.java	Adds the view metrics read-model entity
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductLikeMetric.java	Adds the like metrics read-model entity
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductOrderMetric.java	Adds the order metrics read-model entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregation.java	Adds the primary staging entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregationId.java	Adds the primary staging composite key
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregationRepository.java	Adds the primary staging repo port
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScored.java	Adds the secondary staging entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScoredId.java	Adds the secondary staging composite key
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScoredRepository.java	Adds the secondary staging repo port
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankId.java	Adds the shared MV composite key
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7d.java	Adds the MV(last7d) entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java	Adds the MV(last30d) entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7dRepository.java	Adds the MV(last7d) repo port
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30dRepository.java	Adds the MV(last30d) repo port
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/audit/BatchAuditLog.java	Adds the batch execution history entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/audit/BatchAuditLogRepository.java	Adds the audit repo port
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfig.java	Adds the batch-side weight_config read-model entity
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfigRepository.java	Adds the weight_config repo port
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/WeightConfigJpaRepository.java	Adds the weight_config JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/WeightConfigRepositoryImpl.java	Adds the weight_config repo adapter
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingAggregationJpaRepository.java	Adds the staging aggregation JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingAggregationRepositoryImpl.java	Adds the staging aggregation repo adapter
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingScoredJpaRepository.java	Adds the staging scored JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingScoredRepositoryImpl.java	Adds the staging scored repo adapter
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast7dJpaRepository.java	Adds the MV(last7d) JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast7dRepositoryImpl.java	Adds the MV(last7d) repo adapter
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast30dJpaRepository.java	Adds the MV(last30d) JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast30dRepositoryImpl.java	Adds the MV(last30d) repo adapter
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/BatchAuditLogJpaRepository.java	Adds the audit JPA repo
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/BatchAuditLogRepositoryImpl.java	Adds the audit repo adapter
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RollingWindowResolverTest.java	Adds unit tests for window boundaries/validation
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java	Adds unit tests for ExecutionContext freezing/restart
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StreamingMetricAggregatorTest.java	Adds unit tests for the streaming aggregator
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingStepIntegrationTest.java	Adds the Step0 reset integration test
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StageViewMetricsStepIntegrationTest.java	Adds the Step1 integration test
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StageMetricsPipelineIntegrationTest.java	Adds the Step1~3 pipeline merge integration test
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/score/ScoreFormulaTest.java	Adds unit tests for the score formula
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepIntegrationTest.java	Adds the Step4(score) integration test
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java	Adds the Step0~6 E2E test (MV/audit/Redis)
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobRestartTest.java	Adds tests for chunk-mid restart and MV protection scenarios
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/Tier.java	Adds the seed distribution tier enum
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/SeedSpec.java	Adds the seed parameter record
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeeder.java	Adds the Zipf-based seed generator
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeederIntegrationTest.java	Adds integration tests for seed distribution/determinism/ratio
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/measurement/RollingRankingJobBenchmark.java	Adds the S/M/L/XL_SPIKE benchmark test
apps/commerce-api/src/main/java/com/loopers/domain/ranking/RankingPeriod.java	Switches the period enum to LAST_7D/LAST_30D
apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingKeyResolver.java	Generates rolling keys (anchor = yesterday) and adds anchorDateOf
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvRankEntry.java	Adds the MV query projection DTO
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvRankingQueryRepository.java	Adds the MV query repo port
apps/commerce-api/src/main/java/com/loopers/infrastructure/ranking/MvRankingQueryRepositoryImpl.java	Adds the JdbcTemplate-based MV query implementation
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankId.java	Adds the commerce-api side MV PK mirror
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7d.java	Adds the commerce-api side MV(last7d) mirror entity
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java	Adds the commerce-api side MV(last30d) mirror entity
apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java	Adds the MV fallback + previous-day anchor retry on a Redis miss; totalCount also falls back to the MV
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingKeyResolverTest.java	Adds/updates tests for LAST_7D/LAST_30D keys and anchor calculation
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingServiceMvFallbackTest.java	Adds unit tests for MV fallback/previous-day retry/totalCount fallback
apps/commerce-api/src/test/java/com/loopers/interfaces/api/ranking/RankingApiE2ETest.java	Adds LAST_7D/LAST_30D and MV fallback scenarios to the API E2E test

Comments suppressed due to low confidence (1)

apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingKeyResolver.java:19

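RankingKeyResolver receives a Clock by injection and keeps it in a field, but the class never uses it. An unnecessary dependency only raises test/configuration complexity, so either remove the Clock (constructor and field) or actually use it, for example in the anchor calculation.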

    private final Clock clock;

    public RankingKeyResolver(Clock clock) {
        this.clock = clock;
    }

Copilot · 2026-04-17T08:13:05Z

+    public static RollingWindow of(LocalDate anchorDate) {
+        LocalDateTime last7dStart  = anchorDate.minusDays(6).atStartOfDay();
+        LocalDateTime last7dEnd    = anchorDate.plusDays(1).atStartOfDay();
+        LocalDateTime last30dStart = anchorDate.minusDays(29).atStartOfDay();
+        LocalDateTime last30dEnd   = anchorDate.plusDays(1).atStartOfDay();


Computing the RollingWindow boundaries with LocalDate.atStartOfDay() departs from the current codebase convention that bucket_time is a UTC LocalDateTime, so the window can be shifted by 9 hours relative to KST. (See apps/commerce-api/.../RankingDateUtils.java:15-25 and apps/commerce-streamer/.../BucketTimeUtils.java:34-45.) Please compute the last7d/last30d start/end from anchorDate (a KST date key) converted to UTC boundaries (LocalDateTime).
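One way to read this suggestion, as a hedged sketch: treat anchorDate as a KST calendar date, take KST midnight, and express that instant as a UTC LocalDateTime. `RollingWindowUtcSketch` and its `Window` record are hypothetical. The referenced RankingDateUtils / BucketTimeUtils helpers are not shown in this thread, so their exact behavior is assumed here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical illustration of KST-date to UTC-boundary conversion; not the PR's RollingWindow. */
final class RollingWindowUtcSketch {
    static final ZoneId KST = ZoneId.of("Asia/Seoul");

    /** Half-open window [start, endExclusive), both expressed as UTC LocalDateTime. */
    record Window(LocalDateTime start, LocalDateTime endExclusive) {}

    /** KST midnight of the given date, expressed in UTC (9 hours earlier). */
    static LocalDateTime kstMidnightAsUtc(LocalDate kstDate) {
        return kstDate.atStartOfDay(KST).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
    }

    /** Rolling window of `days` KST days ending with (and including) anchorDate. */
    static Window lastNDays(LocalDate anchorDate, int days) {
        return new Window(
                kstMidnightAsUtc(anchorDate.minusDays(days - 1)),
                kstMidnightAsUtc(anchorDate.plusDays(1)));
    }

    public static void main(String[] args) {
        System.out.println(lastNDays(LocalDate.of(2026, 4, 16), 7));
        System.out.println(lastNDays(LocalDate.of(2026, 4, 16), 30));
    }
}
```

For anchorDate 2026-04-16 the 7-day window becomes 2026-04-09T15:00 to 2026-04-16T15:00 in UTC, which is 2026-04-10 00:00 to 2026-04-17 00:00 in KST.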

Copilot · 2026-04-17T08:13:06Z

+ * Step 7 — MV 의 확정된 TOP 100 을 Redis ZSET identity cache 로 복제한다.
+ *
+ * <p>Shadow key 에 ZADD 후 RENAME 으로 원자적 교체 → 조회 중 깜빡임 없음.
+ * Redis 는 MV 의 identity mirror (score·순서 동일) — 새 계산은 없다.</p>
+ *
+ * <p>Step 7 실패는 치명적이지 않음 — MV 자체는 영속되어 있고


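The JavaDoc says 'Step 7', but in the current Job chain (RollingRankingJobConfig) redisRefreshStep is the last Step (Step 6 in the 0~6 numbering). Mismatched Step numbers in several places cause confusion during design and code review, so please align them with the current chain.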

Suggested change

- * Step 7 — MV 의 확정된 TOP 100 을 Redis ZSET identity cache 로 복제한다.
- *
- * Shadow key 에 ZADD 후 RENAME 으로 원자적 교체 → 조회 중 깜빡임 없음.
- * Redis 는 MV 의 identity mirror (score·순서 동일) — 새 계산은 없다.
- *
- * Step 7 실패는 치명적이지 않음 — MV 자체는 영속되어 있고
+ * Step 6 — MV 의 확정된 TOP 100 을 Redis ZSET identity cache 로 복제한다.
+ *
+ * Shadow key 에 ZADD 후 RENAME 으로 원자적 교체 → 조회 중 깜빡임 없음.
+ * Redis 는 MV 의 identity mirror (score·순서 동일) — 새 계산은 없다.
+ *
+ * Step 6 실패는 치명적이지 않음 — MV 자체는 영속되어 있고

coderabbitai

Actionable comments posted: 7

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java (1)
49-65: ⚠️ Potential issue | 🟠 Major

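The anchor retry policies of getTotalCount and fallbackFromMv are inconsistent, which breaks pagination.

An operational scenario:

Today's 01:00 batch fails or is delayed → the MV for today's anchor is empty.

loadRankEntries → fallbackFromMv retries with the previous day's anchor for up to 3 days and returns yesterday's TOP100 ranking (user experience preserved, as designed).

getTotalCount, however, only tries a single lookup with keyResolver.anchorDateOf(date), so it returns 0.

Result:

The API returns a contradictory response such as items = [100 rows], totalCount = 0.

A client pagination UI that relies on "0 total" or totalPages=0 blocks the next-page request or shows a "no data" state, so the batch failure is exposed directly to the user.

As in the earlier learning (RankingFacade.java:92-96), an approximate totalElements is acceptable in itself, but in this case "the list is shown while the total is 0", which is not an approximation but a clear UX bug.

The suggested fix is either to bind getTotalCount to a helper that shares the same retry policy or, at minimum, to have fallbackFromMv return the anchor it actually used so that RankingFacade can re-query a consistent count.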
♻️ Suggested direction
 public long getTotalCount(RankingPeriod period, LocalDate date, String group) {
     String key = keyResolver.resolve(period, date, group);
     try {
         Long count = rankingRedisRepository.getTotalCount(key);
         if (count != null && count > 0) {
             return count;
         }
     } catch (Exception e) {
         log.warn("Redis totalCount 조회 실패: {}", e.getMessage());
     }
-    // Redis miss 또는 0일 때 MV 카운트 fallback (LAST_7D / LAST_30D 만 해당)
-    return switch (period) {
-        case LAST_7D -> mvRankingQueryRepository.countLast7d(keyResolver.anchorDateOf(date), group);
-        case LAST_30D -> mvRankingQueryRepository.countLast30d(keyResolver.anchorDateOf(date), group);
-        default -> 0;
-    };
+    // list 쪽 fallback 과 동일하게 최대 MV_FALLBACK_MAX_DAYS 일까지 anchor retry.
+    return switch (period) {
+        case LAST_7D, LAST_30D -> countFromMvWithRetry(period, date, group);
+        default -> 0L;
+    };
 }
+
+private long countFromMvWithRetry(RankingPeriod period, LocalDate date, String group) {
+    LocalDate anchorDate = keyResolver.anchorDateOf(date);
+    for (int retry = 0; retry < MV_FALLBACK_MAX_DAYS; retry++) {
+        long c = switch (period) {
+            case LAST_7D -> mvRankingQueryRepository.countLast7d(anchorDate, group);
+            case LAST_30D -> mvRankingQueryRepository.countLast30d(anchorDate, group);
+            default -> 0L;
+        };
+        if (c > 0) return c;
+        anchorDate = anchorDate.minusDays(1);
+    }
+    return 0L;
+}
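As an additional test, RankingServiceMvFallbackTest should include the scenario "today's MV is empty and yesterday's MV has data → the list has 100 rows and totalCount, resolved from the same anchor, is > 0".

Suggested in line with the coding guideline (check that responsibilities are organized per use case and that side effects are explicit ... propose an alternative flow on failure).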
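The second option mentioned in this comment, returning the anchor that was actually used, could look roughly like the sketch below. `SharedAnchorSketch`, `resolveServingAnchor` and the `countAt` function are hypothetical names; the real RankingService and MvRankingQueryRepository signatures may differ.

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.function.ToLongFunction;

/** Hypothetical illustration of a shared anchor resolver; not code from the PR. */
final class SharedAnchorSketch {
    // Assumption: same bound as the list-side fallback (three anchors in total).
    static final int MAX_DAYS = 3;

    /**
     * Finds the newest anchor that has MV rows. The list query and the count query
     * both use the returned anchor, so items and totalCount cannot disagree.
     */
    static Optional<LocalDate> resolveServingAnchor(LocalDate anchorDate, ToLongFunction<LocalDate> countAt) {
        LocalDate anchor = anchorDate;
        for (int attempt = 0; attempt < MAX_DAYS; attempt++) {
            if (countAt.applyAsLong(anchor) > 0) {
                return Optional.of(anchor);
            }
            anchor = anchor.minusDays(1);
        }
        return Optional.empty();
    }
}
```

Resolving the anchor once per request also avoids running the retry loop twice, once for the list and once for the count.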
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java`
around lines 49 - 65, getTotalCount currently only queries the single anchor
from keyResolver.anchorDateOf(date) while fallbackFromMv retries up to 3 prior
anchors, causing list/totalCount mismatch; fix by extracting a shared
anchor-resolve fallback helper (e.g., resolveAnchorWithRetries or an mvFallback
that returns the actual anchor used) and have both getTotalCount and
loadRankEntries/fallbackFromMv call it so they use the identical anchor when
falling back to mvRankingQueryRepository.countLast7d/countLast30d; update
RankingFacade usage if it relies on anchors and add a test in
RankingServiceMvFallbackTest verifying the scenario "today MV empty, yesterday
MV has data => list size and totalCount use same anchor and >0".

🟡 Minor comments (17)

apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java-30-37 (1)
30-37: ⚠️ Potential issue | 🟡 Minor

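There is no guard against a missing/empty anchorDateKey.

Operational problem: with the @Value("#{jobExecutionContext['...']}") binding, a missing key injects null, which is passed on as deleteByPeriodKey(null). Depending on how the JPA implementation binds WHERE period_key = ?, this silently becomes "delete nothing": Step 0 ends successfully, but staging leftovers from a previous attempt remain while the later Steps run, which can lead to duplicate aggregation and contaminated rankings. Even if RankingJobParametersListener provides the value on the normal path, there is no fail-fast against regressions such as a misconfigured Listener order.

Fix: on entering execute, fail immediately with Assert.hasText(anchorDateKey, ...) or an IllegalStateException.

Additional test: add a case to TruncateStagingStepIntegrationTest asserting that "the Step ends as FAILED when jobExecutionContext has no anchorDateKey".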
🛡️ 제안 수정안
 public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
+ if (anchorDateKey == null || anchorDateKey.isBlank()) {
+ throw new IllegalStateException(
+ "anchorDateKey missing in jobExecutionContext: " + RankingJobParametersListener.CTX_ANCHOR_DATE_KEY);
+ }
 int aggregationDeleted = aggregationRepository.deleteByPeriodKey(anchorDateKey);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java`
around lines 30 - 37, Add a guard in TruncateStagingTasklet.execute to fail fast
if the injected anchorDateKey is null/empty: at method start validate
anchorDateKey (e.g., Assert.hasText(anchorDateKey, "...") or throw new
IllegalStateException(...)) so deleteByPeriodKey is never called with a missing
key; mention RankingJobParametersListener as the expected provider of the key.
Also add an integration test TruncateStagingStepIntegrationTest that asserts the
Step transitions to FAILED when jobExecutionContext lacks anchorDateKey.
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingKeyResolverTest.java-68-74 (1)
68-74: ⚠️ Potential issue | 🟡 Minor

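The test name does not match what the test verifies.

The test is named 월_경계를_걸쳐도_음수_날짜없이_안전하게_계산된다 ("computed safely without negative dates even across a month boundary"), but the actual input, 2026-01-01 → 2025-12-31, verifies a year boundary. In addition, LocalDate.minusDays(1) never produces a negative component, so the phrase "음수_날짜없이" ("without negative dates") misstates what is being verified. Future regression tests could inherit a false guarantee (the misconception that a LocalDate can go negative).

Also, the anchor_date_계산 nested class has no boundary cases for January 1 or March 1 (just after a leap-year February). For consistency, the same boundary cases should be added for anchorDateOf.
♻️ Suggested fix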
- `@Test`
- void 월_경계를_걸쳐도_음수_날짜없이_안전하게_계산된다() {
+ `@Test`
+ void 연도_경계를_걸쳐도_전년도_anchor_로_계산된다() {
 // Query date 2026-01-01 → anchor_date = 2025-12-31
 String key = resolver.resolve(RankingPeriod.LAST_7D, LocalDate.of(2026, 1, 1), "control");

 assertThat(key).isEqualTo("ranking:last7d:20251231:control");
 }
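The boundary cases mentioned above can be sketched in plain `java.time`, assuming the anchor is always the day before the query date and that the key follows the `ranking:<period>:<yyyyMMdd>:<group>` shape seen in the assertion. `AnchorBoundarySketch` and its two methods are hypothetical stand-ins, not the actual RankingKeyResolver.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical stand-in for the resolver, used only to show the boundary behavior. */
final class AnchorBoundarySketch {

    /** Assumption: the anchor is the day before the query date. */
    static LocalDate anchorDateOf(LocalDate queryDate) {
        return queryDate.minusDays(1);
    }

    /** Builds a key shaped like "ranking:last7d:20251231:control". */
    static String keyFor(String period, LocalDate queryDate, String group) {
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd
        String anchor = anchorDateOf(queryDate).format(DateTimeFormatter.BASIC_ISO_DATE);
        return "ranking:" + period + ":" + anchor + ":" + group;
    }
}
```

With this shape, a query date of March 1 resolves to February 29 in a leap year and February 28 otherwise, which is the pair of cases the nested class currently lacks.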
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingKeyResolverTest.java`
around lines 68 - 74, Rename the test and adjust/add cases to match the actual
boundary being verified: change the test name from 월_경계를_걸쳐도_음수_날짜없이_안전하게_계산된다
to something like 연_경계를_걸쳐_올바른_anchor_date를_계산한다 and ensure the assertion still
calls resolver.resolve(RankingPeriod.LAST_7D, LocalDate.of(2026,1,1), "control")
expecting "ranking:last7d:20251231:control"; add additional unit tests in the
anchor_date_계산 nested group and corresponding checks for anchorDateOf to cover
Jan 1 (year boundary) and Mar 1 (post-leap-year boundary) inputs; update or add
tests rather than implying LocalDate can be negative and keep references to
resolver.resolve and anchorDateOf to locate the code.
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingServiceMvFallbackTest.java-103-122 (1)
103-122: ⚠️ Potential issue | 🟡 Minor

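Verification of the retry count and of the non-fallback path is missing.

From an operational standpoint, two reinforcements are needed.

전일_fallback_도_3일간_비어있으면_빈_리스트를_반환한다 applies the same stub to every anchor and asserts an empty result. However, the test passes even if the implementation tries only once. To guarantee the "automatic retry for up to 3 days" SLA stated in the PR goals, the call count must be verified explicitly (Mockito.verify(mvRepository, times(3)).findLast7d(...)). Otherwise the test will not catch a future regression of the retry limit to 1.

The positive counterpart of totalCount_도_Redis_miss_시_MV_카운트로_fallback_한다 (when Redis returns a positive number, the MV is not called) is missing. The fallback branch is safely protected only when both directions are covered.
♻️ Suggested additional tests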
 `@Test`
 void 전일_fallback_도_3일간_비어있으면_빈_리스트를_반환한다() {
 when(redisRepository.getRankings(anyString(), anyInt(), anyInt())).thenReturn(List.of());
 when(mvRepository.findLast7d(any(), anyString(), anyInt(), anyInt())).thenReturn(List.of());

 List<RankEntry> result = service.getRankEntries(
 RankingPeriod.LAST_7D, LocalDate.of(2026, 4, 15), 0, 20, "control");

 assertThat(result).isEmpty();
+ // Exactly 3 attempts are expected, for anchor=4/14, 4/13, and 4/12
+ Mockito.verify(mvRepository, Mockito.times(3))
+ .findLast7d(any(), anyString(), anyInt(), anyInt());
+ Mockito.verify(mvRepository).findLast7d(eq(LocalDate.of(2026, 4, 12)), anyString(), anyInt(), anyInt());
 }

+ `@Test`
+ void totalCount_는_Redis_가_응답하면_MV_를_호출하지_않는다() {
+ when(redisRepository.getTotalCount(anyString())).thenReturn(42L);
+
+ long total = service.getTotalCount(RankingPeriod.LAST_7D, LocalDate.of(2026, 4, 15), "control");
+
+ assertThat(total).isEqualTo(42L);
+ Mockito.verifyNoInteractions(mvRepository);
+ }
As per coding guidelines: "check that unit tests cover boundary values, failure cases, and exception flows".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingServiceMvFallbackTest.java`
around lines 103 - 122, Add explicit verifications and the missing positive
case: in the test 전일_fallback_도_3일간_비어있으면_빈_리스트를_반환한다(), stub
mvRepository.findLast7d(...) to return empty as now but also verify that
mvRepository.findLast7d(...) is invoked exactly 3 times
(Mockito.verify(mvRepository, times(3)).findLast7d(...)) to enforce the 3-day
retry behavior for service.getRankEntries(...); and add a new positive test for
totalCount_도_Redis_miss_시_MV_카운트로_fallback_한다 that stubs
redisRepository.getTotalCount(...) to return a positive value and asserts
service.getTotalCount(...) returns that value while verifying
mvRepository.countLast7d(...) is never called (Mockito.verify(mvRepository,
never()).countLast7d(...)).
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfig.java-53-63 (1)
53-63: ⚠️ Potential issue | 🟡 Minor

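Input validation is insufficient for the audit timestamps and trafficPct.

From an operational standpoint, there are two problems.

ZonedDateTime.now() uses the JVM default time zone. If the batch, API, and streamer processes start in different time zones (for example, containers in UTC vs. operations in KST), the same instant is recorded with different offsets, which makes auditing and reproduction harder. Injecting a Clock or specifying an explicit ZoneId (for example, Asia/Seoul) is safer.

trafficPct is semantically assumed to lie in the range 0 to 100, but nothing validates it. If an invalid value gets in (negative, above 100, or a total above 100), the fan-out and traffic-distribution logic breaks silently.

Additional tests are needed for (a) the same instant serializing and comparing identically regardless of time zone, and (b) rejection of trafficPct < 0 and trafficPct > 100.
♻️ Suggested fix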
- public WeightConfig(String groupName, double wView, double wLike, double wOrder,
- int trafficPct, boolean active) {
+ public WeightConfig(String groupName, double wView, double wLike, double wOrder,
+ int trafficPct, boolean active) {
+ if (trafficPct < 0 || trafficPct > 100) {
+ throw new IllegalArgumentException("trafficPct must be within [0, 100]: " + trafficPct);
+ }
 this.groupName = groupName;
 this.wView = wView;
 this.wLike = wLike;
 this.wOrder = wOrder;
 this.trafficPct = trafficPct;
 this.active = active;
- this.createdAt = ZonedDateTime.now();
- this.updatedAt = ZonedDateTime.now();
+ ZonedDateTime now = ZonedDateTime.now(ZoneId.of("Asia/Seoul"));
+ this.createdAt = now;
+ this.updatedAt = now;
 }
As per coding guidelines: "suggest separating domain rules and infrastructure concerns when they are mixed" and "check immutability and encapsulation".
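The Clock-injection alternative can be sketched as follows. This is an illustration under assumptions, not the entity itself: `WeightConfigSketch` is a hypothetical, trimmed-down class that keeps only the fields relevant to the two findings.

```java
import java.time.Clock;
import java.time.ZonedDateTime;

/** Hypothetical, trimmed-down stand-in for WeightConfig. */
final class WeightConfigSketch {
    final String groupName;
    final int trafficPct;
    final ZonedDateTime createdAt;
    final ZonedDateTime updatedAt;

    WeightConfigSketch(String groupName, int trafficPct, Clock clock) {
        if (trafficPct < 0 || trafficPct > 100) {
            throw new IllegalArgumentException("trafficPct must be within [0, 100]: " + trafficPct);
        }
        this.groupName = groupName;
        this.trafficPct = trafficPct;
        // Capture "now" once, from the injected clock, so both fields are identical
        // and the zone comes from the clock rather than the JVM default.
        ZonedDateTime now = ZonedDateTime.now(clock);
        this.createdAt = now;
        this.updatedAt = now;
    }
}
```

In tests, `Clock.fixed(instant, zone)` makes the timestamps deterministic; in production, a single `Clock.system(ZoneId.of("Asia/Seoul"))` bean would keep all processes on the same zone.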
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfig.java`
around lines 53 - 63, The WeightConfig constructor currently sets
createdAt/updatedAt with ZonedDateTime.now() and lacks validation for
trafficPct; change the constructor to accept a Clock (or explicit ZoneId) and
use ZonedDateTime.now(clock/zone) to produce timezone-stable timestamps for
createdAt and updatedAt, and add input validation on trafficPct in the
WeightConfig constructor (and any factory methods) to enforce 0 <= trafficPct <=
100 by throwing an IllegalArgumentException for invalid values; update any
builders/creators that call new WeightConfig(...) to supply the Clock/ZoneId (or
provide an overloaded constructor/factory) and add unit tests asserting
timestamp determinism across zones and that trafficPct out-of-range inputs are
rejected.
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java-91-94 (1)
91-94: ⚠️ Potential issue | 🟡 Minor

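The shadow key is not cleaned up when the MV is empty.

The empty-result branch only calls delete(key) on the target key and does not clean up <key>:rebuild, which may be left over from an earlier run. Operationally, if a previous run failed after ZADD but before RENAME, the shadow key can remain; a later run for an anchorDate whose MV is empty then leaves it in place, causing memory leakage and confusion during diagnosis. We recommend deleting the shadow key in the empty-result branch as well, and adding an integration test for this scenario.
🛡️ Suggested fix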
 if (rows.isEmpty()) {
 redisTemplate.delete(key);
+ redisTemplate.delete(key + ":rebuild");
 return 0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java`
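around lines 91 - 94, The empty-MV branch does not clean up the shadow key
(<key>:rebuild), which leaks memory; in RedisRefreshTasklet's empty-result branch,
delete the shadow key as well instead of only calling delete(key), and add an
integration test that verifies the scenario. Specifically, in the
if (rows.isEmpty()) block of RedisRefreshTasklet, build rebuildKey = key + ":rebuild"
(or the same suffix the existing code uses) and call redisTemplate.delete(rebuildKey)
alongside redisTemplate.delete(key); then write an integration test that simulates a
previous run failing after ZADD but before RENAME (leaving the shadow key behind) and
asserts that both keys are removed when the tasklet runs against an empty MV.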
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java-70-91 (1)
70-91: ⚠️ Potential issue | 🟡 Minor

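Extend the coverage of the restart-scenario verification.

The current restart test checks preservation of only two keys, CTX_ANCHOR_DATE_KEY and CTX_LAST_7D_START. Operationally, the "already frozen snapshot" is a bundle of all five keys (7d start/end, 30d start/end, anchor key), so a partial-freeze bug in which some keys are preserved and the rest are overwritten could be missed. The test also does not verify that keys that were not pre-populated (for example CTX_LAST_30D_START/END and CTX_LAST_7D_END) get filled in, so it would not catch a regression in which beforeJob becomes a complete no-op on restart.
💚 Suggested test reinforcement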
- assertAll(
- () -> assertThat(ctx.getString(RankingJobParametersListener.CTX_ANCHOR_DATE_KEY)).isEqualTo("20260101"),
- () -> assertThat(ctx.getString(RankingJobParametersListener.CTX_LAST_7D_START)).isEqualTo("2025-12-26T00:00")
- );
+ assertAll(
+ // Values that already existed are preserved
+ () -> assertThat(ctx.getString(RankingJobParametersListener.CTX_ANCHOR_DATE_KEY)).isEqualTo("20260101"),
+ () -> assertThat(ctx.getString(RankingJobParametersListener.CTX_LAST_7D_START)).isEqualTo("2025-12-26T00:00"),
+ // Missing keys are verified against the integrity policy of the already partially frozen snapshot, not against first-run values
+ () -> assertThat(ctx.containsKey(RankingJobParametersListener.CTX_LAST_7D_END)).isTrue(),
+ () -> assertThat(ctx.containsKey(RankingJobParametersListener.CTX_LAST_30D_START)).isTrue(),
+ () -> assertThat(ctx.containsKey(RankingJobParametersListener.CTX_LAST_30D_END)).isTrue()
+ );
If a partial-freeze policy exists (skip when all keys are present / recompute when any is missing), codify that policy in a separate test.
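One way to read the "all present → skip, any missing → recompute" policy is sketched below. This is an assumption about the intended policy, not the listener's current behavior: a plain `Map` stands in for the ExecutionContext, and the key strings in `SNAPSHOT_KEYS` are hypothetical placeholders for the real CTX_* constants.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical all-or-nothing freeze policy over a Map standing in for ExecutionContext. */
final class FreezePolicySketch {

    // Placeholder names; the actual CTX_* constant values are not shown in the PR.
    static final List<String> SNAPSHOT_KEYS = List.of(
            "anchorDateKey", "last7dStart", "last7dEnd", "last30dStart", "last30dEnd");

    /**
     * Keeps the existing snapshot only when all five keys are present.
     * Otherwise recomputes the whole bundle so a partially frozen state cannot survive.
     * Returns true when the snapshot was (re)computed.
     */
    static boolean freeze(Map<String, String> ctx, Supplier<Map<String, String>> compute) {
        if (ctx.keySet().containsAll(SNAPSHOT_KEYS)) {
            return false; // restart with a complete snapshot: leave it untouched
        }
        ctx.putAll(compute.get()); // first run or partial snapshot: overwrite as one bundle
        return true;
    }
}
```

Whichever policy the listener actually adopts, having it in one place like this makes the "complete snapshot" and "partial snapshot" cases straightforward to test separately.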
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java`
around lines 70 - 91, Update the restart-scenario tests in
RankingJobParametersListenerTest: extend the existing
ExecutionContext_에_이미_값이_있으면_덮어쓰지_않는다 test to pre-populate all five
snapshot-related keys (RankingJobParametersListener.CTX_ANCHOR_DATE_KEY,
CTX_LAST_7D_START, CTX_LAST_7D_END, CTX_LAST_30D_START, CTX_LAST_30D_END) and
assert they are all preserved after listener.beforeJob; add a complementary test
that leaves one or more of those keys absent and asserts beforeJob
computes/fills the missing keys (e.g., CTX_LAST_30D_START/END and
CTX_LAST_7D_END) from the JobParameters so the “all-or-nothing” freeze policy is
covered; locate logic in RankingJobParametersListener.beforeJob when wiring
assertions.
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepConfig.java-48-55 (1)
48-55: ⚠️ Potential issue | 🟡 Minor

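Clarify the WHERE-clause filter and confirm index usage.

TruncateStagingTasklet clears both the aggregation and scored tables via deleteByPeriodKey, so there is no double-count risk on restart. The following, however, should be optimized from an operational standpoint.

The current query filters on period_key alone and then sorts with ORDER BY period_type, product_id. Because the composite PK of staging_ranking_aggregation is (period_type, period_key, product_id), a predicate on period_key alone does not constrain the leading index column period_type, so the PK index may not be used efficiently for the lookup. Change the WHERE clause to period_key = ? AND period_type IN ('LAST_7D', 'LAST_30D'), or check with the DBA whether a separate index covers this access pattern.

Verify through performance testing that the scan cost stays acceptable at large data volumes.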
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepConfig.java`
around lines 48 - 55, The SQL in ScoreAggregationStepConfig currently filters
only by period_key which may prevent use of the composite PK (period_type,
period_key, product_id); update the query in the .sql(...) block (and its
preparedStatementSetter) to include an explicit period_type filter (e.g. "WHERE
period_key = ? AND period_type IN ('LAST_7D','LAST_30D')") or work with the DBA
to add/confirm an index that supports queries by period_key alone; ensure the
preparedStatementSetter sets any additional parameter(s) accordingly and run
performance tests to validate scan cost under large data volumes.
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java-56-74 (1)
56-74: ⚠️ Potential issue | 🟡 Minor

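createdAt consistency within the MV swap transaction is guaranteed, but the mismatch between entity design and implementation needs to be resolved.

The intent of making createdAt a constructor argument is clear. However, the current Tasklet inserts rows with direct SQL (JdbcTemplate.update()), so the constructor is never called. (Line 89 captures Timestamp createdAt once and passes it to promote() inside the loop, so created_at values do not diverge across rows.)

Still, if the entity defines createdAt injection through a parameterized constructor, then either the actual MV load should go through that constructor, or the parameter should be removed from the constructor and the value managed by @CreationTimestamp or a database default. The current design-implementation mismatch blurs the boundary between the domain model and infrastructure logic, so we suggest unifying on one of the following:

Remove the constructor parameter → JPA manages the value via @CreationTimestamp or a DB default

Or change the Tasklet to call the constructor → consistent object creation through the domain model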
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java`
around lines 56 - 74, The entity MvProductRankLast30d declares createdAt as a
constructor parameter but the Tasklet currently inserts rows via
JdbcTemplate.update(...) and calls promote() with a captured Timestamp, causing
a model/infrastructure mismatch; choose one approach and align code accordingly:
either remove the createdAt parameter from MvProductRankLast30d and annotate the
field with `@CreationTimestamp` (or rely on DB default) so JPA/db sets it, or
change the Tasklet to instantiate MvProductRankLast30d via its parameterized
constructor (passing the captured Timestamp/LocalDateTime) and persist through
JPA instead of raw JdbcTemplate.update; update usages of promote(), the
constructor, and persistence code to match the selected strategy
(MvProductRankLast30d, createdAt, promote(), Tasklet, JdbcTemplate.update()).
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobRestartTest.java-160-183 (1)
160-183: ⚠️ Potential issue | 🟡 Minor

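The test name does not match what it actually verifies.

The test is named Step5_완주_후_Step5b_도_완주하면_2차_staging_과_MV_모두_적재된다, but, as the comment (167-172) itself states, no failure is injected into Step 5 at all; it merely runs the happy path once. This causes the following problems:

It is grouped under the "Step5_Score_실패" nested class but is not actually a failure scenario. Reading only the test report, one would wrongly conclude that Step 5 restart recovery has been verified.

If a refactor causes Step 5 to be silently skipped or become a no-op, this test stays green (it has no regression-detection power).

If throwing from a SpyBean is difficult for a @StepScope Tasklet, we recommend the following alternatives:

Remove the MV table beforehand with @Sql / jdbcTemplate.execute("DROP TABLE mv_product_rank_last_7d") to force Step 5's INSERT to fail, verify first.getStatus() == FAILED and countByAnchorDate == 0, then restore the table and re-run to verify COMPLETED.

Or wrap JdbcTemplate in a SpyBean and inject a single throw on update(startsWith("INSERT INTO mv_product_rank")).

At a minimum, if the test stays as is, move it to a separate nested class such as Step5_정상_완주 and remove the "완주_후_Step5b_도_완주하면" nuance from the name.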
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobRestartTest.java`
around lines 160 - 183, The test method
Step5_완주_후_Step5b_도_완주하면_2차_staging_과_MV_모두_적재된다 currently does a happy-path run
but is placed under a failure-focused nested class and therefore misrepresents
behavior; fix it by making the test actually exercise a Step5 failure and
restart: before the first jobLauncher.run(job, paramsOf(ANCHOR_KEY, 3L)) call
use JdbcTemplate (or `@Sql`) to DROP the MV table (e.g., DROP TABLE
mv_product_rank_last_7d) so the first JobExecution is FAILED, assert
exec.getStatus()==FAILED and last7dRepository.countByAnchorDate(ANCHOR)==0 while
stagingScoredRepository.countByPeriodKey(ANCHOR_KEY) shows the staging load,
then recreate the MV table, rerun jobLauncher.run(job, paramsOf(...)) and assert
COMPLETED and last7dRepository.countByAnchorDate(ANCHOR)==5; alternatively if
you do not implement failure injection, move/rename the test out of the Step5
failure nested class (e.g., to Step5_정상_완주) and remove misleading assertions.
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeeder.java-123-125 (1)
123-125: ⚠️ Potential issue | 🟡 Minor

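The (int) r[2] cast leaves a risk of silent overflow.

ps.setInt(3, (int) r[2]) is a long → int narrowing cast: once the value exceeds Integer.MAX_VALUE (≈ 2.1e9), a negative or otherwise distorted value is bound without any warning. Under the current seed spec, the order_count of a single 5-minute bucket is in the tens, so nothing breaks right now, but if

scaleC is raised in the XL benchmark scenario, or

a later refactor changes seed generation to "pack a full day's worth into one 5-minute bucket",

the value is silently corrupted at parameter binding. To prevent overflow, and to stay compatible if the order_count column changes from INT to BIGINT, we recommend either standardizing on setLong (even if the column is INT, the driver hands the long to its client-side validation path) or using Math.toIntExact((long) r[2]) to raise an explicit exception on overflow.
♻️ Suggested example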
- ps.setInt(3, (int) r[2]);
+ ps.setInt(3, Math.toIntExact(r[2]));
As an additional test, a boundary test that checks whether batchUpdate succeeds or throws an explicit ArithmeticException for an XL seed such as SeedSpec(10_000_000, ...) would be useful.
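The difference between the two narrowing styles can be seen in isolation. The class and method names below are hypothetical and exist only to contrast the plain cast with `Math.toIntExact`.

```java
/** Hypothetical helper contrasting a silent narrowing cast with a checked one. */
final class NarrowingSketch {

    /** Plain cast: keeps only the low 32 bits, so large values wrap silently. */
    static int unchecked(long value) {
        return (int) value;
    }

    /** Checked narrowing: throws ArithmeticException when the value does not fit in an int. */
    static int checked(long value) {
        return Math.toIntExact(value);
    }
}
```

For a value such as 3_000_000_000L, `unchecked` returns a negative number while `checked` throws, which is the fail-fast behavior the seeder would want at binding time.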
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeeder.java`
around lines 123 - 125, The cast (int) r[2] in BaselineSeeder silently risks
integer overflow when binding with ps.setInt(3, (int) r[2]); change to a safe
approach: either use ps.setLong(3, (long) r[2]) consistently for order_count
binding (unify with the other setLong calls) or replace the cast with
Math.toIntExact((long) r[2]) to trigger an explicit ArithmeticException on
overflow; update the binding site(s) referencing r[2] and add/adjust a boundary
test (e.g., SeedSpec with very large counts) to assert that batchUpdate either
succeeds with long binding or fails with a clear exception when using
Math.toIntExact.
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingStepIntegrationTest.java-87-105 (1)
87-105: ⚠️ Potential issue | 🟡 Minor

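The test is named for "idempotency" (멱등성), but it actually runs two consecutive executions in different JobInstances.

paramsOf puts System.nanoTime() into the runTimestamp parameter, so first and second are different JobInstances. This does not verify Spring Batch's "restart idempotency" (restart of the same JobInstance); it only verifies that "a separate execution also runs cleanly to the same result without overwriting". Contrast this with RollingRankingJobRestartTest, which deliberately uses the same runTimestamp.

To verify true idempotency:

Store the result of paramsOf(anchor) in a shared variable and pass it unchanged to both executions (currently System.nanoTime() makes it differ on every call).

Or rename the test to 같은_anchorDate_로_다른_JobInstance_를_두번_돌려도_결과가_동일하다.

Why the former matters operationally: in a real incident the restart case is FAILED → restart of the same JobInstance, so a test must remain that confirms Truncate's repeated DELETE is harmless on that path; otherwise regressions go undetected.
♻️ Suggested example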
- jobLauncherTestUtils.setJob(job);
-
- JobExecution first = jobLauncherTestUtils.launchJob(paramsOf(anchor));
- // Run as a new JobInstance for the re-run (isolated by runTimestamp)
- JobExecution second = jobLauncherTestUtils.launchJob(paramsOf(anchor));
+ jobLauncherTestUtils.setJob(job);
+ JobParameters fixedParams = new JobParametersBuilder()
+ .addString(RankingJobParametersListener.PARAM_ANCHOR_DATE, anchor)
+ .addLong("runTimestamp", 42L)
+ .toJobParameters();
+ JobExecution first = jobLauncherTestUtils.launchJob(fixedParams);
+ // Re-running the same JobInstance → if the previous run succeeded, its BatchStatus is already COMPLETED.
+ // Real idempotency regression detection is better handled in a restart scenario after failure injection.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingStepIntegrationTest.java`
around lines 87 - 105, The test is claiming idempotency but creates two
different JobInstances because paramsOf(anchor) embeds System.nanoTime(); fix by
computing the JobParameters once and reusing it for both launches so the second
launch is a restart of the same JobInstance: call JobParameters params =
paramsOf(anchor) (or store the result of paramsOf(anchor) in a local var) and
pass that same params to both jobLauncherTestUtils.launchJob(params) calls
(update the variables JobExecution first/second and keep assertions), or
alternatively rename the test to reflect that it currently exercises two
different JobInstances if you prefer the other behavior.
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java-88-92 (1)
88-92: ⚠️ Potential issue | 🟡 Minor

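The writeCount metric also includes DELETE counts, which distorts operational metrics.

contribution.incrementWriteCount(deleted7d + deleted30d + totalInserted) mixes deletes and inserts in one counter, so BATCH_STEP_EXECUTION.WRITE_COUNT, which operational dashboards and alerts read as "the number of TOP N INSERTs for this anchor", is over-counted in proportion to the delete volume. If the previous day's MV delete volume grows to hundreds of rows in proportion to the number of groups, spike detection can be distorted.
🔧 Suggested change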
- contribution.incrementWriteCount(deleted7d + deleted30d + totalInserted);
+ contribution.incrementWriteCount(totalInserted);
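Splitting deleted7d/deleted30d into a separate log or custom metric (for example, meterRegistry.counter("batch.mv.delete", ...)) and reflecting only totalInserted in writeCount makes the number clearer to interpret operationally. As an additional test, we recommend an E2E case asserting that StepExecution.getWriteCount() equals "number of active groups × 2 (7d/30d) × TOP_N".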
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java`
around lines 88 - 92, The current code increments the Spring Batch writeCount
with deletions as well (contribution.incrementWriteCount(deleted7d + deleted30d
+ totalInserted)), which inflates BATCH_STEP_EXECUTION.WRITE_COUNT; change it to
only increment writeCount with totalInserted and emit/delete metrics for
deletions separately (e.g., use meterRegistry.counter or a dedicated log for
deleted7d and deleted30d), so keep
contribution.incrementWriteCount(totalInserted) and record deleted7d/deleted30d
to a separate metric or log; update any tests to assert
StepExecution.getWriteCount() equals expected inserted count rather than
inserted+deleted.
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/measurement/RollingRankingJobBenchmark.java-125-150 (1)
125-150: ⚠️ Potential issue | 🟡 Minor

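The benchmark result file write leaves the Charset implicit and is not parallel-safe.

Files.writeString(outPath, line, CREATE, APPEND) does not name a Charset. This overload always encodes as UTF-8 rather than the JVM default, so Korean labels and formats are not at risk of being mis-encoded in different CI environments; passing StandardCharsets.UTF_8 explicitly only documents the intent. The parallel-safety concern stands: under Gradle maxParallelForks>1, APPEND writes to the same file are not guaranteed to be atomic at the OS level, so lines from two tests can interleave. Because the benchmark results are designed to be parsed by a shell script, interleaved lines would cause parse errors or misleading numbers.
🔧 Suggested change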
- java.nio.file.Files.writeString(outPath, line,
- java.nio.file.StandardOpenOption.CREATE,
- java.nio.file.StandardOpenOption.APPEND);
+ java.nio.file.Files.writeString(outPath, line,
+ java.nio.charset.StandardCharsets.UTF_8,
+ java.nio.file.StandardOpenOption.CREATE,
+ java.nio.file.StandardOpenOption.APPEND);
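In addition, we recommend forcing maxParallelForks = 1 in the Gradle benchmarkTest task, or appending a PID/test-name suffix to benchmark.outputFile per test. As an additional test, consider a smoke test that verifies the number of result lines and the integrity of the BENCH| prefix when the S/M/L/XL cases run concurrently.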
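The per-process suffix idea might look like the sketch below. `BenchmarkOutputSketch`, `perProcessPath`, and `appendLine` are hypothetical names, and the suffix format (`<file>.<label>.<pid>`) is an assumption chosen for illustration; the shell script that parses the results would need to glob for the suffixed files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one result file per test label and process. */
final class BenchmarkOutputSketch {

    /** Derives "<base>.<label>.<pid>" next to the base path so forks never share a file. */
    static Path perProcessPath(Path base, String label) {
        long pid = ProcessHandle.current().pid();
        String name = base.getFileName().toString() + "." + label + "." + pid;
        return base.resolveSibling(name);
    }

    /** Appends one line, with the charset stated explicitly, and returns the file written. */
    static Path appendLine(Path base, String label, String line) throws IOException {
        Path out = perProcessPath(base, label);
        if (out.getParent() != null) {
            Files.createDirectories(out.getParent()); // no-op when the directory exists
        }
        Files.writeString(out, line + System.lineSeparator(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return out;
    }
}
```

Because each fork writes only to its own file, no cross-process append ordering is needed, at the cost of a merge step when collecting results.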
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/measurement/RollingRankingJobBenchmark.java`
around lines 125 - 150, The benchmark output currently uses
Files.writeString(outPath, line, CREATE, APPEND) with an implicit Charset (this
overload already encodes as UTF-8) and a shared output path, so lines can
interleave when tests run in parallel; pass StandardCharsets.UTF_8 explicitly to
document the encoding and avoid concurrent appends by making per-process/test
files (e.g. derive the path from
System.getProperty("benchmark.outputFile", "build/benchmark-results.txt") and
append a unique suffix like ProcessHandle.current().pid() or the test label) or
alternatively create a temp file per run and atomically move it into place;
update references around outPath, line, System.getProperty(...), and
Files.writeString to use the explicit UTF-8 charset and the unique filename
approach (or temp+atomic move) to ensure parallel-safety.
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java-70-74 (1)
70-74: ⚠️ Potential issue | 🟡 Minor

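@AfterEach alone cannot guarantee a clean state for the first test.

Teardown does not run before the first test executes, so if another test class in the same Gradle run leaves mv_product_rank_* rows or Redis keys behind, countByAnchorDate(ANCHOR) can fail nondeterministically. To reduce flakiness, it is safer to perform the same cleanup in @BeforeEach as well, or to add an explicit pre-clean instead of a class-level @Transactional.
🔧 Suggested change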
+ `@org.junit.jupiter.api.BeforeEach`
+ void setUp() {
+ databaseCleanUp.truncateAllTables();
+ redisCleanUp.truncateAll();
+ }
+
 `@AfterEach`
 void tearDown() {
 databaseCleanUp.truncateAllTables();
 redisCleanUp.truncateAll();
 }
The same issue applies to StageMetricsPipelineIntegrationTest, ScoreAggregationStepIntegrationTest, and RollingRankingJobBenchmark, so we recommend applying the change across all of them.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java`
around lines 70 - 74, RollingRankingJobE2ETest currently cleans up only in the
`@AfterEach` tearDown(), so mv_product_rank_* tables or Redis keys left behind by
other tests in the same Gradle session can remain before the first test runs and
cause flakiness; add the same databaseCleanUp.truncateAllTables() and
redisCleanUp.truncateAll() calls made in tearDown() to a `@BeforeEach` method
(e.g. setUpCleanState()), or otherwise run an explicit pre-clean in `@BeforeEach`,
so that checks such as countByAnchorDate(ANCHOR) always start from a clean state;
apply the same change to the StageMetricsPipelineIntegrationTest,
ScoreAggregationStepIntegrationTest, and RollingRankingJobBenchmark classes.
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java-162-175 (1)
162-175: ⚠️ Potential issue | 🟡 Minor

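The empty-source scenario lacks symmetry, which narrows the range of regressions it can detect.

The current assertions check only ranking:last7d:...:control with hasKey().isFalse(). In practice, ranking:last30d:...:control must be absent as well, and if the audit-log "recording policy when MV INSERT=0 on an empty source" (for example, record STATUS_OK with 0 rows, or record nothing) regresses, monitoring alerts can fire incorrectly. From an operations and incident standpoint, an empty source must be distinguishable from an incident signal, so we recommend widening the assertions.
🔧 Suggested additional assertions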
 assertAll(
 () -> assertThat(execution.getStatus()).isEqualTo(BatchStatus.COMPLETED),
 () -> assertThat(last7dRepository.countByAnchorDate(ANCHOR)).isZero(),
 () -> assertThat(last30dRepository.countByAnchorDate(ANCHOR)).isZero(),
- () -> assertThat(redisTemplate.hasKey("ranking:last7d:" + ANCHOR_KEY + ":control")).isFalse()
+ () -> assertThat(redisTemplate.hasKey("ranking:last7d:" + ANCHOR_KEY + ":control")).isFalse(),
+ () -> assertThat(redisTemplate.hasKey("ranking:last30d:" + ANCHOR_KEY + ":control")).isFalse(),
+ // Pin the audit recording policy for an empty source (current implementation: records OK even when MV INSERT=0)
+ () -> assertThat(auditLogRepository.findByAnchorDate(ANCHOR))
+ .extracting(BatchAuditLog::getStatus)
+ .allMatch(s -> BatchAuditLog.STATUS_OK.equals(s))
 );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java`
around lines 162 - 175, The test method 원천이_비어_있어도_Job_은_성공한다() only asserts the
absence of the 7-day Redis key; extend assertions to also verify
last30dRepository.countByAnchorDate(ANCHOR) is zero and that the Redis key
"ranking:last30d:" + ANCHOR_KEY + ":control" does not exist via
redisTemplate.hasKey(...). Additionally, add an assertion that the audit logs
for this anchor reflect the empty-source policy (use your audit repository
method, e.g., auditRepository.findByAnchorDate(ANCHOR) or
jobAuditRepository.countByAnchorDate(ANCHOR)) to ensure no unexpected MV INSERT
records were created.
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java-55-93 (1)
55-93: ⚠️ Potential issue | 🟡 Minor

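Remove the @Transactional annotation from the Tasklet method.

Spring Batch's TaskletStep already wraps each execute() call in a single transaction using the PlatformTransactionManager bound in the Step configuration. Adding a method-level @Transactional nests (1) the Batch TX interceptor and (2) the Spring TX AOP proxy, which adds unnecessary complexity. Depending on proxy ordering, transaction attributes may be applied in an order other than intended, and operationally the question "which TX manager guarantees this step's atomic swap?" should be answerable from the Step configuration alone.
Change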
- `@Override`
- `@Transactional`
- public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
+ `@Override`
+ public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
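Additional test: we recommend adding an E2E-level test that explicitly verifies the atomic-swap guarantee under a repeatable-read or read-committed isolation level: "right after the DELETE, an external session still sees the old MV → it sees the new MV only after commit".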
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java`
around lines 55 - 93, Remove the method-level `@Transactional` annotation from
PromoteTopToMvTasklet.execute(...) so the step uses the Step-configured
PlatformTransactionManager only; locate the execute method in class
PromoteTopToMvTasklet, delete the `@Transactional` annotation (and any unused
import) and rely on the TaskletStep transaction wrapper instead of an additional
Spring TX AOP proxy.
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RankingJobParametersListener.java-66-82 (1)
66-82: ⚠️ Potential issue | 🟡 Minor

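The trafficPct and active fields are missing from the ExecutionContext snapshot, so configuration values are lost on restore.

At lines 66-82 of RankingJobParametersListener, saving a WeightConfig records only wView, wLike, and wOrder; trafficPct and active are not stored. As a result, the restoreWeightConfigs method (lines 91-101) always sets trafficPct to 0 and forces active to true on restore.

trafficPct is the field ExperimentGroupResolver actually uses when computing traffic-distribution buckets, so a restored trafficPct=0 in a restart-after-batch-run scenario can lead to traffic-routing errors. Operationally, the principle that "the ExecutionContext is the single source of truth for the freeze" is broken, and the moment trafficPct is operated with different values per group, a silent failure occurs.
🔧 Suggested fix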
 for (WeightConfig c : configs) {
 String prefix = CTX_WEIGHT_PREFIX + c.getGroupName() + ".";
 ctx.putDouble(prefix + "wView", c.getWView());
 ctx.putDouble(prefix + "wLike", c.getWLike());
 ctx.putDouble(prefix + "wOrder", c.getWOrder());
+ ctx.putInt(prefix + "trafficPct", c.getTrafficPct());
+ ctx.putBoolean(prefix + "active", c.isActive());
 }
Then restore trafficPct and active as well in the restoreWeightConfigs method:
 double wView = ctx.getDouble(prefix + "wView");
 double wLike = ctx.getDouble(prefix + "wLike");
 double wOrder = ctx.getDouble(prefix + "wOrder");
- result.add(new WeightConfig(groupName, wView, wLike, wOrder, 0, true));
+ int trafficPct = ctx.getInt(prefix + "trafficPct");
+ boolean active = ctx.getBoolean(prefix + "active");
+ result.add(new WeightConfig(groupName, wView, wLike, wOrder, trafficPct, active));
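Additional tests: add the following scenarios to RankingJobParametersListenerTest.

A round-trip assertion that "every field of a WeightConfig stored in the DB is restored with identical values"

Verification that, when multiple groups are active, each group's distinct trafficPct is restored exactly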
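The round-trip property can be sketched without Spring Batch by letting a plain `Map` stand in for the ExecutionContext. Everything here is an illustrative assumption: the `Weight` record, the `weight.` prefix, and the `activeWeightGroups` key are hypothetical and do not reflect the actual constant values in RankingJobParametersListener.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical snapshot/restore pair over a Map standing in for ExecutionContext. */
final class WeightSnapshotSketch {

    record Weight(String groupName, double wView, double wLike, double wOrder,
                  int trafficPct, boolean active) {}

    static final String PREFIX = "weight.";            // placeholder key prefix
    static final String GROUPS = "activeWeightGroups"; // placeholder key for group names

    /** Writes every field of every config, plus the ordered list of group names. */
    static Map<String, Object> snapshot(List<Weight> configs) {
        Map<String, Object> ctx = new LinkedHashMap<>();
        List<String> names = new ArrayList<>();
        for (Weight c : configs) {
            String p = PREFIX + c.groupName() + ".";
            ctx.put(p + "wView", c.wView());
            ctx.put(p + "wLike", c.wLike());
            ctx.put(p + "wOrder", c.wOrder());
            ctx.put(p + "trafficPct", c.trafficPct());
            ctx.put(p + "active", c.active());
            names.add(c.groupName());
        }
        ctx.put(GROUPS, String.join(",", names));
        return ctx;
    }

    /** Rebuilds the configs from the snapshot without defaulting any field. */
    static List<Weight> restore(Map<String, Object> ctx) {
        List<Weight> result = new ArrayList<>();
        String joined = (String) ctx.get(GROUPS);
        if (joined == null || joined.isEmpty()) {
            return result;
        }
        for (String name : joined.split(",")) {
            String p = PREFIX + name + ".";
            result.add(new Weight(name,
                    (Double) ctx.get(p + "wView"),
                    (Double) ctx.get(p + "wLike"),
                    (Double) ctx.get(p + "wOrder"),
                    (Integer) ctx.get(p + "trafficPct"),
                    (Boolean) ctx.get(p + "active")));
        }
        return result;
    }
}
```

The test worth having is simply `restore(snapshot(configs)).equals(configs)` with at least two groups that differ in trafficPct and active, since that is the case the current restore path would get wrong.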
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RankingJobParametersListener.java`
around lines 66 - 82, The ExecutionContext snapshot currently writes only
wView/wLike/wOrder causing restoreWeightConfigs to lose trafficPct and active;
update the block in RankingJobParametersListener that writes the snapshot (using
CTX_WEIGHT_PREFIX, CTX_ACTIVE_WEIGHT_GROUPS and iterating WeightConfig) to also
putInt(prefix + "trafficPct", c.getTrafficPct()) and putBoolean(prefix +
"active", c.isActive()), and then update restoreWeightConfigs to read those keys
(trafficPct and active) instead of defaulting trafficPct to 0 and forcing
active=true; add a round-trip unit test in RankingJobParametersListenerTest
asserting all WeightConfig fields (groupName, wView, wLike, wOrder, trafficPct,
active) are preserved for multiple groups.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7b7ec7ef-0a35-4c00-b1aa-e62c36545b02

📥 Commits

Reviewing files that changed from the base of the PR and between 98555d8 and 776b6a5.

⛔ Files ignored due to path filters (2)

week10/blog-final-final.md is excluded by !**/*.md and included by **
week10/blog-final.md is excluded by !**/*.md and included by **

📒 Files selected for processing (88)

apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingKeyResolver.java
apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/RankingPeriod.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankId.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7d.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvRankEntry.java
apps/commerce-api/src/main/java/com/loopers/domain/ranking/mv/MvRankingQueryRepository.java
apps/commerce-api/src/main/java/com/loopers/infrastructure/ranking/MvRankingQueryRepositoryImpl.java
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingKeyResolverTest.java
apps/commerce-api/src/test/java/com/loopers/application/ranking/RankingServiceMvFallbackTest.java
apps/commerce-api/src/test/java/com/loopers/interfaces/api/ranking/RankingApiE2ETest.java
apps/commerce-batch/build.gradle.kts
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/RollingRankingJobConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RankingJobParametersListener.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RollingWindow.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/param/RollingWindowResolver.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreFormula.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/ScoreProcessor.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/score/StagingScoredWriter.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/AggregatedMetric.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/LikeMetricStreamingReader.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/OrderMetricStreamingReader.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/RawMetricRow.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageLikeMetricsStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageOrderMetricsStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StageViewMetricsStepConfig.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingAggregationProcessor.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingLikeAggregationProcessor.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingLikeMetricsWriter.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingOrderAggregationProcessor.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingOrderMetricsWriter.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StagingViewMetricsWriter.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/StreamingMetricAggregator.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/ViewMetricStreamingReader.java
apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductLikeMetric.java
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductMetricId.java
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductOrderMetric.java
apps/commerce-batch/src/main/java/com/loopers/domain/metrics/ProductViewMetric.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/audit/BatchAuditLog.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/audit/BatchAuditLogRepository.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankId.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30d.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast30dRepository.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7d.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/mv/MvProductRankLast7dRepository.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregation.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregationId.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingAggregationRepository.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScored.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScoredId.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/staging/StagingRankingScoredRepository.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfig.java
apps/commerce-batch/src/main/java/com/loopers/domain/ranking/weight/WeightConfigRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/BatchAuditLogJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/BatchAuditLogRepositoryImpl.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast30dJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast30dRepositoryImpl.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast7dJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/MvProductRankLast7dRepositoryImpl.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingAggregationJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingAggregationRepositoryImpl.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingScoredJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/StagingRankingScoredRepositoryImpl.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/WeightConfigJpaRepository.java
apps/commerce-batch/src/main/java/com/loopers/infrastructure/ranking/WeightConfigRepositoryImpl.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobE2ETest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/RollingRankingJobRestartTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeeder.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeederIntegrationTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/SeedSpec.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/Tier.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/measurement/RollingRankingJobBenchmark.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RollingWindowResolverTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/score/ScoreAggregationStepIntegrationTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/score/ScoreFormulaTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StageMetricsPipelineIntegrationTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StageViewMetricsStepIntegrationTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/stage/StreamingMetricAggregatorTest.java
apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingStepIntegrationTest.java
scripts/measure-ranking-batch.sh

coderabbitai · 2026-04-17T08:21:41Z

+    private List<RankEntry> fallbackFromMv(RankingPeriod period, LocalDate date, int page, int size, String group) {
+        try {
+            LocalDate anchorDate = keyResolver.anchorDateOf(date);
+            int offset = page * size;
+
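+            // If the MV for the current anchor is empty, fall back automatically to the previous day's anchor (up to 3 days).
+            // It can be empty if the batch has not run or the anchor has no data.
+            // Showing yesterday's ranking is a better user experience than showing a wrong ranking.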
+            for (int retry = 0; retry < MV_FALLBACK_MAX_DAYS; retry++) {
+                List<MvRankEntry> rows = switch (period) {
+                    case LAST_7D  -> mvRankingQueryRepository.findLast7d(anchorDate, group, offset, size);
+                    case LAST_30D -> mvRankingQueryRepository.findLast30d(anchorDate, group, offset, size);
+                    default -> List.of();
+                };
+                if (!rows.isEmpty()) {
+                    if (retry > 0) {
+                        log.info("MV fallback: 현재 anchor 비어있어 전일로 대체. period={}, 원래anchor={}, 사용anchor={}",
+                                period, keyResolver.anchorDateOf(date), anchorDate);
+                    }
+                    return rows.stream()
+                            .map(r -> new RankEntry(r.productId(), r.score(), r.rankPosition()))
+                            .toList();
+                }
+                anchorDate = anchorDate.minusDays(1);
+            }
+            return List.of();
+        } catch (Exception e) {
+            log.error("MV fallback 실패. period={}, date={}, group={}", period, date, group, e);
+            return List.of();
+        }
    }


⚠️ Potential issue | 🟠 Major

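The MV lookup has no timeout or failure-handling policy, so it can escalate into a cascading failure.

fallbackFromMv performs up to three DB queries. Nothing guards the JDBC timeout, connection-pool limit, or latency budget of each call, so if the MV table is blocked by a lock or a large transaction, the three attempts add up and the API thread stays occupied for a long time. If Redis fails and every request is funneled onto this path, queued MV lookups can exhaust the DB connection pool and trigger a chain of failures.

In addition, swallowing every exception with catch (Exception e) and converting it to List.of() (lines 120-123) means:

Users see an "empty ranking", and the logs alone make it hard to tell a DB failure from missing data.

The failure never reaches the error-rate metrics of APM/alerting integrations (e.g. Sentry, CoreException → ApiControllerAdvice).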

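Recommendations:

Configure a statement timeout or @Transactional(timeout = N) on the MvRankingQueryRepository implementation (e.g. 1 second).

Even if the first failed lookup is rescued by the fallback, consecutive failures should raise a metric-bearing exception (CoreException) or emit a Micrometer counter plus a structured log (mv_fallback_error tag) to provide operational visibility.

If a Redis outage drags on, use a circuit breaker (resilience4j) to cut off MV calls entirely, return an empty list as-is, and raise an alert.

As an additional test, stub MvRankingQueryRepository with Mockito to throw SQLTimeoutException and include a case that verifies:

the number of calls is capped at three,

the log level is error,

the return value is List.of().

This is proposed in line with the coding guideline (for external calls, check whether timeouts, retries, and circuit breakers have been considered, and propose an alternative flow on failure).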
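One way to separate "no data" from "lookup failed" is sketched below, under stated assumptions: the repository call is abstracted as a Function, an AtomicInteger stands in for a Micrometer counter, and MvUnavailableException plus the threshold of two consecutive failures are hypothetical names and values, not part of the PR.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class MvFallbackSketch {
    static final int MAX_DAYS = 3;                  // mirrors MV_FALLBACK_MAX_DAYS
    static final int MAX_CONSECUTIVE_FAILURES = 2;  // hypothetical threshold

    // Hypothetical exception the API layer could map to an error metric.
    static class MvUnavailableException extends RuntimeException {
        MvUnavailableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for a metrics counter such as mv_fallback_error.
    static final AtomicInteger errorCounter = new AtomicInteger();

    static <T> List<T> lookup(LocalDate anchor, Function<LocalDate, List<T>> query) {
        int consecutiveFailures = 0;
        LocalDate day = anchor;
        for (int attempt = 0; attempt < MAX_DAYS; attempt++) {
            try {
                List<T> rows = query.apply(day);
                if (!rows.isEmpty()) {
                    return rows;
                }
                consecutiveFailures = 0; // an empty result is not a failure
            } catch (RuntimeException e) {
                errorCounter.incrementAndGet();
                consecutiveFailures++;
                if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
                    // Surface the outage instead of returning an empty ranking.
                    throw new MvUnavailableException("MV lookup failed repeatedly", e);
                }
            }
            day = day.minusDays(1);
        }
        return List.of(); // every attempt succeeded but found no rows
    }

    public static void main(String[] args) {
        LocalDate anchor = LocalDate.of(2026, 4, 16);
        // Empty for the anchor day, data for the previous day.
        List<String> rows = lookup(anchor,
                day -> day.equals(anchor) ? List.<String>of() : List.of("product-1"));
        System.out.println(rows);
    }
}
```

A single transient error still falls through to the previous day's anchor, while a sustained outage stops after two calls and propagates, which keeps it visible to error-rate metrics.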

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-api/src/main/java/com/loopers/application/ranking/RankingService.java` around lines 94 - 124, fallbackFromMv performs up to MV_FALLBACK_MAX_DAYS DB calls without per-call timeouts and swallows all exceptions, enabling cascading failures and losing operational visibility; update MvRankingQueryRepository implementations to enforce statement/transaction timeouts (e.g., `@Transactional`(timeout=1) or JDBC statement timeout) and wrap the MV access in a circuit breaker (resilience4j) around the calls from fallbackFromMv; change fallbackFromMv to treat the first SQL timeout/exception as a single fallback (log as warn and continue), but on consecutive failures emit a CoreException or increment a Micrometer counter + structured error log (tag mv_fallback_error) so failures surface to APM, and ensure the method still returns List.of() when the breaker is open; add unit tests for fallbackFromMv using Mockito to throw SQLTimeoutException from MvRankingQueryRepository verifying at-most MV_FALLBACK_MAX_DAYS invocations, error-level logging/metric emission on repeated failures, and that the method returns an empty list when the breaker tripped.

coderabbitai · 2026-04-17T08:21:41Z

+tasks.register<Test>("benchmarkTest") {
+    description = "랭킹 배치 선형성/스파이크 측정 (오래 걸림)"
+    group = "verification"
+    useJUnitPlatform {
+        includeTags("benchmark")
+    }
+    testClassesDirs = sourceSets["test"].output.classesDirs
+    classpath = sourceSets["test"].runtimeClasspath
+    // Force standard output to be shown so measurement results flow to stdout
+    testLogging {
+        showStandardStreams = true
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Cross-check where the path the benchmark test actually writes build/benchmark-results.txt to is decided,
# and whether it matches the path the script references.
fd -t f 'RollingRankingJobBenchmark.java' | xargs -I{} sed -n '1,200p' {}
fd -t f 'measure-ranking-batch.sh' | xargs -I{} cat {}

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 8266

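From an operations standpoint, it is safer to set JVM resources and a timeout explicitly on the benchmark task.

Per the PR summary, the XL scenario goes up to 100,000 products (over 1,000,000 rows). The current configuration uses the default JVM heap and fork settings, so on CI runners or other developers' machines:

If the heap runs out and an OutOfMemoryError occurs, the cause is hard to pin down (a benchmark regression cannot be told apart from an environment problem).

If a regression leaves the task waiting indefinitely, it occupies a CI slot (there is no timeout).

Benchmarks must run deterministically, so parallel forks should be disabled.

♻️ Suggestion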

 tasks.register<Test>("benchmarkTest") {
     description = "랭킹 배치 선형성/스파이크 측정 (오래 걸림)"
     group = "verification"
     useJUnitPlatform {
         includeTags("benchmark")
     }
     testClassesDirs = sourceSets["test"].output.classesDirs
     classpath = sourceSets["test"].runtimeClasspath
+    // Run the benchmark deterministically in a single fork.
+    maxParallelForks = 1
+    forkEvery = 0
+    // Minimum heap for the large scenario (100K products, ~1M rows).
+    minHeapSize = "1g"
+    maxHeapSize = "2g"
+    // Prevent a regression from occupying a CI slot.
+    timeout.set(java.time.Duration.ofMinutes(30))
     // Force standard output to be shown so measurement results flow to stdout
     testLogging {
         showStandardStreams = true
     }

Additional verification: run ./gradlew :apps:commerce-batch:benchmarkTest in the CI environment and confirm that it completes without running out of heap and that apps/commerce-batch/build/benchmark-results.txt is generated successfully.


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/build.gradle.kts` around lines 31 - 42, The benchmarkTest task currently uses default JVM and fork settings; update the tasks.register<Test>("benchmarkTest") configuration to explicitly set maxParallelForks = 1 and forkEvery = 0 (disable parallel forks), configure minHeapSize/maxHeapSize (e.g. "1g"/"2g", or the equivalent jvmArgs) to ensure sufficient heap, and add a task timeout via timeout.set(...) (e.g., a reasonable minutes value) so CI jobs cannot hang; also ensure showStandardStreams remains true so stdout is captured. Use the benchmarkTest task name and its Test configuration block (useJUnitPlatform, testClassesDirs, classpath, testLogging) to locate where to add these JVM/fork/timeout settings.

coderabbitai · 2026-04-17T08:21:41Z

+    private int refreshZSet(String key, String sql, Date anchorDate, String weightGroup) {
+        List<ProductScore> rows = jdbcTemplate.query(sql,
+                (rs, rn) -> new ProductScore(rs.getLong("product_id"), rs.getDouble("score")),
+                anchorDate, weightGroup);
+
+        if (rows.isEmpty()) {
+            redisTemplate.delete(key);
+            return 0;
+        }
+
+        String shadowKey = key + ":rebuild";
+        redisTemplate.delete(shadowKey);
+
+        Set<ZSetOperations.TypedTuple<String>> tuples = new HashSet<>(rows.size());
+        for (ProductScore row : rows) {
+            tuples.add(ZSetOperations.TypedTuple.of(String.valueOf(row.productId()), row.score()));
+        }
+        redisTemplate.opsForZSet().add(shadowKey, tuples);
+        redisTemplate.rename(shadowKey, key);
+        redisTemplate.expire(key, TTL);
+
+        return rows.size();
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Does Redis RENAME preserve the TTL of the source key on the destination key?

💡 Result:

Yes, the Redis RENAME command preserves the TTL of the source key on the destination key. When RENAME is executed, the entire key including its value, TTL (if any), and other attributes is moved to the new name. If the destination key already exists, it is overwritten (equivalent to an implicit DEL), and the source key's TTL takes over.

Citations:

1: https://redis.io/docs/latest/commands/expire/

2: https://redis.io/commands/expire

3: https://redis.io/docs/latest/commands/expire

4: https://redis.antirez.com/commands/content/commands/expire.md

5: TTL not preserved when key is renamed redis/redis#128

6: https://oneuptime.com/blog/post/2026-03-31-redis-how-to-use-rename-and-renamenx-in-redis-to-rename-keys/view

7: https://oneuptime.com/blog/post/2026-03-31-redis-rename-renamenx/view

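The missing TTL on the shadow key and the storage leak on the empty-result path need to be fixed.

There are three operational problems.

Orphaned-key leak because the shadow key has no TTL: if the batch fails right after redisTemplate.opsForZSet().add(shadowKey, tuples) and before RENAME, the shadow key remains indefinitely with no TTL. Because this batch is designed so that "a failure is not critical", these orphaned keys pile up over time on paths where the next run proceeds with a different anchorDate.

Non-atomicity between RENAME and the TTL assignment: Redis RENAME carries the source key's TTL state over to the destination key. Since the shadow key has no TTL, key also has no TTL immediately after RENAME. There is a short gap between redisTemplate.rename(shadowKey, key) and redisTemplate.expire(key, TTL); if the batch is interrupted in that window, or races with another batch run on the same key, the TTL may never be applied. As a result, a ZSET that is out of sync with the MV could linger indefinitely.

Shadow key not cleaned up on the empty-result path: the if (rows.isEmpty()) branch only calls redisTemplate.delete(key) and does not clean up the shadow key. A shadow key left behind by an earlier interrupted run is never removed and accumulates gradually during operation.

Proposed fix:

Assign a TTL to the shadow key immediately after ZADD.

Clean up the shadow key on the empty-result path as well.

Optionally, wrap RENAME and the final TTL adjustment in a Lua script or SessionCallback so they execute atomically, which is safer still.

Proposed fix code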

         if (rows.isEmpty()) {
+            String shadowKey = key + ":rebuild";
             redisTemplate.delete(key);
+            redisTemplate.delete(shadowKey);
             return 0;
         }

         String shadowKey = key + ":rebuild";
         redisTemplate.delete(shadowKey);

         Set<ZSetOperations.TypedTuple<String>> tuples = new HashSet<>(rows.size());
         for (ProductScore row : rows) {
             tuples.add(ZSetOperations.TypedTuple.of(String.valueOf(row.productId()), row.score()));
         }
         redisTemplate.opsForZSet().add(shadowKey, tuples);
+        redisTemplate.expire(shadowKey, TTL);
         redisTemplate.rename(shadowKey, key);
         redisTemplate.expire(key, TTL);

         return rows.size();
     }

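Additional recommended tests:

An integration test verifying that "if the batch is interrupted just before RENAME, the shadow key expires on its own within the TTL"

A unit test checking whether the shadow key exists before and after the empty-result path runs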
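The effect of the proposed ordering can be illustrated with a small in-memory model. This is not a Redis client: FakeStore is a hypothetical stand-in that only mimics the two behaviours discussed above (RENAME moves the value together with its TTL state, and a freshly written key has no TTL), and the crashAfter parameter is an invented hook for simulating an interrupted batch.

```java
import java.util.HashMap;
import java.util.Map;

class ShadowKeyTtlModel {
    // Hypothetical in-memory model of the relevant Redis semantics.
    static class FakeStore {
        private final Map<String, Object> values = new HashMap<>();
        private final Map<String, Long> ttlSeconds = new HashMap<>(); // absent = no expiry

        void write(String key, Object value) {   // models ZADD into a fresh key
            values.put(key, value);
        }

        void expire(String key, long seconds) {  // models EXPIRE
            if (values.containsKey(key)) {
                ttlSeconds.put(key, seconds);
            }
        }

        void delete(String key) {                // models DEL
            values.remove(key);
            ttlSeconds.remove(key);
        }

        void rename(String src, String dst) {    // models RENAME: value and TTL state move together
            if (!values.containsKey(src)) {
                throw new IllegalStateException("no such key: " + src);
            }
            Long ttl = ttlSeconds.remove(src);
            values.put(dst, values.remove(src));
            if (ttl == null) {
                ttlSeconds.remove(dst);
            } else {
                ttlSeconds.put(dst, ttl);
            }
        }

        boolean exists(String key) { return values.containsKey(key); }
        boolean hasTtl(String key) { return ttlSeconds.containsKey(key); }
    }

    // Rebuild with the proposed ordering; crashAfter = 1..3 stops after that step, 0 runs to the end.
    static void refresh(FakeStore store, String key, Object rows, long ttl, int crashAfter) {
        String shadow = key + ":rebuild";
        store.delete(shadow);
        store.write(shadow, rows);      // step 1
        if (crashAfter == 1) return;
        store.expire(shadow, ttl);      // step 2: TTL on the shadow key
        if (crashAfter == 2) return;
        store.rename(shadow, key);      // step 3: destination inherits the TTL
        if (crashAfter == 3) return;
        store.expire(key, ttl);         // step 4: final TTL refresh
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        refresh(store, "ranking:last7d", "rows", 3600L, 3);
        System.out.println("key has TTL after interrupted run: " + store.hasTtl("ranking:last7d"));
    }
}
```

In this model an interruption after step 2 or step 3 always leaves a key that carries a TTL. An interruption between steps 1 and 2 still leaves a TTL-less shadow key, which is why the atomic Lua or SessionCallback variant remains the stricter option.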

📝 Committable suggestion

Suggested change

-    private int refreshZSet(String key, String sql, Date anchorDate, String weightGroup) {
-        List<ProductScore> rows = jdbcTemplate.query(sql,
-                (rs, rn) -> new ProductScore(rs.getLong("product_id"), rs.getDouble("score")),
-                anchorDate, weightGroup);
-        if (rows.isEmpty()) {
-            redisTemplate.delete(key);
-            return 0;
-        }
-        String shadowKey = key + ":rebuild";
-        redisTemplate.delete(shadowKey);
-        Set<ZSetOperations.TypedTuple<String>> tuples = new HashSet<>(rows.size());
-        for (ProductScore row : rows) {
-            tuples.add(ZSetOperations.TypedTuple.of(String.valueOf(row.productId()), row.score()));
-        }
-        redisTemplate.opsForZSet().add(shadowKey, tuples);
-        redisTemplate.rename(shadowKey, key);
-        redisTemplate.expire(key, TTL);
-        return rows.size();
-    }
+    private int refreshZSet(String key, String sql, Date anchorDate, String weightGroup) {
+        List<ProductScore> rows = jdbcTemplate.query(sql,
+                (rs, rn) -> new ProductScore(rs.getLong("product_id"), rs.getDouble("score")),
+                anchorDate, weightGroup);
+        if (rows.isEmpty()) {
+            String shadowKey = key + ":rebuild";
+            redisTemplate.delete(key);
+            redisTemplate.delete(shadowKey);
+            return 0;
+        }
+        String shadowKey = key + ":rebuild";
+        redisTemplate.delete(shadowKey);
+        Set<ZSetOperations.TypedTuple<String>> tuples = new HashSet<>(rows.size());
+        for (ProductScore row : rows) {
+            tuples.add(ZSetOperations.TypedTuple.of(String.valueOf(row.productId()), row.score()));
+        }
+        redisTemplate.opsForZSet().add(shadowKey, tuples);
+        redisTemplate.expire(shadowKey, TTL);
+        redisTemplate.rename(shadowKey, key);
+        redisTemplate.expire(key, TTL);
+        return rows.size();
+    }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java` around lines 86 - 108, The refreshZSet method can leave orphaned shadow keys and lose TTL because shadowKey (key + ":rebuild") is created without TTL, rows.isEmpty() only deletes key not shadowKey, and rename() copies TTL state so a non-TTL shadow makes the final key persist; fix by ensuring shadowKey is deleted in the empty-rows branch, apply a TTL to shadowKey immediately after redisTemplate.opsForZSet().add(shadowKey, tuples), and perform the rename + TTL setting atomically (wrap rename and expire in a Redis Lua script or a RedisTemplate SessionCallback) so that rename and final TTL assignment cannot be lost due to process interruption or race.

coderabbitai · 2026-04-17T08:21:42Z

+    public ViewMetricStreamingReader(
+            DataSource dataSource,
+            @Value("#{jobExecutionContext['" + RankingJobParametersListener.CTX_LAST_7D_START + "']}") String last7dStart,
+            @Value("#{jobExecutionContext['" + RankingJobParametersListener.CTX_LAST_30D_START + "']}") String last30dStart,
+            @Value("#{jobExecutionContext['" + RankingJobParametersListener.CTX_LAST_30D_END + "']}") String last30dEnd
+    ) {
+        this.last7dStart = LocalDateTime.parse(last7dStart);
+        LocalDateTime last30dStartTime = LocalDateTime.parse(last30dStart);
+        LocalDateTime last30dEndTime = LocalDateTime.parse(last30dEnd);
+
+        this.delegate = new JdbcCursorItemReaderBuilder<RawMetricRow>()
+                .name("viewMetricCursorReader")
+                .dataSource(dataSource)
+                .fetchSize(FETCH_SIZE)
+                .sql("""
+                        SELECT product_id, bucket_time, view_count
+                          FROM product_view_metrics
+                         WHERE bucket_time >= ?
+                           AND bucket_time <  ?
+                         ORDER BY product_id, bucket_time
+                        """)
+                .preparedStatementSetter((ps) -> {
+                    ps.setTimestamp(1, Timestamp.valueOf(last30dStartTime));
+                    ps.setTimestamp(2, Timestamp.valueOf(last30dEndTime));
+                })
+                .rowMapper((rs, rowNum) -> new RawMetricRow(
+                        rs.getLong("product_id"),
+                        rs.getTimestamp("bucket_time").toLocalDateTime(),
+                        rs.getLong("view_count")
+                ))
+                .build();
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1) Check whether useCursorFetch is set on the JDBC URL
rg -nP -C2 'jdbc:mysql' --type=yml --type=yaml --type=properties
rg -nP -C2 'useCursorFetch'
# 2) Check whether other readers that use fetchSize carry the same risk
rg -nP -C3 'fetchSize\s*\(' --type=java
# 3) Check whether row-by-row streaming was configured with Integer.MIN_VALUE
rg -nP 'Integer\.MIN_VALUE' --type=java

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 4611

🏁 Script executed:

# Find configuration files
find . -type f \( -name "application*.yml" -o -name "application*.yaml" -o -name "application*.properties" \) | head -20

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 426

🏁 Script executed:

# Search for datasource/database configuration
rg -nA5 "spring\.datasource|DataSource|jdbc:mysql" -t java -t properties | head -100

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 119

🏁 Script executed:

# Check for useCursorFetch in all file types
rg -n "useCursorFetch" .

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

🏁 Script executed:

# Check batch application.yml for datasource/JDBC configuration
cat ./apps/commerce-batch/src/main/resources/application.yml | head -100

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 783

🏁 Script executed:

# Check if there's an application-batch.yml
find ./apps/commerce-batch -name "application*.yml" -o -name "application*.yaml"

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 141

🏁 Script executed:

# Look for datasource configuration in batch module
rg -n "datasource|url|mysql" ./apps/commerce-batch/src/main/resources/ -A3

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

🏁 Script executed:

# Find jpa.yml files
find ./apps/commerce-batch -name "jpa.yml" -o -name "jpa.yaml"

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

🏁 Script executed:

# Also check for shared configuration files
find . -path "*/src/main/resources/*" -name "jpa.yml" -o -name "jpa.yaml" | grep -v node_modules

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 125

🏁 Script executed:

# Search more broadly for datasource configuration
rg -rn "spring\.datasource|datasource\.url|jdbc:" ./apps/commerce-batch/src/

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 210

🏁 Script executed:

# Read the jpa.yml file to check JDBC URL configuration
cat ./modules/jpa/src/main/resources/jpa.yml

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 2231

🏁 Script executed:

# Verify FETCH_SIZE constant value
rg -n "FETCH_SIZE" ./apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/ -B2 -A2

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 4926

🏁 Script executed:

# Confirm no workarounds with Integer.MIN_VALUE exist
rg -n "setFetchSize\|Integer\.MIN_VALUE" ./apps/commerce-batch/src/main/java/

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

🏁 Script executed:

# Check if there's any cursor streaming configuration elsewhere (e.g., DataSourceConfig)
rg -n "useCursorFetch\|cursorFetch\|row-by-row" ./apps/commerce-batch/src/main/java/ -i

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

To enable MySQL cursor streaming, the useCursorFetch=true parameter should be added to the JDBC URL and tested.

Every profile in jpa.yml (local, test, dev, qa, prd) currently sets a URL of the form jdbc:mysql://... without the useCursorFetch parameter. In that case, setting JdbcCursorItemReaderBuilder.fetchSize(2000) alone still makes MySQL Connector/J load the entire result set into the client heap; to enable streaming mode, useCursorFetch=true must be specified on the JDBC URL. Even though the design document reports that the XL (1,048,236 rows) benchmark passed, that may have been masked by the memory headroom of the local test environment, and as the product_view_metrics, product_like_metrics, and product_order_metrics tables grow in the production DB, a full load instead of a cursor scan could cause an OOM in the batch JVM.

ViewMetricStreamingReader (line 56), LikeMetricStreamingReader (line 47), OrderMetricStreamingReader (line 47), and ScoreAggregationStepConfig (line 45) all follow the same pattern and set only fetchSize. Add useCursorFetch=true to datasource.mysql-jpa.main.jdbc-url in jpa.yml, and add a test case that verifies, via JMX (for example MemoryMXBean) or Runtime, that heap usage stays bounded rather than growing linearly.
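A heap-bound check of that kind might start from a probe like the one below. It is a sketch under assumptions: HeapBoundProbe and its Result record are hypothetical names, and the rows are exposed as a plain Iterator so the example runs without a database. With a Spring Batch reader one would instead call read() until it returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.stream.LongStream;

class HeapBoundProbe {
    // Number of items consumed and the highest heap usage sampled along the way.
    record Result(long itemsRead, long peakUsedBytes) {}

    // Drains the iterator, sampling used heap every sampleEvery items.
    static <T> Result drain(Iterator<T> rows, int sampleEvery, Consumer<T> sink) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long peak = memory.getHeapMemoryUsage().getUsed();
        long count = 0;
        while (rows.hasNext()) {
            sink.accept(rows.next());
            count++;
            if (count % sampleEvery == 0) {
                peak = Math.max(peak, memory.getHeapMemoryUsage().getUsed());
            }
        }
        return new Result(count, peak);
    }

    public static void main(String[] args) {
        // Stand-in for a streaming reader: values are produced lazily and never retained.
        Iterator<Long> rows = LongStream.range(0, 1_000_000).boxed().iterator();
        Result result = drain(rows, 10_000, row -> { });
        System.out.println("items=" + result.itemsRead()
                + " peakUsedMiB=" + result.peakUsedBytes() / (1024 * 1024));
    }
}
```

A test would run the probe against two dataset sizes and assert that the peak does not scale with row count. Heap samples are noisy because of garbage collection timing, so any threshold needs generous slack.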

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/stage/ViewMetricStreamingReader.java` around lines 43 - 74, Add the MySQL Connector/J streaming flag to the datasource URLs and validate heap behavior: update the jdbc:mysql://... values used by the application profiles so the JDBC URL includes useCursorFetch=true (e.g., append ?useCursorFetch=true or &useCursorFetch=true as appropriate) so JdbcCursorItemReader.fetchSize(...) actually enables server-side cursoring; apply this change for all profiles used in jpa.yml and any test datasource configs referenced by ScoreAggregationStepConfig, ViewMetricStreamingReader, LikeMetricStreamingReader and OrderMetricStreamingReader. After updating the URLs, add/execute an integration test that runs the streaming readers (ViewMetricStreamingReader/LikeMetricStreamingReader/OrderMetricStreamingReader) against a large test dataset and assert via JMX (e.g. MemoryMXBean) or Runtime that heap usage does not grow linearly (stays bounded) during iteration to confirm cursor-based streaming is active.

coderabbitai · 2026-04-17T08:21:42Z

+    @Override
+    @Transactional
+    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
+        int aggregationDeleted = aggregationRepository.deleteByPeriodKey(anchorDateKey);
+        int scoredDeleted = scoredRepository.deleteByPeriodKey(anchorDateKey);
+
+        log.info(
+                "[STEP=truncateStagingStep] anchorDateKey={} aggregationDeleted={} scoredDeleted={}",
+                anchorDateKey, aggregationDeleted, scoredDeleted
+        );
+
+        contribution.incrementWriteCount(aggregationDeleted + scoredDeleted);
+        return RepeatStatus.FINISHED;
+    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check which Step configuration registers TruncateStagingTasklet and whether a transactionManager is injected with it
fd -t f 'TruncateStagingStepConfig.java' | xargs -I{} cat {}
rg -nP -C3 'tasklet\s*\(\s*\w*[Tt]runcate'

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 1046

🏁 Script executed:

# 1. Check the full contents of the TruncateStagingTasklet file
cat -n apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java
# 2. Check the full Step configuration in RollingRankingJobConfig.java
cat -n apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/RollingRankingJobConfig.java | sed -n '60,80p'
# 3. Check whether the related test class exists
fd -t f 'TruncateStagingStepIntegrationTest.java'
# 4. Check whether other Tasklet classes use @Transactional
fd -t f -path '*/batch/job/*/step/*Tasklet.java' -exec grep -l '@Transactional' {} \;

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 3254

🏁 Script executed:

# 1. Show the contents of the test class
cat -n apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingStepIntegrationTest.java
# 2. Find PromoteTopToMvStepConfig
fd -type f 'PromoteTopToMvStepConfig.java' | xargs cat -n
# 3. Find the other Tasklets in the batch module
fd -type f -path '*/batch/job/*/step/*Tasklet.java' | head -10
# 4. Check the `@Transactional` usage pattern of the other Tasklets
fd -type f -path '*/batch/job/*/step/*Tasklet.java' | xargs grep -l '@Transactional'

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 7470

🏁 Script executed:

# 1. Find the other Tasklet files (fd command corrected)
find apps/commerce-batch -name '*Tasklet.java' -type f
# 2. Check whether the other Tasklets use `@Transactional`
find apps/commerce-batch -name '*Tasklet.java' -type f | xargs grep -l '@Transactional' 2>/dev/null
# 3. Find PromoteTopToMvStepConfig
find . -name 'PromoteTopToMvStepConfig.java' -type f | xargs cat -n 2>/dev/null
# 4. Check transactionManager injection in RollingRankingJobConfig
grep -n "transactionManager" apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/RollingRankingJobConfig.java

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 2185

🏁 Script executed:

# 1. Check whether each Tasklet file uses `@Transactional`
echo "=== PromoteTopToMvTasklet.java ===" && \
cat apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java | head -50
echo -e "\n=== RedisRefreshTasklet.java ===" && \
cat apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java | head -50
echo -e "\n=== DemoTasklet.java ===" && \
cat apps/commerce-batch/src/main/java/com/loopers/batch/job/demo/step/DemoTasklet.java | head -50

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 5429

🏁 Script executed:

# 1. Check the execute method and `@Transactional` in PromoteTopToMvTasklet
grep -n "@Transactional\|public RepeatStatus execute" apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/promote/PromoteTopToMvTasklet.java
# 2. Show the whole RedisRefreshTasklet file
cat -n apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/redis/RedisRefreshTasklet.java
# 3. Show the full truncateStagingStep configuration in RollingRankingJobConfig
sed -n '65,75p' apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/RollingRankingJobConfig.java
# 4. Show the whole execute method of TruncateStagingTasklet (from line 14 to the end)
sed -n '30,47p' apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 6339

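The @Transactional declarations on the Tasklets need a consistency review and stronger tests.

TruncateStagingTasklet and PromoteTopToMvTasklet both declare @Transactional, while RedisRefreshTasklet and DemoTasklet do not. This suggests that responsibility for transaction management in the batch layer is not clearly defined.

From the Spring Batch design perspective: StepBuilder.tasklet(tasklet, transactionManager) already wraps execute() in a TransactionTemplate, so when the same PlatformTransactionManager is used, @Transactional joins the existing transaction (PROPAGATION_REQUIRED) and is effectively a no-op. However, it adds a nested AOP proxy layer, so self-invocation or the selection of a different transaction manager could establish an unintended boundary.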
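The "joins the existing transaction" behaviour described above can be illustrated with a toy model. This is only a sketch: `ToyTxManager` and `runStep` are hypothetical names, not Spring classes, and the model only counts how many physical transactions are begun, committed, or rolled back when a REQUIRED-style boundary is nested inside another one.

```java
import java.util.function.Supplier;

public class RequiredPropagationSketch {

    // Toy transaction manager (hypothetical, not a Spring class): a physical
    // transaction begins and ends only at the outermost required() call.
    static final class ToyTxManager {
        int depth = 0;
        int begun = 0;
        int committed = 0;
        int rolledBack = 0;

        <T> T required(Supplier<T> body) {
            boolean outermost = (depth == 0);
            if (outermost) begun++;
            depth++;
            try {
                T result = body.get();
                if (outermost) committed++;
                return result;
            } catch (RuntimeException e) {
                if (outermost) rolledBack++;
                throw e;
            } finally {
                depth--;
            }
        }
    }

    // Models a step that opens a boundary around a tasklet which declares its own boundary.
    static ToyTxManager runStep(boolean failInsideTasklet) {
        ToyTxManager tx = new ToyTxManager();
        Supplier<String> taskletBody = () -> {
            if (failInsideTasklet) throw new IllegalStateException("forced failure");
            return "FINISHED";
        };
        // Inner boundary: stands in for the annotation on the tasklet method.
        Supplier<String> annotatedTasklet = () -> tx.required(taskletBody);
        try {
            // Outer boundary: stands in for the transaction opened by the step.
            tx.required(annotatedTasklet);
        } catch (IllegalStateException expected) {
            // A real step would be marked as failed here; the sketch only keeps the counters.
        }
        return tx;
    }

    public static void main(String[] args) {
        ToyTxManager ok = runStep(false);
        System.out.println(ok.begun + " begun, " + ok.committed + " committed");
        ToyTxManager failed = runStep(true);
        System.out.println(failed.begun + " begun, " + failed.rolledBack + " rolled back");
    }
}
```

In both runs only one physical transaction is begun, which is the sense in which the inner declaration adds nothing when the same manager is used. The model deliberately ignores the case the review warns about, where the inner boundary resolves to a different transaction manager.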

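Problems:

There is no test that guarantees the DELETE in TruncateStagingTasklet stays in sync with a Step-level rollback

PromoteTopToMvTasklet also declares @Transactional, so a removal recommendation should be applied only after a consistency policy has been established for the whole Tasklet layer

Proposed fix:

Define the transaction responsibility policy for the batch layer (managed at Step level only vs. declared selectively)

Clean up TruncateStagingTasklet and PromoteTopToMvTasklet consistently according to that policy

If the policy is Step-level management, remove @Transactional and also remove import org.springframework.transaction.annotation.Transactional

Additional test:
Add the following scenario to TruncateStagingStepIntegrationTest: inject a forced exception in a subsequent Step of the Job and verify that, when the Step-level transaction rolls back, the DELETE of TruncateStagingStep is rolled back with it. This provides a regression guarantee that the transaction boundary sits exactly at the Step level.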

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/src/main/java/com/loopers/batch/job/ranking/step/truncate/TruncateStagingTasklet.java` around lines 33 - 46, The Tasklet classes (TruncateStagingTasklet, PromoteTopToMvTasklet vs RedisRefreshTasklet, DemoTasklet) currently have inconsistent `@Transactional` usage; decide and apply a single policy (e.g., Step-level transaction management only), then make changes accordingly: if you choose Step-level management remove `@Transactional` and the import from TruncateStagingTasklet and PromoteTopToMvTasklet so they rely on StepBuilder.transactionManager(), or if you choose method-level transactions document and apply `@Transactional` consistently across all Tasklets; finally add an integration test TruncateStagingStepIntegrationTest that injects an exception in a downstream step to assert that deletes performed by TruncateStagingTasklet are rolled back together with the step to verify the chosen transaction boundary.

coderabbitai · 2026-04-17T08:21:42Z

+            jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
+                @Override public void setValues(PreparedStatement ps, int i) throws SQLException {
+                    long[] r = chunk.get(i);
+                    ps.setLong(1, r[0]);
+                    ps.setTimestamp(2, Timestamp.valueOf(LocalDateTime.ofEpochSecond(r[1], 0, java.time.ZoneOffset.UTC)));
+                    ps.setInt(3, (int) r[2]);
+                    ps.setLong(4, r[2]);
+                    ps.setLong(5, r[2] * 10_000L);
+                }
+                @Override public int getBatchSize() { return chunk.size(); }
+            });


⚠️ Potential issue | 🟠 Major

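Timestamp.valueOf(LocalDateTime.ofEpochSecond(..., UTC)) mixes UTC with the JVM time zone.

This applies to both line 122 and line 143. The epoch seconds are converted to a UTC LocalDateTime and then passed straight to Timestamp.valueOf(...), which interprets the value in the JVM default time zone. Unless the JVM time zone is UTC, the absolute instant of the resulting Timestamp is off by that zone's UTC offset, that is, by several hours. This may have stayed hidden because CI mostly runs in UTC, but on a local developer PC (KST) or in a benchmark environment with a different time zone, the seeded bucket_time falls outside the anchor window boundary, which makes tests flaky or distorts benchmark results.

As a fix, we recommend unifying on a path that hands the value to the JDBC driver in UTC:

♻️ Suggested example

- ps.setTimestamp(2, Timestamp.valueOf(LocalDateTime.ofEpochSecond(r[1], 0, java.time.ZoneOffset.UTC)));
+ ps.setTimestamp(
+         2,
+         Timestamp.from(java.time.Instant.ofEpochSecond(r[1])),
+         java.util.Calendar.getInstance(java.util.TimeZone.getTimeZone("UTC")));

Alternatively, using the OffsetDateTime-based ps.setObject(2, odt) lets the driver handle the time zone explicitly. If production code such as OrderMetricStreamingReader follows the same pattern, extracting a shared utility is preferable.

As an additional test, we recommend a CI matrix configuration that runs BaselineSeederIntegrationTest one extra time with a -Duser.timezone=Asia/Seoul profile.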
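The drift described above can be reproduced with the JDK alone. The sketch below compares the flagged conversion with `Timestamp.from(Instant)` under two default time zones; the class and method names (`EpochTimestampDemo`, `driftSeconds`) and the sample epoch value are illustrative assumptions, not code from the PR.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.TimeZone;

public class EpochTimestampDemo {

    // Pattern flagged in the review: the UTC wall-clock value is re-read in the JVM default zone.
    static Timestamp zoneDependent(long epochSecond) {
        return Timestamp.valueOf(LocalDateTime.ofEpochSecond(epochSecond, 0, ZoneOffset.UTC));
    }

    // Zone-independent alternative: the Timestamp carries exactly the instant of the epoch value.
    static Timestamp zoneIndependent(long epochSecond) {
        return Timestamp.from(Instant.ofEpochSecond(epochSecond));
    }

    // Seconds by which the two conversions differ when the JVM default zone is zoneId.
    static long driftSeconds(long epochSecond, String zoneId) {
        TimeZone original = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            long correct = zoneIndependent(epochSecond).getTime();
            long shifted = zoneDependent(epochSecond).getTime();
            return (correct - shifted) / 1000;
        } finally {
            TimeZone.setDefault(original); // restore so other code is unaffected
        }
    }

    public static void main(String[] args) {
        long epoch = 1_776_384_000L; // 2026-04-17T00:00:00Z
        System.out.println(driftSeconds(epoch, "UTC"));        // prints 0
        System.out.println(driftSeconds(epoch, "Asia/Seoul")); // prints 32400 (9 hours)
    }
}
```

Under KST the flagged conversion lands nine hours earlier than the intended instant, which is enough to push a seeded row across a day-aligned window boundary.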

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/fixture/BaselineSeeder.java` around lines 118 - 128, The code in BaselineSeeder uses Timestamp.valueOf(LocalDateTime.ofEpochSecond(..., UTC)) which mixes UTC instant with JVM default zone and causes wrong absolute times; change the batch setter to produce a timezone-safe Timestamp by converting the epoch seconds to an Instant and then to a Timestamp (e.g., Timestamp.from(Instant.ofEpochSecond(...))) or use an OffsetDateTime and call ps.setObject(2, offsetDateTime) so the JDBC driver handles TZ correctly; apply the same fix to the other occurrence referenced in the comment (line ~143) and consider extracting this conversion into a shared utility used by BaselineSeeder and OrderMetricStreamingReader.

coderabbitai · 2026-04-17T08:21:42Z

+    private final WeightConfigRepository stubRepo = new WeightConfigRepository() {
+        @Override public WeightConfig save(WeightConfig entity) { return entity; }
+        @Override public List<WeightConfig> findAllByActiveTrue() {
+            return List.of(new WeightConfig("control", 0.1, 0.2, 0.7, 100, true));
+        }
+    };


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -t f 'WeightConfigRepository.java' | xargs -I{} sh -c 'echo "=== {} ==="; cat "{}"'
fd -t f 'RankingJobParametersListener.java' -E '*Test*' | xargs -I{} cat {}

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 5395

🏁 Script executed:

wc -l apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 195

🏁 Script executed:

cat -n apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 4833

🏁 Script executed:

rg -t java "restoreWeightConfigs|CTX_ACTIVE_WEIGHT_GROUPS" apps/commerce-batch/src/test/ -A 2 -B 2

Repository: Loopers-dev-lab/loop-pack-be-l2-vol3-java

Length of output: 67

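Add tests that verify the weight_group snapshot

The current tests verify only the anchorDate boundary values and do not verify the core behaviour at all: saving and restoring the weight_group snapshot. In beforeJob(), RankingJobParametersListener stores the result of findAllByActiveTrue() in the ExecutionContext under CTX_ACTIVE_WEIGHT_GROUPS and under keys of the form "w.{groupName}.wView/wLike/wOrder", which is the core of the restart idempotency stated in 설계.md.

On restart, the snapshot in the ExecutionContext must stay unchanged even if weight_group has changed in the DB, so add the following checks:

After beforeJob(), the ExecutionContext contains the CTX_ACTIVE_WEIGHT_GROUPS key with the exact group names

The individual weight keys such as "w.control.wView", "w.control.wLike", and "w.control.wOrder" are stored

In a restart scenario, the ExecutionContext snapshot is not overwritten even if weight_group has changed in the DB

The result restored by restoreWeightConfigs() matches the stored snapshot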
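The four checks above can be pictured with a plain-Java model of the snapshot. This is a sketch under assumptions, not the listener's actual code: a `Map` stands in for the ExecutionContext, the `Weight` record stands in for WeightConfig, the value of `CTX_ACTIVE_WEIGHT_GROUPS` is invented, and the stub's arguments `0.1, 0.2, 0.7` are assumed to be the view/like/order weights in that order. Only the "w.{groupName}.wView/wLike/wOrder" key shape comes from the review.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightSnapshotSketch {

    // Hypothetical stand-in for WeightConfig; only the fields the snapshot needs.
    record Weight(String group, double wView, double wLike, double wOrder) {}

    // Hypothetical key value; the real constant's value is not shown in the review.
    static final String CTX_ACTIVE_WEIGHT_GROUPS = "activeWeightGroups";

    // Freezes the active weights on the first execution and leaves them untouched afterwards.
    static void snapshotIfAbsent(Map<String, Object> ctx, List<Weight> activeFromDb) {
        if (ctx.containsKey(CTX_ACTIVE_WEIGHT_GROUPS)) {
            return; // restart: keep the values frozen by the first execution
        }
        List<String> names = new ArrayList<>();
        for (Weight w : activeFromDb) {
            names.add(w.group());
            ctx.put("w." + w.group() + ".wView", w.wView());
            ctx.put("w." + w.group() + ".wLike", w.wLike());
            ctx.put("w." + w.group() + ".wOrder", w.wOrder());
        }
        ctx.put(CTX_ACTIVE_WEIGHT_GROUPS, String.join(",", names));
    }

    // Rebuilds the weights purely from the snapshot, without reading the DB again.
    static List<Weight> restore(Map<String, Object> ctx) {
        List<Weight> restored = new ArrayList<>();
        String joined = (String) ctx.get(CTX_ACTIVE_WEIGHT_GROUPS);
        if (joined == null || joined.isEmpty()) {
            return restored;
        }
        for (String group : joined.split(",")) {
            restored.add(new Weight(
                    group,
                    (Double) ctx.get("w." + group + ".wView"),
                    (Double) ctx.get("w." + group + ".wLike"),
                    (Double) ctx.get("w." + group + ".wOrder")));
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<String, Object> ctx = new HashMap<>();
        snapshotIfAbsent(ctx, List.of(new Weight("control", 0.1, 0.2, 0.7)));
        // Simulated restart after the DB weights changed: the snapshot must win.
        snapshotIfAbsent(ctx, List.of(new Weight("control", 0.5, 0.3, 0.2)));
        System.out.println(restore(ctx));
    }
}
```

A real test would drive RankingJobParametersListener.beforeJob() against a JobExecution instead of this map, but the assertions keep the same shape: keys present, values equal to the first snapshot, and the restored list equal to what was stored.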

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@apps/commerce-batch/src/test/java/com/loopers/batch/job/ranking/param/RankingJobParametersListenerTest.java` around lines 22 - 27, Add assertions and a restart-scenario to verify weight_group snapshot behavior: after invoking RankingJobParametersListener.beforeJob() using the test stubRepo/WeightConfig, assert ExecutionContext contains CTX_ACTIVE_WEIGHT_GROUPS and includes the "control" group name, and assert individual keys "w.control.wView", "w.control.wLike", "w.control.wOrder" exist with the expected numeric values; then simulate restart by changing the repository to return different WeightConfig values and call beforeJob() again asserting the ExecutionContext snapshot was not overwritten; finally call RankingJobParametersListener.restoreWeightConfigs() and assert the restored WeightConfig objects match the original stored snapshot.

ghtjr410 and others added 21 commits April 17, 2026 17:04

Copilot AI review requested due to automatic review settings April 17, 2026 08:07

Copilot started reviewing on behalf of ghtjr410 April 17, 2026 08:08

Copilot AI reviewed Apr 17, 2026

coderabbitai Bot reviewed Apr 17, 2026

ghtjr410 changed the title ~~[Week 10] Weekly/monthly ranking system using Batch~~ [Week 10] Weekly/monthly ranking system using Batch - 최호석 Apr 17, 2026

ghtjr410 commented Apr 17, 2026 • edited

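1. Why

2. Design philosophy

Three layers of boundaries

Five core properties - a decision framework for using batch correctly

Trigger time ≠ data boundary

Batch idempotency is a stronger condition than API idempotency

Immutable Input - the precondition for a valid restart

Atomic swap - a scenario where batch is inevitable

3. The pipeline at a glance

Data tiers

Step structure

Batch sequence ① Selection boundary - input freeze + initialization

Batch sequence ② Processing boundary - metric aggregation + score calculation

Batch sequence ③ Completion boundary - MV swap + audit logging + cache propagation

Read path - LAST_7D/LAST_30D (changed in Round 10)

4. Key design decisions

4-1. App-side streaming aggregation - avoiding DB GROUP BY

4-3. Atomic MV swap via DELETE+INSERT in a single TX

4-4. Freezing all external inputs - anchorDate + weightConfig snapshot

4-5. JdbcTemplate - why a JPA Writer is not used

5. Design evolution - what was learned by fixing things

Evolution 1: saveState(false) → serializing the lookahead ItemStream

Evolution 2: freezing only anchorDate → snapshotting all external inputs

Evolution 3: separate purge/promote → atomic swap in a single TX

6. Guardrails - atomic MV swap + previous-day fallback

API previous-day fallback

Failure recovery cost per Step

7. Measurement results

Raw data

Analysis by segment

What was verified and what was not

Proposed follow-up measurements

8. Test strategy

Classification by layer

What the restart scenarios verify

Seed distribution

Review Point

RP01. The realistic boundary for adopting batch infrastructure in a small team

RP02. Could the scheduler's limitations have been solved by something other than batch?

RP03. How large is the actual user impact when this batch fails?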

coderabbitai Bot commented Apr 17, 2026 • edited

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Key findings from an operations perspective

Recommended additional tests
