Add gemma4-26b E2E test scripts for pre-training by chiajunglien · Pull Request #4278 · AI-Hypercomputer/maxtext

chiajunglien · 2026-06-26T06:05:47Z

Description

This PR introduces end-to-end (E2E) testing and validation pipelines for the Gemma 4 26B model in MaxText.
Blocked by: https://buganizer.corp.google.com/issues/527749229

Tests

Pre-Training:

export RUN_ID=$(date +%Y-%m-%d-%H-%M)
bash tests/end_to_end/tpu/gemma4/26b/test_gemma4_to_mt.sh $RUN_ID
bash tests/end_to_end/tpu/gemma4/26b/test_gemma4.sh $RUN_ID

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

RexBearIU · 2026-06-26T08:45:50Z

+# Non-Googlers please remember to point `BASE_OUTPUT_DIRECTORY` to the GCS paths where you want to store scanned and unscanned checkpoints
+BASE_OUTPUT_DIRECTORY=gs://runner-maxtext-logs/${MODEL_NAME}/to_maxtext
+
+# Step 1: Install torch


Please test whether we still need it.

Using the command below to test:

xpk workload create \
  --cluster=mesa-v6e32-eu \
  --workload=gemma4-logit-check-$(date +%m%d%H%M) \
  --device-type=v6e-32 \
  --num-slices=1 \
  --priority=high \
  --project=cienet-cmcs \
  --zone=europe-west4-a \
  --skip-validation \
  --docker-image=gcr.io/tpu-prod-env-multipod/maxtext_jax_nightly:28210168847 \
  --command="set -xue; \
    python3 -m tests.utils.forward_pass_logit_checker \
    load_parameters_path='gs://us-central1-emmalien-test-83df3ecd-bucket/gemma4-26b-0625/unscanned/2026-06-25-06-21-12/0/items' \
    model_name=gemma4-26b \
    use_multimodal=false \
    scan_layers=false \
    --hf_model_path=google/gemma-4-26b-a4b-it \
    --max_kl_div=0.03 \
    --run_hf_model=true \
    hardware=cpu \
    skip_jax_distributed_system=True"

and got the error:

[transformers] PyTorch was not found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[transformers] DeepseekV32Config got `key=rope_scaling` in kwargs but hasn't set it as attribute. For RoPE standardization you need to set `self.rope_parameters` in model's config.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/deps/tests/utils/forward_pass_logit_checker.py", line 50, in <module>
    from maxtext.checkpoint_conversion.utils.hf_utils import convert_jax_weight_to_torch
  File "/deps/src/maxtext/checkpoint_conversion/utils/hf_utils.py", line 25, in <module>
    import torch.nn.functional as F
ModuleNotFoundError: No module named 'torch'
XPK End: Fri Jun 26 09:08:08 UTC 2026
EXIT_CODE=1

So torch still needs to be installed: forward_pass_logit_checker imports hf_utils, which imports torch at module level.
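One way to make the "Install torch" step conditional instead of dropping it is to probe for the module first. The sketch below is only an illustration: `python_module_available` and `require_python_module` are hypothetical helper names that do not exist in this PR or in MaxText, and it assumes `python3` is on the PATH inside the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): succeeds only if the named
# top-level Python module can be located by the current python3 interpreter.
# It uses importlib.util.find_spec, so the module itself is not imported.
python_module_available() {
  python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Hypothetical wrapper: report the result and return non-zero when the module
# is missing, so the calling script can decide whether to install or abort.
require_python_module() {
  if python_module_available "$1"; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

In the conversion script, Step 1 could then be written as `require_python_module torch || python3 -m pip install torch`. The install command here is an assumption; whatever the script already uses for that step would take its place.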

feat(testing): add end-to-end training and inference tests for gemma4…

1ee3197

…-26b Introduces E2E test configurations and scripts for gemma4-26b model, covering both inference decoding and pre-training validation pipelines.

This was referenced Jun 26, 2026

feat: add Gemma 4 26B E2E pre-training pipeline GoogleCloudPlatform/ml-auto-solutions#1296

Closed

feat: add Gemma 4 26B E2E pre-training pipeline CIeNET-International/ml-auto-solutions-3#286

Open

chiajunglien added 2 commits June 26, 2026 07:49

fix

fd387a7

add post-trainging e2e scripts

8bcfa63

RexBearIU reviewed Jun 26, 2026

chiajunglien wants to merge 3 commits into AI-Hypercomputer:main from CIeNET-International:emma/e2e-training

2 participants
