
Change bias initialization from 'embed' to 'heads' #371

Open

csgoogle wants to merge 1 commit into main from fixbiassharding

Conversation

csgoogle (Collaborator) commented Apr 6, 2026

  • Fix incorrect logical partitioning axes for attention and feed-forward parameters in the Flax/WAN/LTX2 modules.
  • Refactor flash-attention block-size selection into a helper and add unit tests.

doc: https://docs.google.com/document/d/1absFkpQAMM3YaYWxO_FYeqzDpypYeDbPsJRAV86nFQ0/edit?usp=sharing&resourcekey=0-FOzOmM0UdfU1LcDd_7epvw

Results

| Metric         | main    | fixbiassharding | Δ              |
|----------------|---------|-----------------|----------------|
| Compile time   | 1913.9s | 1906.4s         | -7.5s          |
| Inference time | 1656.4s | 1642.1s         | -14.3s (-0.9%) |

Notes

  • No difference observed with tp=1 configs; the improvement only surfaces when tensor parallelism is active, as the axis fixes reduce parameter all-gather overhead in MLP layers.
  • The primary motivation for this change is correctness: incorrect sharding axes can cause OOMs or numerical issues at other parallelism configs.
  • Larger gains are expected at tp=4 or tp=8, where parameter communication is a larger fraction of step time.

Video Quality Comparison

| Branch          | Video               |
|-----------------|---------------------|
| main            | main.mp4            |
| fixbiassharding | fixbiassharding.mp4 |

PSNR/SSIM (frame-by-frame, 81 frames):

| Metric    | Mean   | Min    | Max    |
|-----------|--------|--------|--------|
| PSNR (dB) | 19.37  | 18.83  | 20.17  |
| SSIM      | 0.7884 | 0.7654 | 0.8043 |

The low PSNR/SSIM reflects floating-point non-determinism from the different sharding layouts accumulating over 50 denoising steps (bfloat16 plus different collective patterns); the videos are visually identical.
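For reference, the per-frame PSNR in the table can be computed with a simple helper (a generic sketch, not the exact script used for these numbers; SSIM is typically taken from an existing implementation such as `skimage.metrics.structural_similarity`):

```python
import numpy as np


def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames (higher = closer)."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val**2 / mse)


def mean_psnr(frames_a, frames_b) -> float:
    """Clip-level PSNR as the mean of per-frame values (as in the table)."""
    return float(np.mean([psnr(a, b) for a, b in zip(frames_a, frames_b)]))
```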

Video and Xprof after fix:

https://console.cloud.google.com/storage/browser/sagarchapara/shardingfixes

@csgoogle csgoogle requested a review from entrpn as a code owner April 6, 2026 10:09
github-actions bot commented Apr 6, 2026

csgoogle force-pushed the fixbiassharding branch 2 times, most recently from d822acb to 15af39f on April 13, 2026 at 10:41
entrpn previously approved these changes Apr 14, 2026
Perseus14 (Collaborator) commented:

Could you add more details and results on the new commits? @csgoogle

csgoogle force-pushed the fixbiassharding branch 5 times, most recently from 9780b17 to 7a6ab88 on April 15, 2026 at 14:59
```python
    raise ValueError(f"Flash attention expects rank-3 or rank-4 inputs, got rank {tensor.ndim}.")


def _select_flash_block_sizes(
```
csgoogle (Collaborator, Author) commented:

Just refactoring, no logic changes, plus added unit tests; this will be helpful for the other Ulysses attention PR.
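A helper of this shape might look like the following (a hypothetical sketch based on the comment above, not the actual `_select_flash_block_sizes` from this PR):

```python
def select_flash_block_sizes(seq_len: int, candidates=(1024, 512, 256, 128)) -> int:
    """Pick the largest candidate flash-attention block size dividing seq_len.

    Hypothetical sketch: the real helper may also account for head_dim,
    hardware limits, and separate query vs. key/value block sizes.
    """
    for block in candidates:
        if seq_len % block == 0:
            return block
    return 1  # degenerate fallback: no blocking
```

Extracting the selection into a pure function like this is what makes it straightforward to unit-test across sequence lengths, which appears to be the point of the refactor.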

csgoogle (Collaborator, Author) commented:

> Could you add more details and results on the new commits?

Done.

@github-actions

🤖 Hi @csgoogle, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions

🤖 I'm sorry @csgoogle, but I was unable to process your request. Please see the logs for more details.

3 participants