Fix transformer sharding and cross-attention flash block sizes

2ddf8ab

Change bias initialization from 'embed' to 'heads' #371
