When you run AlphaFold 3, you may encounter GPU out-of-memory (OOM) failures during model execution. The job terminates with errors similar to:
Can't reduce memory use below 84.23GiB (90442527737 bytes) by rematerialization; only reduced to 93.01GiB (99871627676 bytes), down from 93.82GiB (100740973948 bytes) ...
Allocator (GPU_0_bfc) ran out of memory trying to allocate 90.01GiB (rounded to 96651808768)requested by op
Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 96651808528 bytes.
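To sanity-check such an error against your hardware, the byte counts in the log can be converted to GiB with plain shell arithmetic (a minimal sketch; the byte value below is copied from the RESOURCE_EXHAUSTED message above):

```shell
# Convert the allocator's requested byte count to whole GiB (1 GiB = 2^30 bytes)
requested_bytes=96651808528   # from the RESOURCE_EXHAUSTED message above
gib=$((requested_bytes / 1073741824))
echo "requested: ${gib} GiB"  # prints "requested: 90 GiB"
```

A ~90 GiB request exceeds the VRAM of, for example, an 80 GiB A100 or H100, so the allocation cannot succeed without spilling to host memory.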
Cause
AlphaFold 3 performs large tensor operations that require substantial GPU memory. By default, XLA preallocates most of the GPU device memory to improve performance. On hardware with limited VRAM, this can lead to:
- XLA GPU memory preallocation consuming the majority of device memory, leaving insufficient free space for model execution.
- Rematerialization falling short, where XLA attempts to trade recomputation for memory but still cannot reduce usage below the required threshold.
- Lack of unified memory, meaning computation must fit entirely within GPU VRAM rather than using host memory as overflow.
Because AlphaFold 3 has very high peak memory requirements, these default settings cause early OOM termination.
Resolution
Adjusting XLA's memory management settings within the Apptainer environment resolves the OOM issue.
Add the following variables to the job script:
# Enable unified memory with larger host-to-GPU memory oversubscription
export APPTAINERENV_TF_FORCE_UNIFIED_MEMORY=1
# Disable XLA's default GPU preallocation behavior
export APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE=false
# Increase unified memory oversubscription limit (default is 0.7)
export APPTAINERENV_XLA_CLIENT_MEM_FRACTION=4.0
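In context, these exports belong in the job script before the container is launched: Apptainer strips the APPTAINERENV_ prefix and passes the remainder into the container's environment. A minimal job-script sketch (the image name, input/output paths, and alphafold invocation are placeholders, not from the source):

```shell
#!/bin/bash
# XLA memory settings, forwarded into the container by Apptainer
export APPTAINERENV_TF_FORCE_UNIFIED_MEMORY=1
export APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE=false
export APPTAINERENV_XLA_CLIENT_MEM_FRACTION=4.0

# Hypothetical invocation: inside the container the variables appear
# without the APPTAINERENV_ prefix (TF_FORCE_UNIFIED_MEMORY, ...)
apptainer exec --nv alphafold3.sif \
    python run_alphafold.py --json_path=input.json --output_dir=output
```

The `--nv` flag (Apptainer's NVIDIA GPU passthrough) is required for the container to see the GPU at all; the memory settings only take effect once the device is visible.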
Effect of these settings
TF_FORCE_UNIFIED_MEMORY=1
Enables unified memory, allowing XLA to spill to host RAM instead of failing when GPU memory is exceeded.
XLA_PYTHON_CLIENT_PREALLOCATE=false
Prevents XLA from reserving most GPU memory upfront, leaving room for dynamic allocations.
XLA_CLIENT_MEM_FRACTION=4.0
Raises the unified memory oversubscription factor so the process can use host RAM beyond physical GPU memory as needed.
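The effect of the fraction can be worked through numerically. With XLA_CLIENT_MEM_FRACTION=4.0, XLA's memory budget becomes 4x the device's VRAM; on a hypothetical 80 GiB GPU (the GPU size here is an assumption, not from the source), anything beyond the physical 80 GiB spills to host RAM via unified memory:

```shell
# Oversubscription arithmetic for XLA_CLIENT_MEM_FRACTION=4.0
vram_gib=80                    # assumed device VRAM (e.g. an 80 GiB GPU)
fraction=4                     # XLA_CLIENT_MEM_FRACTION, integer here for shell math
budget_gib=$((vram_gib * fraction))
spill_gib=$((budget_gib - vram_gib))
echo "XLA budget: ${budget_gib} GiB (${spill_gib} GiB may spill to host RAM)"
# prints "XLA budget: 320 GiB (240 GiB may spill to host RAM)"
```

Note that the host must have enough free RAM to absorb the spill, and unified-memory paging over PCIe/NVLink is much slower than on-device access, so expect longer runtimes when oversubscription is actually exercised.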
Reference
https://github.com/google-deepmind/alphafold3/issues/432#issuecomment-3094741132