Converting PyTorch distributed checkpoints to Hugging Face format is extremely painful
There is no official or well-supported path for converting PyTorch distributed training checkpoints to Hugging Face Transformers-compatible checkpoints. NVIDIA has deprioritized this in favor of their NeMo framework, leaving the community without reliable tooling for this common workflow.
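To make the gap concrete, here is a minimal sketch of one step such a conversion usually needs: renaming parameters so they match the unwrapped model. It assumes the sharded checkpoint has already been consolidated into a single flat mapping of parameter names to tensors, and that the only naming difference comes from wrapper segments such as `module` (DDP), `_fsdp_wrapped_module` (FSDP) and `_orig_mod` (`torch.compile`). The helper name `strip_wrapper_segments` and the segment list are hypothetical, not part of any library.

```python
# Hypothetical helper, not a library API: drop wrapper-introduced name
# segments from a consolidated (already un-sharded) state dict.
# Assumption: no real submodule in the model is itself named "module",
# "_fsdp_wrapped_module" or "_orig_mod".
WRAPPER_SEGMENTS = {"module", "_fsdp_wrapped_module", "_orig_mod"}


def strip_wrapper_segments(state_dict):
    """Return a new dict whose keys have wrapper segments removed."""
    cleaned = {}
    for key, value in state_dict.items():
        # Compare whole dot-separated segments so that names such as
        # "submodule" are left untouched.
        new_key = ".".join(
            part for part in key.split(".") if part not in WRAPPER_SEGMENTS
        )
        if new_key in cleaned:
            # Two source keys mapped to the same target: refuse to guess.
            raise ValueError(f"key collision after renaming: {new_key!r}")
        cleaned[new_key] = value
    return cleaned


if __name__ == "__main__":
    example = {
        "module.encoder.layer.0.weight": "w0",
        "_fsdp_wrapped_module.encoder._fsdp_wrapped_module.layer.0.bias": "b0",
    }
    for name in strip_wrapper_segments(example):
        print(name)
```

The cleaned mapping would still have to match the parameter names of the target Transformers model class before it could be loaded with `load_state_dict` and written out with `save_pretrained`; any architecture-specific renaming or tensor reshaping is outside this sketch.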
migration, PyTorch, Hugging Face Transformers