Learned gradient compression for distributed deep learning
Introduced a learned compression framework that reduces the communication overhead of gradient exchange in distributed training, allowing large-scale models to scale more efficiently across cluster nodes.
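The learned compressor itself is not detailed here; as a minimal illustrative sketch under that caveat, the snippet below uses plain top-k gradient sparsification with error feedback (a standard, non-learned stand-in) to show where a compressor sits in a data-parallel step: gradients are compressed before the exchange and reconstructed on receipt. The names `TopKCompressor` and `k_ratio` are hypothetical.

```python
import torch


class TopKCompressor:
    """Keep the k largest-magnitude gradient entries; accumulate the rest locally
    (error feedback) so the dropped mass is transmitted in later steps."""

    def __init__(self, k_ratio: float = 0.01):
        self.k_ratio = k_ratio
        self.residual = {}  # per-parameter error-feedback buffers

    def compress(self, name: str, grad: torch.Tensor):
        # Add back previously dropped gradient mass before selecting entries.
        flat = grad.flatten() + self.residual.get(name, torch.zeros_like(grad).flatten())
        k = max(1, int(self.k_ratio * flat.numel()))
        _, indices = torch.topk(flat.abs(), k)
        values = flat[indices]  # keep original signs
        # Store what was not sent so it is accumulated into the next step.
        residual = flat.clone()
        residual[indices] = 0.0
        self.residual[name] = residual
        return values, indices, grad.shape

    def decompress(self, values, indices, shape):
        flat = torch.zeros(shape.numel(), device=values.device)
        flat[indices] = values
        return flat.view(shape)


# Demo on a single tensor; in actual distributed training the (values, indices)
# pair would be exchanged between nodes (e.g. via torch.distributed collectives)
# instead of the full dense gradient.
if __name__ == "__main__":
    comp = TopKCompressor(k_ratio=0.05)
    grad = torch.randn(1000, 100)
    values, indices, shape = comp.compress("layer1.weight", grad)
    approx = comp.decompress(values, indices, shape)
    print(f"sent {values.numel()} of {grad.numel()} values, "
          f"relative error {torch.norm(grad - approx) / torch.norm(grad):.3f}")
```

In this pattern, only the compressed representation crosses the network, which is where the bandwidth savings come from; the error-feedback buffer keeps the sparsified updates from losing the small gradient components entirely.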