A curated list of model merging methods: techniques for combining different pre-trained models.
Contributions are welcome!
Acknowledgments: This wonderful template is from https://github.com/VainF/Awesome-Anything by Gongfan Fang.
Title & Authors | Intro | Useful Links |
---|---|---|
Model Fusion via Optimal Transport <br> Sidak Pal Singh, Martin Jaggi <br> > ETH Zurich, EPFL <br> > NeurIPS'20 | | [Github] [PDF] |
Git Re-Basin: Merging Models modulo Permutation Symmetries <br> Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa <br> > University of Washington <br> > ICLR'23 | | |
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch <br> Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li <br> > Alibaba Group <br> > arXiv'23 | | [Github] [PDF] |
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts <br> Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao <br> > Wuhan University, JD, MBZUAI, Nanyang Technological University <br> > ICML'24 | | [PDF] |
GAN Cocktail: Mixing GANs without Dataset Access <br> Omri Avrahami, Dani Lischinski, Ohad Fried <br> > The Hebrew University of Jerusalem, Reichman University <br> > ECCV'22 | | [Github] [PDF] |
ZipIt! Merging Models from Different Tasks without Training <br> George Stoica, Daniel Bolya, Jakob Bjorner, Taylor Hearn, Judy Hoffman <br> > Georgia Tech <br> > arXiv'23 | | [Github] [PDF] |
Model Soups: Averaging weights of multiple fine-tuned models improves accuracy without increasing inference time <br> Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt <br> > University of Washington, Columbia University, Google Research, Meta AI Research, Tel Aviv University <br> > ICML'22 | | [Github] [PDF] |
Deep Model Reassembly <br> Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang <br> > National University of Singapore, Bytedance <br> > NeurIPS'22 | | [Github] [PDF] |
Factorizing Knowledge in Neural Networks <br> Xingyi Yang, Jingwen Ye, Xinchao Wang <br> > National University of Singapore <br> > ECCV'22 | | [Github] [PDF] |
REPAIR: REnormalizing Permuted Activations for Interpolation Repair <br> Keller Jordan, Hanie Sedghi, Olga Saukh, Rahim Entezari, Behnam Neyshabur <br> > Hive AI, Google Research, TU Graz / CSH Vienna <br> > ICLR'23 | | [Github] [PDF] |
An Empirical Study of Multimodal Model Merging <br> Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang <br> > UNC Chapel Hill, Microsoft <br> > arXiv'23 | | [Github] [PDF] |
Deep Neural Network Fusion via Graph Matching with Applications to Model Ensemble and Federated Learning <br> Chang Liu, Chenfei Lou, Runzhong Wang, Alan Yuhan Xi, Li Shen, Junchi Yan <br> > Shanghai Jiao Tong University, University of Wisconsin Madison, JD Explore Academy, Shanghai AI Laboratory <br> > ICML'22 | | [Github] [PDF] |
Merging Models with Fisher-Weighted Averaging <br> Michael Matena, Colin Raffel <br> > UNC Chapel Hill <br> > arXiv'21 | | [Github] [PDF] |
Re-basin via implicit Sinkhorn differentiation <br> Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Masih Aminbeidokhti, Eric Granger, Marco Pedersoli <br> > ÉTS <br> > CVPR'23 | | [Github] [PDF] |
Dataless Knowledge Fusion by Merging Weights of Language Models <br> Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng <br> > University of Southern California, Bloomberg <br> > ICLR'23 | | [Github] [PDF] |
Amalgamating Knowledge From Heterogeneous Graph Neural Networks <br> Yongcheng Jing, Yiding Yang, Xinchao Wang, Mingli Song, Dacheng Tao <br> > The University of Sydney, National University of Singapore, Stevens Institute of Technology, Zhejiang University <br> > CVPR'21 | | [Github] [PDF] |
Amalgamating Knowledge towards Comprehensive Classification <br> Chengchao Shen, Xinchao Wang, Jie Song, Li Sun, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology <br> > AAAI'19 | | [Github] [PDF] |
Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning <br> Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology, Alibaba Group, Yunnan University <br> > IJCAI'19 | | [Github] [PDF] |
Student Becoming the Master: Knowledge Amalgamation for Joint Scene Parsing, Depth Estimation, and More <br> Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology, Alibaba Group, Yunnan University <br> > CVPR'19 | | [Github] [PDF] |
Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation <br> Chengchao Shen, Mengqi Xue, Xinchao Wang, Jie Song, Li Sun, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology <br> > ICCV'19 | | [Github] [PDF] |
Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN <br> Jingwen Ye, Yixin Ji, Xinchao Wang, Kairi Ou, Dapeng Tao, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology, Alibaba Group <br> > CVPR'20 | | [Github] [PDF] |
CNN LEGO: Disassembling and Assembling Convolutional Neural Network <br> Jiacong Hu, Jing Gao, Zunlei Feng, Lechao Cheng, Jie Lei, Hujun Bao, Mingli Song <br> > Zhejiang University <br> > arXiv'22 | | [Github] [PDF] |
Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning <br> Sihui Luo, Wenwen Pan, Xinchao Wang, Dazhou Wang, Haihong Tang, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology, Alibaba Group <br> > ECCV'20 | | [Github] [PDF] |
Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers <br> Jingwen Ye, Xinchao Wang, Yixin Ji, Kairi Ou, Mingli Song <br> > Zhejiang University, Stevens Institute of Technology, Alibaba Group <br> > IJCAI'19 | | [Github] [PDF] |
Deep Model Fusion: A Survey <br> Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen <br> > National University of Defense Technology, JD Explore Academy, Beijing Institute of Technology <br> > arXiv'23 | | [Github] [PDF] |
... (TBD) | | |