- PLMpapers: a list of representative work on Pre-trained Language Models.
- Book: Representation Learning for Natural Language Processing.
- OpenCLaP: Open Chinese Language Pre-trained Model Zoo.
- OpenVINO: a toolkit allowing developers to deploy pre-trained deep learning models through a high-level C++ Inference Engine API integrated with application logic.
- Kashgari: a production-ready NLP transfer learning framework for text labeling and text classification; includes Word2Vec, BERT, and GPT2 language embeddings.
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pre-trained models.
- Chinese-Minority-PLM: pre-trained language models for Chinese minority languages.
- ColossalAI: a unified deep learning system for big model era.
- OpenBMB: a list of big models.
- flagOpen: a roundup of open-source projects from BAAI (Beijing Academy of Artificial Intelligence).
- A Cookbook of Self-Supervised Learning: LeCun's 70-page opus, a hands-on "bible" that walks you through self-supervised learning step by step.
- ChineseGLUE: Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard.
- CLUE: Advanced version of ChineseGLUE (homepage/paper).
- FewCLUE: paper
- ChineseBLUE: Chinese Biomedical Language Understanding Evaluation benchmark.
- CUGE (Chinese Language Understanding and Generation Evaluation): report/homepage
- NATURAL-INSTRUCTIONSv2: paper, news
- Pinyin Tokenizer
- link: https://github.com/shibing624/pinyin-tokenizer
- author: xuming
- note: a Chinese pinyin tokenizer written in Python 3 that splits continuous pinyin into a list of single-syllable pinyin; works out of the box. A toy sketch of the idea follows this entry.
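As a rough illustration of what such a tokenizer does (a toy sketch only, not the pinyin-tokenizer API; the syllable table below is deliberately tiny, whereas the real library ships the full Mandarin syllable set and handles ambiguous splits):

```python
# Toy longest-match pinyin splitter (illustrative only; not the pinyin-tokenizer API).
SYLLABLES = {"xiang", "gang", "bei", "jing", "shang", "hai", "xi", "an"}
MAX_LEN = max(len(s) for s in SYLLABLES)

def split_pinyin(text):
    result, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):  # try the longest match first
            if text[i:j] in SYLLABLES:
                result.append(text[i:j])
                i = j
                break
        else:  # no syllable matched: keep the single character and move on
            result.append(text[i])
            i += 1
    return result

print(split_pinyin("xianggang"))  # ['xiang', 'gang']
print(split_pinyin("beijing"))    # ['bei', 'jing']
```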
- tiktoken
- link: https://github.com/openai/tiktoken
- author: OpenAI
- note: a fast BPE tokeniser for use with OpenAI's models.
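A minimal usage sketch (assumes tiktoken is installed; cl100k_base is the encoding used by the GPT-3.5/GPT-4 family):

```python
import tiktoken

# Look up a BPE encoding by name; tiktoken.encoding_for_model("gpt-3.5-turbo") works too.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Pre-trained language models are everywhere.")
print(tokens)              # list of integer token ids
print(enc.decode(tokens))  # round-trips back to the original string
```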
- easytokenizer
- link: https://github.com/zejunwang1/easytokenizer
- author: WangZeJun
- note: a simple, easy-to-use, high-performance text tokenizer library that supports word segmentation and tokenization similar to BertTokenizer in HuggingFace transformers.
- blog: easytokenizer-v0.2.0: a high-performance text tokenizer library
- paper: Samuel R., Ellie P., Edouard G., Benjamin Van D., Alex W., Jan H., Patrick X., Raghavendra P., R. T. M., Roma P., Najoung K., Ian T., Yinghui H., Katherin Y., Shuning J., & Berlin C. (2018). Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling.
- code: Origin by ML²AT CILVR Report, Tutorial by Prashant Ranjan, keras by iliaschalkidis, Multi-Language-oriented ELMo by HIT-SCIR, another keras version by strongio.
- paper: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., & Gomez, A. N., et al. (2017). Attention is all you need.
- code:
- attention:
- External-Attention-pytorch: PyTorch implementations of various attention, MLP, re-parameterization, and convolution modules, helpful for understanding the corresponding papers (a minimal scaled dot-product attention sketch is given after the tutorial list below).
- survey:
- 2023: Transformer quick-reference handbook: papers on models, architectures, and training methods, all in one place
- 2023: Figure and code found inconsistent: an error discovered in the Transformer paper; netizens say it should have been pointed out long ago
- 2023: Transformer models: an introduction and catalog: a 36-page survey cataloging the large pre-trained Transformer models behind ChatGPT
- 2022: Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey
- 2022: From Swin Transformer to GFlowNets: 256 of the most noteworthy works selected from 20,000 SOTA papers of 2021 (with a complete directory)
- 2021: Natural Language Processing: A Pre-trained Model Approach (by Che Wanxiang, Guo Jiang, and Cui Yiming of HIT-SCIR)
- 2021: Dozens of leading Chinese NLP researchers jointly survey the past, present, and future of pre-trained models
- 2021: The latest Transformer survey from Prof. Qiu Xipeng's group at Fudan University
- 2021: The latest survey of vision Transformers
- Long Range Arena: A Benchmark for Efficient Transformers (2020-11)
- Efficient Transformers: A Survey (2020-09)
- introduction/tutorial:
- 2024: TRANSFORMERS FROM SCRATCH, with code.
- 2023: How-to-use-Transformers
- 2022: transformer-walkthrough: a walkthrough of Transformer architecture code.
- 2022: A simple implementation of BERT
- 2022: Technical walkthrough: an analysis and summary of the BERT pre-training source code
- 2022: BERT series: calculating model parameter counts
- 2022: The Transformer explained from a matrix perspective (with code)
- 2021: HuggingFace BERT source code explained: core model components
- 2021: HuggingFace BERT source code explained: applying the model and optimizing training
- 2021: The Transformer in detail
- 2021: A 30,000-character beginner-friendly introduction to vision Transformers (mirror link)
- 2020: How Transformers work in deep learning and NLP: an intuitive introduction by AI SUMMER
- 2018: The Annotated Transformer
- 2018: A truly complete illustrated guide to the Seq2Seq attention model
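For readers working through the material above: the heart of the architecture is scaled dot-product attention. A minimal, illustrative PyTorch sketch (not tied to any repository listed here):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Example: batch of 2, 8 heads, sequence length 10, head dimension 64.
q = torch.randn(2, 8, 10, 64)
k = torch.randn(2, 8, 10, 64)
v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([2, 8, 10, 64])
```

Multi-head attention simply projects Q, K, and V once per head, runs this computation in parallel, and concatenates the per-head outputs before a final linear projection.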
- Fast Transformers
- paper: Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.
- code: fast-transformers by Idiap Research Institute
- website: https://linear-transformers.com/
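The paper's key observation is that replacing softmax(QKᵀ)V with a kernel feature map φ lets φ(Q)(φ(K)ᵀV) be computed in time linear in sequence length. A simplified sketch of the non-causal case with the paper's φ(x) = elu(x) + 1 (illustrative; the fast-transformers library also covers the causal/RNN form):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention: phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1).

    phi(x) = elu(x) + 1, as in Katharopoulos et al. (2020). Shapes: (batch, heads, seq, dim).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bhsd,bhse->bhde", k, v)                         # sum_s phi(k_s) v_s^T
    z = 1.0 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps)   # normalizer per query
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, z)

q = torch.randn(2, 8, 1024, 64)
k = torch.randn(2, 8, 1024, 64)
v = torch.randn(2, 8, 1024, 64)
print(linear_attention(q, k, v).shape)   # torch.Size([2, 8, 1024, 64])
```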
- Flowformer
- code: https://github.com/thuml/Flowformer
- author: THUML
- paper: Wu, H., Wu, J., Xu, J., Wang, J., & Long, M. (2022). Flowformer: linearizing transformers with conservation flows.
- blog: Task-agnostic! Tsinghua proposes the Flowformer backbone network with linear complexity | ICML 2022
- Infinite Memory Transformer
- Longformer
- paper: Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer.
- code: longformer by allenai
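Longformer restricts most positions to a local sliding window plus a handful of global tokens. An illustrative sketch of building such a mask in plain PyTorch (the real implementation uses banded attention kernels rather than a dense mask):

```python
import torch

def sliding_window_mask(seq_len, window, global_idx=()):
    """Boolean (seq_len, seq_len) mask: True where attention is allowed."""
    pos = torch.arange(seq_len)
    # Local attention: position i may attend to j when |i - j| <= window // 2.
    mask = (pos[None, :] - pos[:, None]).abs() <= window // 2
    # Global tokens attend everywhere and are attended to by every position.
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return mask

mask = sliding_window_mask(seq_len=16, window=4, global_idx=(0,))  # token 0 acts like [CLS]
print(mask.int())
```

A mask like this can be dropped into the scaled dot-product attention sketch shown earlier in this document.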
- ReFormer
- paper: Kitaev, N., Kaiser, Ł., & Levskaya, A. (2020). Reformer: the efficient Transformer.
- code: reformer-pytorch by Phil Wang
- RoFormer
- Transformer-XL
- paper: Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: attentive language models beyond a fixed-length context.
- code: tensorflow & pytorch by Zhilin Yang
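Transformer-XL's central idea is segment-level recurrence: hidden states from the previous segment are cached with gradients stopped and reused as extra keys/values for the current segment. A heavily simplified sketch of that memory mechanism (relative positional encoding and causal masking omitted):

```python
import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    """One attention step with cached memory, Transformer-XL style (simplified).

    h: (batch, cur_len, d) current segment; mem: (batch, mem_len, d) cached previous segment.
    """
    h_tilde = torch.cat([mem.detach(), h], dim=1)   # stop-gradient on the cached segment
    q = h @ w_q                                     # queries come from the current segment only
    k = h_tilde @ w_k                               # keys/values also see the cached memory
    v = h_tilde @ w_v
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
    out = attn @ v
    # New memory = last mem_len hidden states of [mem; h], detached for the next segment.
    new_mem = h_tilde[:, -mem.size(1):].detach() if mem.size(1) else h.detach()
    return out, new_mem

d, batch, mem_len, cur_len = 64, 2, 8, 8
w_q, w_k, w_v = (torch.randn(d, d) * 0.02 for _ in range(3))
mem = torch.zeros(batch, mem_len, d)
h = torch.randn(batch, cur_len, d)
out, mem = attend_with_memory(h, mem, w_q, w_k, w_v)
print(out.shape, mem.shape)   # torch.Size([2, 8, 64]) torch.Size([2, 8, 64])
```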
- xFormers
- code: https://github.com/facebookresearch/xformers
- author: Facebook Research
- note: hackable and optimized Transformers building blocks, supporting a composable construction.
- paper: Hua, W., Dai, Z., Liu, H., & Le, Q. V. (2022). Transformer Quality in Linear Time. arXiv preprint arXiv:2202.10447.
- blog
- paper: PaLM: Scaling Language Modeling with Pathways.
- paper: Pathways: Asynchronous Distributed Dataflow for ML.
- code: PaLM - Pytorch by Phil Wang
- blog
- paper: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: pre-training of deep bidirectional transformers for language understanding.
- code:
- list: awesome-bert by Jiakui Wang
- pre-trained models: OpenCLaP
- extra:
- blog:
- link: Chinese-BERT-wwm by Yiming Cui.
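A minimal masked-language-modeling sketch with HuggingFace transformers (assumes transformers and torch are installed; bert-base-chinese is used purely as an example checkpoint):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

inputs = tokenizer("今天天气很[MASK]。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read out the most likely token.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```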
- paper: Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
- code:
- paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- code:
- paper
- code:
- paper: Alec R., Karthik N., Tim S., & Ilya S. (2018). Improving Language Understanding by Generative Pre-Training.
- code: tensorflow by OpenAI
- blog:
- paper: Alec R., Jeffrey W., Rewon C., David L., Dario A., & Ilya S. (2019). Language Models are Unsupervised Multitask Learners.
- code:
- extra: another unofficial tensorflow version by ConnorJL and the author's blog.
- tutorial:
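A minimal text-generation sketch with the HuggingFace GPT-2 checkpoint (illustrative; the sampling hyperparameters are arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Pre-trained language models", return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_new_tokens=40,                    # length of the continuation
    do_sample=True,                       # sample instead of greedy decoding
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; silence the warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```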
- paper: OpenAI. (2020). Language Models are Few-Shot Learners.
- code: descriptions by OpenAI.
- blog:
- code: https://github.com/TsinghuaAI/CPM-Generate
- author: Tsinghua AI & BAAI
- homepage: https://cpm.baai.ac.cn/
- paper: Wang, X., Gao, T., Zhu, Z., Liu, Z., Li, J., & Tang, J. (2019). KEPLER: a unified model for knowledge embedding and pre-trained language representation.
- paper: Kevin C., Minh-Thang L., Quoc V. L., & Christopher D. Manning. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
- code: tensorflow by Google, Chinese-ELECTRA by Yiming Cui
- paper: Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M., & Liu, Q. (2019). ERNIE: enhanced language representation with informative entities.
- code: pytorch by thunlp, paddlepaddle by Baidu
- paper: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
- demo: ERNIE 3.0 knowledge-enhanced large model
- paper: Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding.
- code: https://github.com/zihangdai/xlnet
- extra: Chinese-XLNet by Yiming Cui
- paper: Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: training multi-billion parameter language models using model parallelism.
- code: https://github.com/NVIDIA/Megatron-LM
- link: https://github.com/Oneflow-Inc/libai
- author: OneFlow
- blog: Is training a large model harder than reaching the sky? LiBai (李白), a pre-training model library that is easy to use and highly efficient, has arrived!
- paper: Kaitao S., Xu T., Tao Q., Jianfeng L., & Tie-Y. L. (2019). MASS: Masked Sequence to Sequence Pre-training for Language Generation.
- paper: Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., & Wang, Y., et al. (2019). Unified language model pre-training for natural language understanding and generation. NeurIPS 2019.
- code: https://github.com/microsoft/unilm
- note: including UniLM v1/v2, MiniLM, LayoutLM, and s2s-ft.
- extra: Unilm(Chinese) by YuwenTechnology, Pretrained-Unilm-Chinese by zhongerqiandan
- paper: Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.
- extra: BARTScore
- code
- paper: Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.
- tutorial
- extra:
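T5 casts every task as text-to-text with a task prefix. A minimal sketch with the HuggingFace t5-small checkpoint, using one of the prefixes from the paper:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text with a task prefix.
text = "translate English to German: The house is wonderful."
input_ids = tokenizer(text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```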
- paper: Diao, S., Bai, J., Song, Y., Zhang, T., & Wang, Y. (2019). ZEN: pre-training Chinese text encoder enhanced by n-gram representations.
- code: https://github.com/sinovation/ZEN
- paper: Zhang, Z., Zhang, H., Chen, K., Guo, Y., Hua, J., Wang, Y., & Zhou, M. (2021). Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese. arXiv preprint arXiv:2110.06696.
- code: https://github.com/Langboat/Mengzi
- author: Langboat
- code: https://github.com/lonePatient/NeZha_Chinese_PyTorch
- author: lonePatient
- code: https://github.com/dbiir/UER-py
- author: DBIIR @ RUC
- note: an open-source pre-training framework in PyTorch with a pre-trained model zoo.
- code: https://github.com/deepset-ai/FARM
- author: deepset-ai
- note: a tool that makes transfer learning with BERT & Co. simple, fast, and enterprise-ready.
- code: https://github.com/fastnlp/fastNLP
- document: https://fastnlp.readthedocs.io/zh/latest/
- author: fastnlp group (FengZiYjun, fudan)
- note: a modularized and extensible NLP framework, currently still in incubation.
- extra: fastHan: a BERT-based integrated Chinese NLP toolkit (fastHan)
- news: Qiu Xipeng: quickly build natural language processing models with fastNLP (Oct 17)
- code: https://github.com/alibaba/AliceMind/
- author: alibaba-luofuli
- note: Alibaba's collection of encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
- news: Official announcement! DAMO Academy open-sources its treasured deep language model suite AliceMind; NLP is moving toward the era of large-scale industrialization
- github page: https://github.com/BAAI-WuDao
- author: BAAI
- note: BAAI WuDao large-scale pre-trained language models
- github page: https://github.com/IDEA-CCNL/Fengshenbang-LM
- author: IDEA CCNL
- note: Fengshenbang is an open-source large-model initiative led by the Center for Cognitive Computing and Natural Language at IDEA Research, including Erlangshen (Chinese BERT), Zhouwenwang (co-developed with Zhuiyi Technology, Chinese LM & MLM), Yuyuan (Chinese medical LM), Wenzhong (Chinese GPT), and Randeng (Chinese All2Gen)
- 2023-03-07 (562B parameters) PaLM-E by Google: news
- 2022-07-28 (176B parameters) BLOOM by BigScience: news, intro, optimization, tutorial
- 2022-06-13 (1B parameters) 乾元 (BigBang Transformer) by 超对称技术 (SuperSymmetry Technologies): news, benchmark
- 2022-05-04 (175B parameters) OPT-175B by Meta AI: paper, code, model file, news
- 2022-04-05 (540B parameters) PaLM by Google: news, intro
- 2022-02-04 (20B parameters) GPT-NeoX by EleutherAI: news
- 2022-01-23 (137B parameters) LaMDA by Google: news, news
- 2021-12-09 (280B parameters) Gopher by DeepMind: news
- 2021-12-08 (260B parameters) Wenxin (ERNIE 3.0 Titan) by Baidu: news, news
- 2021-10-12 (530B parameters) Megatron-Turing by Microsoft & NVIDIA: news
- 2021-09-30 (parameter count undisclosed) Shenzhou 1.0 by QQ Browser: news
- 2021-09-28 (245.7B parameters) Yuan 1.0 by Inspur AI Research Institute: news.
- 2021-07-08 (parameter count undisclosed) ERNIE 3.0 by Baidu: paper, demo, news.
- 2021-06-01 (1.75T parameters) WuDao 2.0 by BAAI (Beijing Academy of Artificial Intelligence): news
- 2021-04-26 (200B parameters) PanGu by Huawei: code, news.
- 2021-04-19 (27B parameters) PLUG by Alibaba DAMO Academy: demo, news.
- 2021-03-20 (parameter count undisclosed) WuDao 1.0 by BAAI: homepage, corpora, news.
- 2021-03-11 (2.6B/21.7B parameters) CPM-LM/CPM-KM by BAAI: code, homepage, paper.