- PLMpapers: a list of representative work on Pre-trained Language Models.
- Book: Representation Learning for Natural Language Processing.
- OpenCLaP: Open Chinese Language Pre-trained Model Zoo.
- OpenVINO: a toolkit allowing developers to deploy pre-trained deep learning models through a high-level C++ Inference Engine API integrated with application logic.
- Kashgari: a production-ready NLP transfer learning framework for text labeling and text classification; includes Word2Vec, BERT, and GPT2 language embeddings.
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pre-trained models.
- Chinese-Minority-PLM: pre-trained language models for Chinese minority languages.
- ColossalAI: a unified deep learning system for big model era.
- OpenBMB: a list of big models.
- flagOpen: a roundup of open-source projects from BAAI (Beijing Academy of Artificial Intelligence).
- A Cookbook of Self-Supervised Learning: LeCun's 70-page opus, a hands-on "bible" that walks you through self-supervised learning step by step.
- ChineseGLUE: Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard.
- CLUE: Advanced version of ChineseGLUE (homepage/paper).
- FewCLUE: paper
- ChineseBLUE: Chinese Biomedical Language Understanding Evaluation benchmark.
- CUGE (Chinese Language Understanding and Generation Evaluation): report/homepage
- NATURAL-INSTRUCTIONSv2: paper, news
- Pinyin Tokenizer
- link: https://github.com/shibing624/pinyin-tokenizer
- author: xuming
- note: a Chinese pinyin tokenizer written in Python 3 that splits continuous pinyin into a list of single-syllable pinyin; works out of the box. A toy sketch of the idea follows this entry.
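As a rough illustration of what such a tokenizer does (a toy sketch only, not the pinyin-tokenizer API; the syllable table below is deliberately tiny, whereas the real library ships the full Mandarin syllable set and handles ambiguous splits):

```python
# Toy longest-match pinyin splitter (illustrative only; not the pinyin-tokenizer API).
SYLLABLES = {"xiang", "gang", "bei", "jing", "shang", "hai", "xi", "an"}
MAX_LEN = max(len(s) for s in SYLLABLES)

def split_pinyin(text):
    result, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):  # try the longest match first
            if text[i:j] in SYLLABLES:
                result.append(text[i:j])
                i = j
                break
        else:  # no syllable matched: keep the single character and move on
            result.append(text[i])
            i += 1
    return result

print(split_pinyin("xianggang"))  # ['xiang', 'gang']
print(split_pinyin("beijing"))    # ['bei', 'jing']
```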
- tiktoken
- link: https://github.com/openai/tiktoken
- author: OpenAI
- note: a fast BPE tokeniser for use with OpenAI's models.
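A minimal usage sketch (assumes tiktoken is installed; cl100k_base is the encoding used by the GPT-3.5/GPT-4 family):

```python
import tiktoken

# Look up a BPE encoding by name; tiktoken.encoding_for_model("gpt-3.5-turbo") works too.
enc = tiktoken.get_encoding("cl100k_base")

tokens = enc.encode("Pre-trained language models are everywhere.")
print(tokens)              # list of integer token ids
print(enc.decode(tokens))  # round-trips back to the original string
```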
- easytokenizer
- link: https://github.com/zejunwang1/easytokenizer
- author: WangZeJun
- note: a simple, easy-to-use, high-performance text tokenizer library that supports word segmentation and tokenization similar to BertTokenizer in HuggingFace transformers.
- blog: easytokenizer-v0.2.0: a high-performance text tokenizer library
- paper: Samuel R., Ellie P., Edouard G., Benjamin Van D., Alex W., Jan H., Patrick X., Raghavendra P., R. T. M., Roma P., Najoung K., Ian T., Yinghui H., Katherin Y., Shuning J., & Berlin C. (2018). Looking for ELMo's Friends: Sentence-Level Pretraining Beyond Language Modeling.
- code: Origin by ML²AT CILVR Report, Tutorial by Prashant Ranjan, keras by iliaschalkidis, Multi-Language-oriented ELMo by HIT-SCIR, another keras version by strongio.
- paper: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., & Gomez, A. N., et al. (2017). Attention is all you need.
- code:
- attention:
- External-Attention-pytorch: PyTorch implementations of various attention, MLP, re-parameterization, and convolution modules, helpful for understanding the corresponding papers (a minimal scaled dot-product attention sketch is given after the tutorial list below).
- survey:
- 2023: Transformer quick-reference handbook: papers on models, architectures, and training methods, all in one place
- 2023: Figure and code found inconsistent: an error discovered in the Transformer paper; netizens say it should have been pointed out long ago
- 2023: Transformer models: an introduction and catalog: a 36-page survey cataloging the large pre-trained Transformer models behind ChatGPT
- 2022: Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey
- 2022: From Swin Transformer to GFlowNets: 256 of the most noteworthy works selected from 20,000 SOTA papers of 2021 (with a complete directory)
- 2021: Natural Language Processing: A Pre-trained Model Approach (by Che Wanxiang, Guo Jiang, and Cui Yiming of HIT-SCIR)
- 2021: Dozens of leading Chinese NLP researchers jointly survey the past, present, and future of pre-trained models
- 2021: The latest Transformer survey from Prof. Qiu Xipeng's group at Fudan University
- 2021: The latest survey of vision Transformers
- Long Range Arena: A Benchmark for Efficient Transformers (2020-11)
- Efficient Transformers: A Survey (2020-09)
- introduction/tutorial:
- 2024: TRANSFORMERS FROM SCRATCH, with code.
- 2023: How-to-use-Transformers
- 2022: transformer-walkthrough: a walkthrough of Transformer architecture code.
- 2022: A simple implementation of BERT
- 2022: Technical walkthrough: an analysis and summary of the BERT pre-training source code
- 2022: BERT series: calculating model parameter counts
- 2022: The Transformer explained from a matrix perspective (with code)
- 2021: HuggingFace BERT source code explained: core model components
- 2021: HuggingFace BERT source code explained: applying the model and optimizing training
- 2021: The Transformer in detail
- 2021: A 30,000-character beginner-friendly introduction to vision Transformers (mirror link)
- 2020: How Transformers work in deep learning and NLP: an intuitive introduction by AI SUMMER
- 2018: The Annotated Transformer
- 2018: A truly complete illustrated guide to the Seq2Seq attention model
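For readers working through the material above: the heart of the architecture is scaled dot-product attention. A minimal, illustrative PyTorch sketch (not tied to any repository listed here):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)     # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Example: batch of 2, 8 heads, sequence length 10, head dimension 64.
q = torch.randn(2, 8, 10, 64)
k = torch.randn(2, 8, 10, 64)
v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([2, 8, 10, 64])
```

Multi-head attention simply projects Q, K, and V once per head, runs this computation in parallel, and concatenates the per-head outputs before a final linear projection.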
- Fast Transformers
- paper: Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention.
- code: fast-transformers by Idiap Research Institute
- website: https://linear-transformers.com/
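The paper's key observation is that replacing softmax(QKᵀ)V with a kernel feature map φ lets φ(Q)(φ(K)ᵀV) be computed in time linear in sequence length. A simplified sketch of the non-causal case with the paper's φ(x) = elu(x) + 1 (illustrative; the fast-transformers library also covers the causal/RNN form):

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention: phi(Q) (phi(K)^T V), normalized by phi(Q) (phi(K)^T 1).

    phi(x) = elu(x) + 1, as in Katharopoulos et al. (2020). Shapes: (batch, heads, seq, dim).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1
    kv = torch.einsum("bhsd,bhse->bhde", k, v)                         # sum_s phi(k_s) v_s^T
    z = 1.0 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps)   # normalizer per query
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, z)

q = torch.randn(2, 8, 1024, 64)
k = torch.randn(2, 8, 1024, 64)
v = torch.randn(2, 8, 1024, 64)
print(linear_attention(q, k, v).shape)   # torch.Size([2, 8, 1024, 64])
```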
- Flowformer
- code: https://github.com/thuml/Flowformer
- author: THUML
- paper: Wu, H., Wu, J., Xu, J., Wang, J., & Long, M. (2022). Flowformer: linearizing transformers with conservation flows.
- blog: Task-agnostic! Tsinghua proposes the Flowformer backbone network with linear complexity | ICML 2022
- Infinite Memory Transformer
- Longformer
- paper: Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer.
- code: longformer by allenai
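Longformer restricts most positions to a local sliding window plus a handful of global tokens. An illustrative sketch of building such a mask in plain PyTorch (the real implementation uses banded attention kernels rather than a dense mask):

```python
import torch

def sliding_window_mask(seq_len, window, global_idx=()):
    """Boolean (seq_len, seq_len) mask: True where attention is allowed."""
    pos = torch.arange(seq_len)
    # Local attention: position i may attend to j when |i - j| <= window // 2.
    mask = (pos[None, :] - pos[:, None]).abs() <= window // 2
    # Global tokens attend everywhere and are attended to by every position.
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return mask

mask = sliding_window_mask(seq_len=16, window=4, global_idx=(0,))  # token 0 acts like [CLS]
print(mask.int())
```

A mask like this can be dropped into the scaled dot-product attention sketch shown earlier in this document.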
- ReFormer
- paper: Kitaev, N., Kaiser, Ł., & Levskaya, A. (2020). Reformer: the efficient Transformer.
- code: reformer-pytorch by Phil Wang
- RoFormer
- Transformer-XL
- paper: Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: attentive language models beyond a fixed-length context.
- code: tensorflow & pytorch by Zhilin Yang
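Transformer-XL's central idea is segment-level recurrence: hidden states from the previous segment are cached with gradients stopped and reused as extra keys/values for the current segment. A heavily simplified sketch of that memory mechanism (relative positional encoding and causal masking omitted):

```python
import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    """One attention step with cached memory, Transformer-XL style (simplified).

    h: (batch, cur_len, d) current segment; mem: (batch, mem_len, d) cached previous segment.
    """
    h_tilde = torch.cat([mem.detach(), h], dim=1)   # stop-gradient on the cached segment
    q = h @ w_q                                     # queries come from the current segment only
    k = h_tilde @ w_k                               # keys/values also see the cached memory
    v = h_tilde @ w_v
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.size(-1) ** 0.5, dim=-1)
    out = attn @ v
    # New memory = last mem_len hidden states of [mem; h], detached for the next segment.
    new_mem = h_tilde[:, -mem.size(1):].detach() if mem.size(1) else h.detach()
    return out, new_mem

d, batch, mem_len, cur_len = 64, 2, 8, 8
w_q, w_k, w_v = (torch.randn(d, d) * 0.02 for _ in range(3))
mem = torch.zeros(batch, mem_len, d)
h = torch.randn(batch, cur_len, d)
out, mem = attend_with_memory(h, mem, w_q, w_k, w_v)
print(out.shape, mem.shape)   # torch.Size([2, 8, 64]) torch.Size([2, 8, 64])
```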
- xFormers
- code: https://github.com/facebookresearch/xformers
- author: Facebook Research
- note: hackable and optimized Transformers building blocks, supporting a composable construction.
- paper: Hua, W., Dai, Z., Liu, H., & Le, Q. V. (2022). Transformer Quality in Linear Time. arXiv preprint arXiv:2202.10447.
- blog
- paper: PaLM: Scaling Language Modeling with Pathways.
- paper: Pathways: Asynchronous Distributed Dataflow for ML.
- code: PaLM - Pytorch by Phil Wang
- blog
- paper: Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: pre-training of deep bidirectional transformers for language understanding.
- code:
- list: awesome-bert by Jiakui Wang
- pre-trained models: OpenCLaP
- extra:
- blog:
- link: Chinese-BERT-wwm by Yiming Cui.
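A minimal masked-language-modeling sketch with HuggingFace transformers (assumes transformers and torch are installed; bert-base-chinese is used purely as an example checkpoint):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

inputs = tokenizer("今天天气很[MASK]。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and read out the most likely token.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```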
- paper: Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
- code:
- paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- code:
- paper
- code:
- paper: Alec R., Karthik N., Tim S., & Ilya S. (2018). Improving Language Understanding by Generative Pre-Training.
- code: tensorflow by OpenAI
- blog:
- paper: Alec R., Jeffrey W., Rewon C., David L., Dario A., & Ilya S. (2019). Language Models are Unsupervised Multitask Learners.
- code:
- extra: another unofficial tensorflow version by ConnorJL and the author's blog.
- tutorial:
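A minimal text-generation sketch with the HuggingFace GPT-2 checkpoint (illustrative; the sampling hyperparameters are arbitrary):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Pre-trained language models", return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_new_tokens=40,                    # length of the continuation
    do_sample=True,                       # sample instead of greedy decoding
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; silence the warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```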
- paper: OpenAI. (2020). Language Models are Few-Shot Learners.
- code: descriptions by OpenAI.
- blog:
- code: https://github.com/TsinghuaAI/CPM-Generate
- author: Tsinghua AI & BAAI
- homepage: https://cpm.baai.ac.cn/
- paper: Wang, X., Gao, T., Zhu, Z., Liu, Z., Li, J., & Tang, J. (2019). KEPLER: a unified model for knowledge embedding and pre-trained language representation.
- paper: Kevin C., Minh-Thang L., Quoc V. L., & Christopher D. Manning. (2020). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
- code: tensorflow by Google, Chinese-ELECTRA by Yiming Cui
- paper: Zhang, Z., Han, X., Liu, Z., Jiang, X., Sun, M., & Liu, Q. (2019). ERNIE: enhanced language representation with informative entities.
- code: pytorch by thunlp, paddlepaddle by Baidu
- paper: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
- demo: ERNIE 3.0 knowledge-enhanced large model
- paper: Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2019). XLNet: Generalized Autoregressive Pretraining for Language Understanding.
- code: https://github.com/zihangdai/xlnet
- extra: Chinese-XLNet by Yiming Cui
- paper: Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: training multi-billion parameter language models using model parallelism.
- code: https://github.com/NVIDIA/Megatron-LM
- link: https://github.com/Oneflow-Inc/libai
- author: OneFlow
- blog: Is training a large model harder than reaching the sky? LiBai (李白), a pre-training model library that is easy to use and highly efficient, has arrived!
- paper: Kaitao S., Xu T., Tao Q., Jianfeng L., & Tie-Y. L. (2019). MASS: Masked Sequence to Sequence Pre-training for Language Generation.
- paper: Dong, L., Yang, N., Wang, W., Wei, F., Liu, X., & Wang, Y., et al. (2019). Unified language model pre-training for natural language understanding and generation. NeurIPS 2019.
- code: https://github.com/microsoft/unilm
- note: including UniLM v1/v2, MiniLM, LayoutLM, and s2s-ft.
- extra: Unilm(Chinese) by YuwenTechnology, Pretrained-Unilm-Chinese by zhongerqiandan
- paper: Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.
- extra: BARTScore
- code
- paper: Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.
- tutorial
- extra:
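T5 casts every task as text-to-text with a task prefix. A minimal sketch with the HuggingFace t5-small checkpoint, using one of the prefixes from the paper:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text with a task prefix.
text = "translate English to German: The house is wonderful."
input_ids = tokenizer(text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```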
- paper: Diao, S., Bai, J., Song, Y., Zhang, T., & Wang, Y. (2019). ZEN: pre-training Chinese text encoder enhanced by n-gram representations.
- code: https://github.com/sinovation/ZEN
- paper: Zhang, Z., Zhang, H., Chen, K., Guo, Y., Hua, J., Wang, Y., & Zhou, M. (2021). Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese. arXiv preprint arXiv:2110.06696.
- code: https://github.com/Langboat/Mengzi
- author: Langboat
- code: https://github.com/lonePatient/NeZha_Chinese_PyTorch
- author: lonePatient
- code: https://github.com/dbiir/UER-py
- author: DBIIR @ RUC
- note: an open-source pre-training framework in PyTorch with a pre-trained model zoo.
- code: https://github.com/deepset-ai/FARM
- author: deepset-ai
- note: a tool that makes transfer learning with BERT & Co. simple, fast, and enterprise-ready.
- code: https://github.com/fastnlp/fastNLP
- document: https://fastnlp.readthedocs.io/zh/latest/
- author: fastnlp group (FengZiYjun, fudan)
- note: a modularized and extensible NLP framework, currently still in incubation.
- extra: fastHan: a BERT-based integrated Chinese NLP toolkit (fastHan)
- news: Qiu Xipeng: quickly build natural language processing models with fastNLP (Oct 17)
- code: https://github.com/alibaba/AliceMind/
- author: alibaba-luofuli
- note: Alibaba's collection of encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
- news: Official announcement! DAMO Academy open-sources its treasured deep language model suite AliceMind; NLP is moving toward the era of large-scale industrialization
- github page: https://github.com/BAAI-WuDao
- author: BAAI
- note: BAAI WuDao large-scale pre-trained language models
- github page: https://github.com/IDEA-CCNL/Fengshenbang-LM
- author: IDEA CCNL
- note: Fengshenbang is an open-source large-model initiative led by the Center for Cognitive Computing and Natural Language at IDEA Research, including Erlangshen (Chinese BERT), Zhouwenwang (co-developed with Zhuiyi Technology, Chinese LM & MLM), Yuyuan (Chinese medical LM), Wenzhong (Chinese GPT), and Randeng (Chinese All2Gen)
- 2023-03-07 (562B parameters) PaLM-E by Google: news
- 2022-07-28 (176B parameters) BLOOM by BigScience: news, intro, optimization, tutorial
- 2022-06-13 (1B parameters) 乾元 (BigBang Transformer) by 超对称技术 (SuperSymmetry Technologies): news, benchmark
- 2022-05-04 (175B parameters) OPT-175B by Meta AI: paper, code, model file, news
- 2022-04-05 (540B parameters) PaLM by Google: news, intro
- 2022-02-04 (20B parameters) GPT-NeoX by EleutherAI: news
- 2022-01-23 (137B parameters) LaMDA by Google: news, news
- 2021-12-09 (280B parameters) Gopher by DeepMind: news
- 2021-12-08 (260B parameters) Wenxin (ERNIE 3.0 Titan) by Baidu: news, news
- 2021-10-12 (530B parameters) Megatron-Turing by Microsoft & NVIDIA: news
- 2021-09-30 (parameter count undisclosed) Shenzhou 1.0 by QQ Browser: news
- 2021-09-28 (245.7B parameters) Yuan 1.0 by Inspur AI Research Institute: news.
- 2021-07-08 (parameter count undisclosed) ERNIE 3.0 by Baidu: paper, demo, news.
- 2021-06-01 (1.75T parameters) WuDao 2.0 by BAAI (Beijing Academy of Artificial Intelligence): news
- 2021-04-26 (200B parameters) PanGu by Huawei: code, news.
- 2021-04-19 (27B parameters) PLUG by Alibaba DAMO Academy: demo, news.
- 2021-03-20 (parameter count undisclosed) WuDao 1.0 by BAAI: homepage, corpora, news.
- 2021-03-11 (2.6B/21.7B parameters) CPM-LM/CPM-KM by BAAI: code, homepage, paper.