Pre-Trained Language Model (Unsupervised Representation Learning)

References

Benchmark

  • ChineseGLUE: Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus, and leaderboard.
  • CLUE: Advanced version of ChineseGLUE (homepage/paper).
  • FewCLUE: paper
  • ChineseBLUE: Chinese Biomedical Language Understanding Evaluation benchmark.
  • CUGE (Chinese Language Understanding and Generation Evaluation): report/homepage
  • NATURAL-INSTRUCTIONSv2: paper, news

Tokenizer

ELMo

Transformer

Variations of Transformer

FLASH

PaLM

BERT

BERT-WWM (Pre-Training with Whole Word Masking for BERT)
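
Whole word masking changes the masking unit from a single WordPiece to the whole word it belongs to: if any piece of a word is selected, every piece of that word is masked. A minimal sketch of that grouping step (plain Python, assuming "##"-prefixed continuation pieces; an illustration, not the official implementation):

```python
import random

def whole_word_mask(wordpieces, mask_prob=0.15, mask_token="[MASK]"):
    """Group WordPieces into words, then mask whole words instead of single pieces."""
    # Build word groups: a piece starting with "##" continues the previous word.
    groups, current = [], []
    for i, piece in enumerate(wordpieces):
        if piece.startswith("##") and current:
            current.append(i)
        else:
            if current:
                groups.append(current)
            current = [i]
    if current:
        groups.append(current)

    masked = list(wordpieces)
    for group in groups:
        if random.random() < mask_prob:       # decide per word, not per piece
            for i in group:
                masked[i] = mask_token        # mask every piece of the chosen word
    return masked

print(whole_word_mask(["the", "phil", "##ammon", "sang"], mask_prob=1.0))
# -> ['[MASK]', '[MASK]', '[MASK]', '[MASK]'] when every word is selected
```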

RoBERTa

ALBERT

DeBERTa

GPT

GPT-2

GPT-3

GPT-J

minGPT

GPT-JT

CPM-Generate

ELECTRA

ERNIE

ERNIE 3.0

  • paper: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
  • demo: ERNIE 3.0 Knowledge-Enhanced Large Model (知识增强大模型)

XLNET

Megatron-LM

  • paper: Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: Training multi-billion parameter language models using model parallelism. (See the sketch below.)
  • code: https://github.com/NVIDIA/Megatron-LM
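
The model parallelism in the paper splits individual weight matrices across devices, e.g. a column-parallel linear layer whose per-device output shards are gathered back together. A rough NumPy sketch of that partitioning idea (single-host illustration only, not Megatron-LM's actual API):

```python
import numpy as np

def column_parallel_linear(x, W, num_partitions=2):
    """Split W column-wise across `num_partitions` "devices", compute shards, concatenate.

    In real tensor parallelism each shard lives on its own GPU and the final
    concatenation is an all-gather; here everything runs on one host.
    """
    shards = np.split(W, num_partitions, axis=1)         # one weight shard per device
    partial_outputs = [x @ shard for shard in shards]    # each device computes its output slice
    return np.concatenate(partial_outputs, axis=-1)      # all-gather the output shards

x = np.random.randn(4, 8)    # (batch, hidden)
W = np.random.randn(8, 16)   # (hidden, output) toy weight
assert np.allclose(column_parallel_linear(x, W), x @ W)  # matches the unsplit layer
```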

LiBai

MASS

  • paper: Song, K., Tan, X., Qin, T., Lu, J., & Liu, T.-Y. (2019). MASS: Masked Sequence to Sequence Pre-training for Language Generation. (See the sketch below.)
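
MASS masks a contiguous span of the encoder input and trains the decoder to reconstruct only that span. A toy sketch of how one training example is formed (plain Python over token lists; an approximation of the objective, not the official implementation):

```python
def mass_example(tokens, span_start, span_len, mask_token="[MASK]"):
    """Build (encoder_input, decoder_input, decoder_target) for one MASS example.

    The encoder sees the sentence with a contiguous span masked out; the decoder
    predicts only that span, conditioned on its right-shifted version (teacher forcing).
    """
    span = tokens[span_start:span_start + span_len]
    encoder_input = (
        tokens[:span_start] + [mask_token] * span_len + tokens[span_start + span_len:]
    )
    decoder_input = [mask_token] + span[:-1]   # right-shifted span fed to the decoder
    decoder_target = span                      # the masked span is the prediction target
    return encoder_input, decoder_input, decoder_target

enc, dec_in, dec_out = mass_example(["a", "b", "c", "d", "e"], span_start=1, span_len=3)
print(enc)      # ['a', '[MASK]', '[MASK]', '[MASK]', 'e']
print(dec_in)   # ['[MASK]', 'b', 'c']
print(dec_out)  # ['b', 'c', 'd']
```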

UniLM

BART

  • paper: Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., & Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. (See the sketch below.)
  • extra: BARTScore
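
BART's main noising function is text infilling: each sampled span (possibly of length zero) is replaced by a single mask token, and the seq2seq target is the original text. A toy sketch of that corruption step (plain Python; the paper samples span lengths from Poisson(λ=3), simplified here; not the official implementation):

```python
import random

def text_infilling(tokens, num_spans=1, max_span_len=3, mask_token="<mask>"):
    """Corrupt `tokens` BART-style: each sampled span collapses to ONE mask token."""
    corrupted = list(tokens)
    for _ in range(num_spans):
        span_len = random.randint(0, max_span_len)        # paper: Poisson-sampled lengths
        start = random.randint(0, max(0, len(corrupted) - span_len))
        corrupted[start:start + span_len] = [mask_token]  # whole span -> single <mask>
    return corrupted

random.seed(0)
source = text_infilling(["A", "B", "C", "D", "E"], num_spans=1)
target = ["A", "B", "C", "D", "E"]   # the seq2seq target is always the original text
print(source, "->", target)
```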

T5

ZEN

Mengzi

  • paper: Zhang, Z., Zhang, H., Chen, K., Guo, Y., Hua, J., Wang, Y., & Zhou, M. (2021). Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese. arXiv preprint arXiv:2110.06696.
  • code: https://github.com/Langboat/Mengzi
  • author: Langboat
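
A minimal usage sketch with Hugging Face transformers, assuming a mengzi-bert-base checkpoint published under the Langboat organization on the Hub (see the repo above for the exact released model names):

```python
# pip install transformers torch
from transformers import BertTokenizer, BertModel

# "Langboat/mengzi-bert-base" is an assumed Hub id; check the Mengzi repo for alternatives.
tokenizer = BertTokenizer.from_pretrained("Langboat/mengzi-bert-base")
model = BertModel.from_pretrained("Langboat/mengzi-bert-base")

inputs = tokenizer("孟子是轻量级中文预训练模型。", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```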

NeZha_Chinese_PyTorch

HUAWEI-Pretrained Language Model

UER-py

FARM

fastNLP

AliceMind

WuDao

Fengshenbang-LM

  • github page: https://github.com/IDEA-CCNL/Fengshenbang-LM
  • author: IDEA CCNL
  • note: Fengshenbang-LM (封神榜) is an open-source large-model initiative led by IDEA's Center for Cognitive Computing and Natural Language. It includes 二郎神/Erlangshen (Chinese BERT), 周文王/Zhouwenwang (co-developed with 追一/Zhuiyi, Chinese LM & MLM), 余元/Yuyuan (Chinese medical LM), 闻仲/Wenzhong (Chinese GPT), and 燃灯/Randeng (Chinese All2Gen).

😵 Why so many huge language models?

  • 2023-03-07 (562B parameters) PaLM-E by Google: news
  • 2022-07-28 (176B parameters) BLOOM by BigScience: news, intro, optimization, tutorial
  • 2022-06-13 (1B parameters) BigBang Transformer (乾元) by SuperSymmetry Technologies (超对称技术): news, benchmark
  • 2022-05-04 (175B parameters) OPT-175B by Meta AI: paper, code, model file, news
  • 2022-04-05 (540B parameters) PaLM by Google: news, intro
  • 2022-02-04 (20B parameters) GPT-NeoX by EleutherAI: news
  • 2022-01-23 (137B parameters) LaMDA by Google: news, news
  • 2021-12-09 (280B parameters) Gopher (地鼠) by DeepMind: news
  • 2021-12-08 (260B parameters) ERNIE 3.0 Titan (文心) by Baidu: news, news
  • 2021-10-12 (530B parameters) Megatron-Turing by Microsoft & NVIDIA: news
  • 2021-09-30 (? parameters) Shenzhou 1.0 (神舟1.0) by QQ Browser: news
  • 2021-09-28 (245.7B parameters) Yuan 1.0 (源1.0) by Inspur AI Research: news
  • 2021-07-08 (? parameters) ERNIE 3.0 by Baidu: paper, demo, news
  • 2021-06-01 (1.75T parameters) WuDao 2.0 (悟道2.0) by BAAI (Beijing Academy of Artificial Intelligence): news
  • 2021-04-26 (200B parameters) PanGu (盘古) by Huawei: code, news
  • 2021-04-19 (27B parameters) PLUG by Alibaba DAMO Academy: demo, news
  • 2021-03-20 (? parameters) WuDao 1.0 (悟道1.0) by BAAI: homepage, corpora, news
  • 2021-03-11 (2.6B/21.7B parameters) CPM-LM/CPM-KM by BAAI: code, homepage, paper