
Commit 0ef339f

Update OpenAI GPT model card (#37255)
* Update OpenAI GPT model card
* Update docs/source/en/model_doc/openai-gpt.md (Co-authored-by: Steven Liu <[email protected]>)
* Update OpenAI GPT model card: add usage examples and notes section
* Add API autodoc tags after Notes section for OpenAI GPT model
* Added missing badges

---------

Co-authored-by: Steven Liu <[email protected]>
1 parent 46d7391 commit 0ef339f

File tree

1 file changed: +60 -91 lines changed

docs/source/en/model_doc/openai-gpt.md

@@ -14,154 +14,123 @@ rendered properly in your Markdown viewer.
 
 -->
 
-# OpenAI GPT
-
-<div class="flex flex-wrap space-x-1">
-<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
-<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
-<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAC0AAAAtCAMAAAANxBKoAAAC7lBMVEUAAADg5vYHPVgAoJH+/v76+v39/f9JbLP///9+AIgAnY3///+mcqzt8fXy9fgkXa3Ax9709fr+///9/f8qXq49qp5AaLGMwrv8/P0eW60VWawxYq8yqJzG2dytt9Wyu9elzci519Lf3O3S2efY3OrY0+Xp7PT///////+dqNCexMc6Z7AGpJeGvbenstPZ5ejQ1OfJzOLa7ejh4+/r8fT29vpccbklWK8PVa0AS6ghW63O498vYa+lsdKz1NDRt9Kw1c672tbD3tnAxt7R6OHp5vDe7OrDyuDn6vLl6/EAQKak0MgATakkppo3ZK/Bz9y8w9yzu9jey97axdvHzeG21NHH4trTwthKZrVGZLSUSpuPQJiGAI+GAI8SWKydycLL4d7f2OTi1+S9xNzL0ePT6OLGzeEAo5U0qJw/aLEAo5JFa7JBabEAp5Y4qZ2QxLyKmsm3kL2xoMOehrRNb7RIbbOZgrGre68AUqwAqZqNN5aKJ5N/lMq+qsd8kMa4pcWzh7muhLMEV69juq2kbKqgUaOTR5uMMZWLLZSGAI5VAIdEAH+ovNDHuNCnxcy3qcaYx8K8msGplrx+wLahjbYdXrV6vbMvYK9DrZ8QrZ8tqJuFms+Sos6sw8ecy8RffsNVeMCvmb43aLltv7Q4Y7EZWK4QWa1gt6meZKUdr6GOAZVeA4xPAISyveLUwtivxtKTpNJ2jcqfvcltiMiwwcfAoMVxhL+Kx7xjdrqTe60tsaNQs6KaRKACrJ6UTZwkqpqTL5pkHY4AloSgsd2ptNXPvNOOncuxxsqFl8lmg8apt8FJcr9EbryGxLqlkrkrY7dRa7ZGZLQ5t6iXUZ6PPpgVpZeJCJFKAIGareTa0+KJod3H0deY2M+esM25usmYu8d2zsJOdcBVvrCLbqcAOaaHaKQAMaScWqKBXqCXMJ2RHpiLF5NmJZAdAHN2kta11dKu1M+DkcZLdb+Mcql3TppyRJdzQ5ZtNZNlIY+DF4+voCOQAAAAZ3RSTlMABAT+MEEJ/RH+/TP+Zlv+pUo6Ifz8+fco/fz6+evr39S9nJmOilQaF/7+/f38+smmoYp6b1T+/v7++vj189zU0tDJxsGzsrKSfv34+Pf27dDOysG9t6+n/vv6+vr59uzr1tG+tZ6Qg9Ym3QAABR5JREFUSMeNlVVUG1EQhpcuxEspXqS0SKEtxQp1d3d332STTRpIQhIISQgJhODu7lAoDoUCpe7u7u7+1puGpqnCPOyZvffbOXPm/PsP9JfQgyCC+tmTABTOcbxDz/heENS7/1F+9nhvkHePG0wNDLbGWwdXL+rbLWvpmZHXD8+gMfBjTh+aSe6Gnn7lwQIOTR0c8wfX3PWgv7avbdKwf/ZoBp1Gp/PvuvXW3vw5ib7emnTW4OR+3D4jB9vjNJ/7gNvfWWeH/TO/JyYrsiKCRjVEZA3UB+96kON+DxOQ/NLE8PE5iUYgIXjFnCOlxEQMaSGVxjg4gxOnEycGz8bptuNjVx08LscIgrzH3umcn+KKtiBIyvzOO2O99aAdR8cF19oZalnCtvREUw79tCd5sow1g1UKM6kXqUx4T8wsi3sTjJ3yzDmmhenLXLpo8u45eG5y4Vvbk6kkC4LLtJMowkSQxmk4ggVJEG+7c6QpHT8vvW9X7/o7+3ELmiJi2mEzZJiz8cT6TBlanBk70cB5GGIGC1gRDdZ00yADLW1FL6gqhtvNXNG5S9gdSrk4M1qu7JAsmYshzDS4peoMrU/gT7qQdqYGZaYhxZmVbGJAm/CS/HloWyhRUlknQ9KYcExTwS80d3VNOxUZJpITYyspl0LbhArhpZCD9cRWEQuhYkNGMHToQ/2Cs6swJlb39CsllxdXX6IUKh/H5jbnSsPKjgmoaFQ1f8wRLR0UnGE/RcDEjj2jXG1WVTwUs8+zxfcrVO+vSsuOpVKxCfYZiQ0/aPKuxQbQ8lIz+DClxC8u+snlcJ7Yr1z1JPqUH0V+GDXbOwAib931Y4Imaq0NTIXPXY+N5L18GJ37SVWu+hwXff8l72Ds9XuwYIBaXPq6Shm4l+Vl/5QiOlV+uTk6YR9PxKsI9xNJny31ygK1e+nIRC1N97EGkFPI+jCpiHe5PCEy7oWqWSwRrpOvhFzcbTWMbm3ZJAOn1rUKpYIt/lDhW/5RHHteeWFN60qo98YJuoq1nK3uW5AabyspC1BcIEpOhft+SZAShYoLSvnmSfnYADUERP5jJn2h5XtsgCRuhYQqAvwTwn33+YWEKUI72HX5AtfSAZDe8F2DtPPm77afhl0EkthzuCQU0BWApgQIH9+KB0JhopMM7bJrdTRoleM2JAVNMyPF+wdoaz+XJpGoVAQ7WXUkcV7gT3oUZyi/ISIJAVKhgNp+4b4veCFhYVJw4locdSjZCp9cPUhLF9EZ3KKzURepMEtCDPP3VcWFx4UIiZIklIpFNfHpdEafIF2aRmOcrUmjohbT2WUllbmRvgfbythbQO3222fpDJoufaQPncYYuqoGtUEsCJZL6/3PR5b4syeSjZMQG/T2maGANlXT2v8S4AULWaUkCxfLyW8iW4kdka+nEMjxpL2NCwsYNBp+Q61PF43zyDg9Bm9+3NNySn78jMZUUkumqE4Gp7JmFOdP1vc8PpRrzj9+wPinCy8K1PiJ4aYbnTYpCCbDkBSbzhu2QJ1Gd82t8jI8TH51+OzvXoWbnXUOBkNW+0mWFwGcGOUVpU81/n3TOHb5oMt2FgYGjzau0Nif0Ss7Q3XB33hjjQHjHA5E5aOyIQc8CBrLdQSs3j92VG+3nNEjbkbdbBr9zm04ruvw37vh0QKOdeGIkckc80fX3KH/h7PT4BOjgCty8VZ5ux1MoO5Cf5naca2LAsEgehI+drX8o/0Nu+W0m6K/I9gGPd/dfx/EN/wN62AhsBWuAAAAAElFTkSuQmCC
-">
-<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
-<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+<div style="float: right;">
+    <div class="flex flex-wrap space-x-1">
+        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
+        <img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,...">
+        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
+        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
+    </div>
 </div>
 
-## Overview
 
-OpenAI GPT model was proposed in [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)
-by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer
-pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.
 
-The abstract from the paper is the following:
+# GPT
 
-*Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering,
-semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant,
-labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to
-perform adequately. We demonstrate that large gains on these tasks can be realized by generative pretraining of a
-language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In
-contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve
-effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our
-approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms
-discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon
-the state of the art in 9 out of the 12 tasks studied.*
+[GPT (Generative Pre-trained Transformer)](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) focuses on effectively learning text representations and transferring them to tasks. The model pretrains a Transformer decoder to predict the next word, and is then fine-tuned on labeled data.
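The pretraining objective is plain next-token prediction (causal language modeling). As a minimal sketch of how that loss is computed, with random tensors standing in for real decoder outputs and token ids:

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50, 8
logits = torch.randn(1, seq_len, vocab_size)            # stand-in for decoder outputs
input_ids = torch.randint(0, vocab_size, (1, seq_len))  # stand-in for a token sequence

# Predict token t+1 from tokens 1..t: the labels are the inputs shifted left by one.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    input_ids[:, 1:].reshape(-1),
)
print(loss.item())
```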
 
-[Write With Transformer](https://transformer.huggingface.co/doc/gpt) is a webapp created and hosted by Hugging Face
-showcasing the generative capabilities of several models. GPT is one of them.
+GPT can generate high-quality text, making it well-suited for a variety of natural language understanding tasks such as textual entailment, question answering, semantic similarity, and document classification.
 
-This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/openai/finetune-transformer-lm).
+You can find all the original GPT checkpoints under the [OpenAI community](https://huggingface.co/openai-community/openai-gpt) organization.
 
-## Usage tips
+> [!TIP]
+> Click on the GPT models in the right sidebar for more examples of how to apply GPT to different language tasks.
 
-- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
-  the left.
-- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
-  token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
-  observed in the *run_generation.py* example script.
+The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.
 
 
-Note:
 
-If you want to reproduce the original tokenization process of the *OpenAI GPT* paper, you will need to install `ftfy`
-and `SpaCy`:
+<hfoptions id="usage">
+<hfoption id="Pipeline">
 
-```bash
-pip install spacy ftfy==4.4.3
-python -m spacy download en
+
+```python
+import torch
+from transformers import pipeline
+
+generator = pipeline(task="text-generation", model="openai-community/openai-gpt", torch_dtype=torch.float16, device=0)
+output = generator("The future of AI is", max_length=50, do_sample=True)
+print(output[0]["generated_text"])
 ```
 
-If you don't install `ftfy` and `SpaCy`, the [`OpenAIGPTTokenizer`] will default to tokenize
-using BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
+</hfoption>
+<hfoption id="AutoModel">
 
-## Resources
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
-A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with OpenAI GPT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
+model = AutoModelForCausalLM.from_pretrained("openai-community/openai-gpt", torch_dtype=torch.float16)
 
-<PipelineTag pipeline="text-classification"/>
+inputs = tokenizer("The future of AI is", return_tensors="pt")
+outputs = model.generate(**inputs, max_length=50)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
 
-- A blog post on [outperforming OpenAI GPT-3 with SetFit for text-classification](https://www.philschmid.de/getting-started-setfit).
-- See also: [Text classification task guide](../tasks/sequence_classification)
+</hfoption>
+<hfoption id="transformers-cli">
 
-<PipelineTag pipeline="text-generation"/>
+```bash
+echo -e "The future of AI is" | transformers-cli run --task text-generation --model openai-community/openai-gpt --device 0
 
-- A blog on how to [Finetune a non-English GPT-2 Model with Hugging Face](https://www.philschmid.de/fine-tune-a-non-english-gpt-2-model-with-huggingface).
-- A blog on [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate) with GPT-2.
-- A blog on [Training CodeParrot 🦜 from Scratch](https://huggingface.co/blog/codeparrot), a large GPT-2 model.
-- A blog on [Faster Text Generation with TensorFlow and XLA](https://huggingface.co/blog/tf-xla-generate) with GPT-2.
-- A blog on [How to train a Language Model with Megatron-LM](https://huggingface.co/blog/megatron-training) with a GPT-2 model.
-- A notebook on how to [finetune GPT2 to generate lyrics in the style of your favorite artist](https://colab.research.google.com/github/AlekseyKorshuk/huggingartists/blob/master/huggingartists-demo.ipynb). 🌎
-- A notebook on how to [finetune GPT2 to generate tweets in the style of your favorite Twitter user](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb). 🌎
-- [Causal language modeling](https://huggingface.co/course/en/chapter7/6?fw=pt#training-a-causal-language-model-from-scratch) chapter of the 🤗 Hugging Face Course.
-- [`OpenAIGPTLMHeadModel`] is supported by this [causal language modeling example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#gpt-2gpt-and-causal-language-modeling), [text generation example script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/text-generation/run_generation.py) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
-- [`TFOpenAIGPTLMHeadModel`] is supported by this [causal language modeling example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_clmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
-- See also: [Causal language modeling task guide](../tasks/language_modeling)
+```
+</hfoption>
+</hfoptions>
 
-<PipelineTag pipeline="token-classification"/>
+## Notes
 
-- A course material on [Byte-Pair Encoding tokenization](https://huggingface.co/course/en/chapter6/5).
+- Inputs should be padded on the right because GPT uses absolute position embeddings.
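A minimal sketch of that right-padding setup (the pad token choice is an assumption, since the original GPT tokenizer defines no pad token):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
tokenizer.pad_token = tokenizer.unk_token  # assumed stand-in; GPT has no pad token by default
tokenizer.padding_side = "right"           # keep absolute positions aligned from the first token

batch = tokenizer(["short prompt", "a slightly longer prompt"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape, batch["attention_mask"][0])
```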
 
 ## OpenAIGPTConfig
 
 [[autodoc]] OpenAIGPTConfig
 
-## OpenAIGPTTokenizer
-
-[[autodoc]] OpenAIGPTTokenizer
-- save_vocabulary
-
-## OpenAIGPTTokenizerFast
-
-[[autodoc]] OpenAIGPTTokenizerFast
-
-## OpenAI specific outputs
-
-[[autodoc]] models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
-
-[[autodoc]] models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
-
-<frameworkcontent>
-<pt>
 ## OpenAIGPTModel
 
 [[autodoc]] OpenAIGPTModel
-- forward
+    - forward
 
 ## OpenAIGPTLMHeadModel
 
 [[autodoc]] OpenAIGPTLMHeadModel
-- forward
+    - forward
 
 ## OpenAIGPTDoubleHeadsModel
 
 [[autodoc]] OpenAIGPTDoubleHeadsModel
-- forward
+    - forward
 
 ## OpenAIGPTForSequenceClassification
 
 [[autodoc]] OpenAIGPTForSequenceClassification
-- forward
+    - forward
 
-</pt>
-<tf>
+## OpenAIGPTTokenizer
+
+[[autodoc]] OpenAIGPTTokenizer
+
+## OpenAIGPTTokenizerFast
+
+[[autodoc]] OpenAIGPTTokenizerFast
 
 ## TFOpenAIGPTModel
 
 [[autodoc]] TFOpenAIGPTModel
-- call
+    - call
 
 ## TFOpenAIGPTLMHeadModel
 
 [[autodoc]] TFOpenAIGPTLMHeadModel
-- call
+    - call
 
 ## TFOpenAIGPTDoubleHeadsModel
 
 [[autodoc]] TFOpenAIGPTDoubleHeadsModel
-- call
+    - call
 
 ## TFOpenAIGPTForSequenceClassification
 
 [[autodoc]] TFOpenAIGPTForSequenceClassification
-- call
-
-</tf>
-</frameworkcontent>
+    - call
