Update OpenAI GPT model card #37255

Merged on Apr 4, 2025 (25 commits by linnettuscano), changing `docs/source/en/model_doc/openai-gpt.md` (60 additions, 91 deletions).

-->

<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=data:image/png;base64,...">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
</div>
</div>

# GPT

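[GPT (Generative Pre-trained Transformer)](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf), proposed by Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever, focuses on effectively learning text representations and transferring them to downstream tasks. The model is a causal (unidirectional) Transformer decoder that is first pretrained to predict the next word on a large unlabeled corpus, the Toronto Book Corpus, and then fine-tuned on labeled data for each task.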
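
The next-word prediction objective described above can be sketched without loading the model. This is a minimal, framework-free illustration; the helpers `next_token_pairs` and `causal_mask` are hypothetical and not part of Transformers, and the token ids are toy values rather than real vocabulary ids. Each position is trained to predict the token that follows it, and a lower-triangular mask keeps every position from attending to later tokens.

```python
def next_token_pairs(token_ids):
    """Pair every token with the token that follows it (hypothetical helper).

    The inputs are all tokens except the last, and the targets are the same
    sequence shifted left by one position.
    """
    return list(zip(token_ids[:-1], token_ids[1:]))


def causal_mask(seq_len):
    """Build a seq_len x seq_len mask in which row i may attend to columns 0..i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


# Toy ids standing in for a tokenized sentence (not real vocabulary ids).
tokens = [101, 7, 42, 9]
print(next_token_pairs(tokens))  # [(101, 7), (7, 42), (42, 9)]
for row in causal_mask(len(tokens)):
    print(row)
```

In Transformers you do not build these pairs yourself: for [`OpenAIGPTLMHeadModel`], passing `labels=input_ids` is enough, because the labels are shifted inside the model when the loss is computed.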

Besides generating text, GPT transfers well to natural language understanding tasks such as textual entailment, question answering, semantic similarity, and document classification once it is fine-tuned on each task.

You can find all the original GPT checkpoints under the [OpenAI community](https://huggingface.co/openai-community/openai-gpt) organization. This model was contributed by [thomwolf](https://huggingface.co/thomwolf), and the original code can be found in [openai/finetune-transformer-lm](https://github.com/openai/finetune-transformer-lm).

> [!TIP]
> Click on the GPT models in the right sidebar for more examples of how to apply GPT to different language tasks.

The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line.


<hfoptions id="usage">
<hfoption id="Pipeline">

```python
import torch
from transformers import pipeline

generator = pipeline(task="text-generation", model="openai-community/openai-gpt", torch_dtype=torch.float16, device=0)
output = generator("The future of AI is", max_length=50, do_sample=True)
print(output[0]["generated_text"])
```

</hfoption>
<hfoption id="AutoModel">

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-community/openai-gpt")
model = AutoModelForCausalLM.from_pretrained("openai-community/openai-gpt", torch_dtype=torch.float16)

inputs = tokenizer("The future of AI is", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

</hfoption>
<hfoption id="transformers-cli">

```bash
echo -e "The future of AI is" | transformers-cli run --task text-generation --model openai-community/openai-gpt --device 0

```
</hfoption>
</hfoptions>

## Notes

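- Pad inputs on the right rather than the left, because GPT uses absolute position embeddings.
- To reproduce the original tokenization from the paper, install `ftfy` and `SpaCy`. Without them, [`OpenAIGPTTokenizer`] falls back to BERT's `BasicTokenizer` followed by [Byte-Pair Encoding](https://huggingface.co/course/en/chapter6/5), which is fine for most uses.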
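
The reason for right padding can be sketched in plain Python. This is a minimal illustration, assuming that when no `position_ids` are passed every slot `i` in a row receives absolute position `i`, padding included; `PAD_ID`, `pad_batch`, and `real_token_positions` are hypothetical names used only here. With right padding the real tokens keep positions `0, 1, ...`, while left padding shifts them to later positions than the ones seen for the same text without padding.

```python
PAD_ID = 0  # hypothetical padding id, used only for this illustration


def pad_batch(sequences, side="right"):
    """Pad lists of token ids to a common length.

    Returns (input_ids, attention_mask). The mask is 1 for real tokens and 0
    for padding, so a real token that happens to equal PAD_ID is not mistaken
    for padding.
    """
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        if side == "right":
            input_ids.append(seq + [PAD_ID] * n_pad)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
        else:
            input_ids.append([PAD_ID] * n_pad + seq)
            attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def real_token_positions(mask_row):
    """Absolute positions given to real tokens when every slot i gets position i."""
    return [i for i, keep in enumerate(mask_row) if keep]


batch = [[11, 12], [21, 22, 23, 24]]
_, right_mask = pad_batch(batch, side="right")
_, left_mask = pad_batch(batch, side="left")
print(real_token_positions(right_mask[0]))  # [0, 1]
print(real_token_positions(left_mask[0]))   # [2, 3]
```

With a real tokenizer, the padding side is controlled by the `padding_side` attribute, for example `tokenizer.padding_side = "right"`. The GPT tokenizer may not define a padding token by default, so one may need to be set before padding a batch.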

## OpenAIGPTConfig

[[autodoc]] OpenAIGPTConfig

## OpenAIGPTTokenizer

[[autodoc]] OpenAIGPTTokenizer
- save_vocabulary

## OpenAIGPTTokenizerFast

[[autodoc]] OpenAIGPTTokenizerFast

## OpenAI specific outputs

[[autodoc]] models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput

[[autodoc]] models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput

<frameworkcontent>
<pt>

## OpenAIGPTModel

[[autodoc]] OpenAIGPTModel
- forward

## OpenAIGPTLMHeadModel

[[autodoc]] OpenAIGPTLMHeadModel
- forward

## OpenAIGPTDoubleHeadsModel

[[autodoc]] OpenAIGPTDoubleHeadsModel
- forward

## OpenAIGPTForSequenceClassification

[[autodoc]] OpenAIGPTForSequenceClassification
- forward

</pt>
<tf>

## TFOpenAIGPTModel

[[autodoc]] TFOpenAIGPTModel
- call

## TFOpenAIGPTLMHeadModel

[[autodoc]] TFOpenAIGPTLMHeadModel
- call

## TFOpenAIGPTDoubleHeadsModel

[[autodoc]] TFOpenAIGPTDoubleHeadsModel
- call

## TFOpenAIGPTForSequenceClassification

[[autodoc]] TFOpenAIGPTForSequenceClassification
- call

</tf>
</frameworkcontent>