noob_intro_transformers.md
+19-12
@@ -7,19 +7,17 @@ authors:
7
7
8
8
# Total noob’s intro to Hugging Face Transformers
9
9
10
-
Welcome to "A Total Noob’s Introduction to Hugging Face Transformers," a guide designed specifically for those looking to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable better understanding of and collaboration with those who are. That being said, the best way to learn is by doing, so we'll walk through a simple worked example of running Google’s new Gemma 2B LLM in a notebook in a Hugging Face space.
10
+
Welcome to "A Total Noob’s Introduction to Hugging Face Transformers," a guide designed specifically for those looking to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable better understanding of and collaboration with those who are. That being said, the best way to learn is by doing, so we'll walk through a simple worked example of running Microsoft’s Phi-2 LLM in a notebook on a Hugging Face space.
11
11
12
12
You might wonder, with the abundance of tutorials on Hugging Face already available, why create another? The answer lies in accessibility: most existing resources assume some technical background, including Python proficiency, which can prevent non-technical individuals from grasping ML fundamentals. As someone who came from the business side of AI, I recognize that the learning curve presents a barrier and wanted to offer a more approachable path for like-minded learners.
13
13
14
14
Therefore, this guide is tailored for a non-technical audience keen to better understand open-source machine learning without having to learn Python from scratch. We assume no prior knowledge and will explain concepts from the ground up to ensure clarity. If you're an engineer, you’ll find this guide a bit basic, but for beginners, it's an ideal starting point.
15
15
16
-
If you want to continue your ML learning journey after you follow this tutorial, I recommend the recent [Hugging Face course](https://www.deeplearning.ai/short-courses/open-source-models-hugging-face/) we released in partnership with DeepLearning AI.
17
-
18
16
Let’s get stuck in… but first some context.
19
17
20
18
## What is Hugging Face Transformers?
21
19
22
-
Hugging Face Transformers is an open-source Python library that provides access to thousands of pre-trained Transformers models for natural language processing (NLP), computer vision, audio tasks, and more. It simplifies the process of implementing and deploying Transformer models by abstracting away the complexity of training or deploying models in lower level ML frameworks like PyTorch, TensorFlow and JAX.
20
+
Hugging Face Transformers is an open-source Python library that provides access to thousands of pre-trained Transformers models for natural language processing (NLP), computer vision, audio tasks, and more. It simplifies the process of implementing Transformer models by abstracting away the complexity of training or deploying models in lower level ML frameworks like PyTorch, TensorFlow and JAX.
23
21
24
22
## What is a library?
25
23
@@ -83,7 +81,7 @@ A Docker template is a predefined blueprint for a software environment that incl
83
81
84
82
By default, our Space comes with a complimentary CPU, which is fine for some applications. However, the many computations required by LLMs benefit significantly from being run in parallel to improve speed, which is something GPUs are great at.
85
83
86
-
It's also important to choose a GPU with enough memory to store the model and providing spare working memory. In our case, an A10G Small with 24GB is enough for Gemma 2B.
84
+
It's also important to choose a GPU with enough memory to store the model and still leave spare working memory. In our case, an A10G Small with 24GB is enough for Phi-2.
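As a rough back-of-the-envelope check on that claim, the sketch below estimates how much memory the model weights alone would need. It assumes Phi-2 has roughly 2.7 billion parameters and that each parameter takes 2 bytes (16-bit) or 4 bytes (32-bit); the function name `model_memory_gb` is made up for this illustration, and the estimate ignores the extra working memory used while generating text.

```python
def model_memory_gb(num_parameters, bytes_per_parameter):
    """Estimate the memory needed to hold the model weights, in gigabytes."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: Phi-2 has roughly 2.7 billion parameters.
PHI2_PARAMETERS = 2.7e9

fp16_gb = model_memory_gb(PHI2_PARAMETERS, 2)  # 16-bit weights: 2 bytes each
fp32_gb = model_memory_gb(PHI2_PARAMETERS, 4)  # 32-bit weights: 4 bytes each

print(f"16-bit weights: about {fp16_gb:.1f} GB")
print(f"32-bit weights: about {fp32_gb:.1f} GB")
```

Either way the weights come in well under 24GB, which leaves headroom for the working memory the model needs at run time.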
@@ -145,15 +143,15 @@ Although Transformers is already installed, the specific Classes within Transfor
145
143
9. Define which model you want to run
146
144
- To detail the model you want to download and run from the Hugging Face Hub, you need to specify the name of the model repo in your code
147
145
- We do this by setting a variable equal to the model name. In this case we call the variable `model_id`
148
-
- We’ll use a non-gated version of Gemma 2B instruction tuned model which can be found at https://huggingface.co/alpindale/gemma-2b-it this saves us an extra step of having to authenticate your Hugging Face account in the code
146
+
- We’ll use Microsoft's Phi-2, a small but surprisingly capable model which can be found at https://huggingface.co/microsoft/phi-2. Note: Phi-2 is a base model, not an instruction-tuned one, so it will respond unusually if you try to use it for chat
149
147
150
148
```python
151
-
model_id = "alpindale/gemma-2b-it"
149
+
model_id = "microsoft/phi-2"
152
150
```
153
151
154
152
## What is an instruction tuned model?
155
153
156
-
An instruction-tuned language model is a type of model that has been further trained from its base version to understand and respond to commands or prompts given by a user, improving its ability to follow instructions. Base models are able to autocomplete text, but often don’t respond to commands in a useful way.
154
+
An instruction-tuned language model is a type of model that has been further trained from its base version to understand and respond to commands or prompts given by a user, improving its ability to follow instructions. Base models are able to autocomplete text, but often don’t respond to commands in a useful way. We'll see this later when we try to prompt Phi.
157
155
158
156
10. Create a model object and load the model
159
157
- To load the model from the Hugging Face Hub into our local environment we need to instantiate the model object. We do this by passing the “model_id” which we defined in the last step into the argument of the “.from_pretrained” method on the AutoModelForCausalLM Class.
A tokenizer is a tool that splits sentences into smaller pieces of text (tokens) and assigns each token a numeric value called an input id. This is needed because our model only understands numbers, so we must first convert (a.k.a. encode) the text into a format the model can understand. Each model has its own tokenizer vocabulary, so it’s important to use the same tokenizer that the model was trained with or it will misinterpret the text.
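To make the idea concrete, here is a deliberately simplified toy tokenizer. It is not how the Phi-2 tokenizer works (real tokenizers split text into sub-word pieces and have vocabularies of tens of thousands of tokens); it only assumes one token per whitespace-separated word, and the helper names `build_vocab`, `encode` and `decode` are invented for this sketch.

```python
def build_vocab(corpus):
    """Assign each distinct word a numeric id, in order of first appearance."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn text into a list of ids (fails on words missing from the vocabulary)."""
    return [vocab[word] for word in text.split()]


def decode(ids, vocab):
    """Reverse the encoding: turn a list of ids back into text."""
    id_to_word = {token_id: word for word, token_id in vocab.items()}
    return " ".join(id_to_word[token_id] for token_id in ids)


vocab = build_vocab("who are you ? i am a model")
input_ids = encode("who are you ?", vocab)

print(input_ids)                 # [0, 1, 2, 3]
print(decode(input_ids, vocab))  # who are you ?
```

Decoding with a different vocabulary would map the same ids to different words, which is why the tokenizer has to match the model.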
185
183
186
184
12. Create the inputs for the model to process
187
-
- Define a new variable `input_text` that will take the prompt you want to give the model
185
+
- Define a new variable `input_text` that will take the prompt you want to give the model. In this case I asked "Who are you?" but you can choose whatever you prefer.
188
186
- Pass the new variable as an argument to the tokenizer object to create the `input_ids`
189
187
- Pass a second argument to the tokenizer object, `return_tensors="pt"`. This ensures the token ids are represented as the correct kind of vector for the framework we are using (i.e. a PyTorch tensor, not a TensorFlow one)
- Now the input in the right format we need to pass it into the model, we do this by calling the `.generate` method on the `model object` passing the `input_ids` as an argument and assigning it to a new variable `outputs`
195
+
- Now that the input is in the right format, we need to pass it into the model. We do this by calling the `.generate` method on the model object, passing the `input_ids` as an argument and assigning the result to a new variable `outputs`. We also set a second argument, `max_new_tokens`, equal to 100, which limits the number of tokens the model will generate.
198
196
- The outputs are not human readable yet, so to return them to text we must decode them. We can do this with the `.decode` method, saving the result to the variable `decoded_outputs`
199
197
- Finally, passing the `decoded_outputs` variable into the print function allows us to see the model output in our notebook.
200
198
- Optional: Pass the `outputs` variable into the print function to see how it compares to `decoded_outputs`
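Put together, the steps above might look roughly like the sketch below. It is one possible arrangement rather than the exact notebook code: the wrapper function `run_phi2` is a name invented here, the model is loaded with default settings (no explicit GPU placement), and `skip_special_tokens=True` is an extra option added to keep the printed text clean. Calling the function downloads several gigabytes of model weights the first time.

```python
def run_phi2(input_text="Who are you?", max_new_tokens=100):
    """Load Phi-2, generate a continuation of `input_text`, and return it as text."""
    # Imported inside the function so that merely defining it needs no downloads.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/phi-2"

    # Load the tokenizer and the model from the Hugging Face Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the prompt into PyTorch tensors and move them to the model's device.
    input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

    # Generate up to `max_new_tokens` tokens after the prompt.
    outputs = model.generate(**input_ids, max_new_tokens=max_new_tokens)

    # Decode the first (and only) generated sequence back into text.
    decoded_outputs = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return decoded_outputs
```

In a notebook cell you would then run `print(run_phi2())` to see the generated text.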
Remember that the model only understands numbers, so when we provided our `input_ids` as vectors it returned an output in the same format. To return those outputs to text we need to reverse the initial encoding we did using the tokenizer.
208
+
Models only understand numbers, so when we provided our `input_ids` as vectors, the model returned its output in the same format. To return those outputs to text we need to reverse the initial encoding we did using the tokenizer.
209
+
210
+
## Why does the output not make sense?
211
+
212
+
Remember that Phi-2 is a base model that hasn't been instruction tuned for conversational use, so it's effectively a massive auto-complete model. Given your input, it predicts what is most likely to come next, based on all the web pages, books and other content it has seen previously.
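The toy example below illustrates the "auto-complete" idea at a tiny scale. It is only an analogy, not how Phi-2 works internally: it counts which word most often follows each word in a short made-up text and then keeps appending the most frequent follower. The function names and the sample text are invented for this illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def autocomplete(prompt, counts, max_new_words=3):
    """Repeatedly append the word seen most often after the last word."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = counts.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in the text
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


corpus = "the cat sat on the mat . the cat sat on the sofa . the cat slept ."
counts = train_bigrams(corpus)

print(autocomplete("the", counts))  # the cat sat on
```

Like this toy, a base model continues your text with whatever is statistically likely, rather than answering you the way a chat assistant would.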
213
+
214
+
Congratulations, you've run inference on your very first LLM!
215
+
216
+
I hope that working through this example helped you to better understand the world of open-source ML. If you want to continue your ML learning journey, I recommend the recent [Hugging Face course](https://www.deeplearning.ai/short-courses/open-source-models-hugging-face/) we released in partnership with DeepLearning.AI.