Please note that following the `prefill` and `decode` method conventions is required for easy integration with the MediaPipe LLM Inference API.
<br/>
The user needs to implement the entire LLM pipeline themselves, and call the TFLite runtime directly for inference.
This approach provides users with the most control. For example, they can implement streaming, gain finer control over system memory, or implement advanced features such as constrained grammar decoding, speculative decoding, etc.
A very simple text generation pipeline based on a decoder-only transformer is provided [here](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/c%2B%2B/text_generator_main.cc) for reference. Note that this example serves as a starting point, and users are expected to implement their own pipelines based on their model's specific requirements.
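To make the pipeline structure concrete, below is a minimal greedy-decoding sketch in Python against TF Lite's signature-runner API. The signature names, tensor names, the `logits` output key, and the example token ids are assumptions based on the conventions described in this document, not the exact interface of any particular converted model; a real pipeline would add a tokenizer, sampling, and stop conditions.

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="tiny_llama.tflite")
prefill = interpreter.get_signature_runner("prefill")
decode = interpreter.get_signature_runner("decode")

PREFILL_SEQ_LEN = 512          # must match the shape baked in at conversion time
prompt = [1, 450, 6593, 310]   # example token ids from your tokenizer

# Right-pad the prompt to the static prefill length.
tokens = np.zeros((1, PREFILL_SEQ_LEN), dtype=np.int32)
tokens[0, : len(prompt)] = prompt
positions = np.arange(PREFILL_SEQ_LEN, dtype=np.int32)

# 1) Prefill: run the whole prompt once to populate the KV cache.
prefill(prefill_tokens=tokens, prefill_input_pos=positions)

# 2) Decode: generate autoregressively, one token per step.
generated = []
next_token, next_pos = prompt[-1], len(prompt)
for _ in range(32):
    outputs = decode(
        tokens=np.array([[next_token]], dtype=np.int32),
        input_pos=np.array([next_pos], dtype=np.int32),
    )
    next_token = int(np.argmax(outputs["logits"]))  # greedy; swap in your sampler
    generated.append(next_token)
    next_pos += 1
```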
For an end-to-end example showing how to author, convert, quantize and execute, please refer to the steps [here](https://github.com/google-ai-edge/ai-edge-torch/blob/main/ai_edge_torch/generative/examples/README.md).
With the `TensorNames` defined, a user can simply use the loading utils to load an instance of the mapped model. For instance:
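The following is a minimal sketch of that flow; it assumes the `ModelLoader` utility lives in `ai_edge_torch.generative.utilities.loader`, and that `TinyLlama`, `get_model_config`, and `TENSOR_NAMES` are the ones defined alongside the re-authored model:

```python
from ai_edge_torch.generative.utilities import loader as loading_utils

def build_model(checkpoint_path: str):
    # TENSOR_NAMES (defined above) maps checkpoint tensor names onto the
    # re-authored model's parameters; ModelLoader performs the actual copy.
    config = get_model_config()
    model = TinyLlama(config)
    loader = loading_utils.ModelLoader(checkpoint_path, TENSOR_NAMES)
    loader.load(model)
    model.eval()
    return model

model = build_model("path/to/tiny_llama.ckpt")
```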
It is recommended to verify the numerical correctness of the re-authored model using a few input samples before proceeding to the conversion step.
### Model conversion
In this step, we use `ai_edge_torch`'s standard multi-signature conversion API to convert the PyTorch `nn.Module` into a single TFLite flatbuffer for on-device execution. For example, `tiny_llama/convert_to_tflite.py` converts the `TinyLLama` model to a multi-signature TFLite model:
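The conversion code is not reproduced here, but a minimal sketch of its general shape follows, assuming the chainable `ai_edge_torch.signature(...).convert()` API and the hypothetical `build_model` helper from the loading step (shapes, dtypes, and the output path are illustrative):

```python
import torch
import ai_edge_torch

def convert_tiny_llama_to_tflite(checkpoint_path: str):
    pytorch_model = build_model(checkpoint_path)

    # Sample inputs fix the static shapes of each signature.
    prefill_seq_len = 512
    prefill_tokens = torch.zeros((1, prefill_seq_len), dtype=torch.int32)
    prefill_input_pos = torch.arange(0, prefill_seq_len, dtype=torch.int32)
    decode_token = torch.zeros((1, 1), dtype=torch.int32)
    decode_input_pos = torch.tensor([0], dtype=torch.int32)

    # Multi-signature conversion: one flatbuffer, two entrypoints.
    edge_model = (
        ai_edge_torch.signature("prefill", pytorch_model, (prefill_tokens, prefill_input_pos))
        .signature("decode", pytorch_model, (decode_token, decode_input_pos))
        .convert()
    )
    edge_model.export("/tmp/tiny_llama_seq512.tflite")
```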
Once converted, you will get a `.tflite` model ready for on-device execution. Note that the generated `.tflite` model uses static shapes, and defines two signatures (two entrypoints into the model):
1) `prefill`: takes two tensor inputs, `prefill_tokens` and `prefill_input_pos`, with shapes `(BATCH_SIZE, PREFILL_SEQ_LEN)` and `(PREFILL_SEQ_LEN)`.
2) `decode`: takes two tensor inputs, `tokens` and `input_pos`, with shapes `(BATCH_SIZE, 1)` and `(1)`.
## High-Level function boundary for performance
We introduce the High-Level Function Boundary (HLFB) as a way of annotating performance-critical pieces of the model (e.g. `scaled_dot_product_attention`, or `KVCache`). HLFB allows the converter to lower the annotated blocks to performant TFLite custom ops. The following is an example of applying HLFB to `SDPA`:
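This is a sketch of the annotation pattern, assuming the `StableHLOCompositeBuilder` helper exposed by `ai_edge_torch.hlfb`; the composite name and the exact call site are illustrative:

```python
import torch.nn.functional as F
from ai_edge_torch.hlfb import StableHLOCompositeBuilder

def scaled_dot_product_attention(q, k, v):
    # Everything executed between mark_inputs() and mark_outputs() is
    # fenced into one composite, which the converter can lower to a
    # single performant TFLite op instead of many small ones.
    builder = StableHLOCompositeBuilder("odml.scaled_dot_product_attention")
    q, k, v = builder.mark_inputs(q, k, v)
    out = F.scaled_dot_product_attention(q, k, v)
    out = builder.mark_outputs(out)
    return out
```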