### README.md (+4 −4)
```diff
@@ -39,20 +39,20 @@ Easy, advanced inference platform for large language models on Kubernetes
 ## Key Features

 - **Ease of Use**: People can quickly deploy an LLM service with minimal configuration.
-- **Broad Backends Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), and [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
+- **Broad Backends Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [Text-Generation-Inference](https://github.com/huggingface/text-generation-inference), [SGLang](https://github.com/sgl-project/sglang), and [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./site/content/en/docs/integrations/support-backends.md).
 - **Accelerator Fungibility**: llmaz supports serving the same LLM with various accelerators to optimize cost and performance.
 - **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), and object stores. llmaz handles model loading automatically, requiring no effort from users.
 - **Multi-Host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day 0.
 - **AI Gateway Support**: Offers capabilities like token-based rate limiting and model routing through the integration of [Envoy AI Gateway](https://aigateway.envoyproxy.io/).
-- **Built-in ChatUI**: Out-of-the-box chatbot support through the integration of [Open WebUI](https://github.com/open-webui/open-webui), offering capabilities like function calling, RAG, web search, and more; see the configuration [here](./docs/open-webui.md).
+- **Built-in ChatUI**: Out-of-the-box chatbot support through the integration of [Open WebUI](https://github.com/open-webui/open-webui), offering capabilities like function calling, RAG, web search, and more; see the configuration [here](./site/content/en/docs/integrations/open-webui.md).
 - **Scaling Efficiency**: llmaz supports horizontal scaling with [HPA](./docs/examples/hpa/README.md) by default and will integrate with autoscaling components like [Cluster-Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) or [Karpenter](https://github.com/kubernetes-sigs/karpenter) for smart scaling across different clouds.
 - **Efficient Model Distribution (WIP)**: Out-of-the-box model cache system support with [Manta](https://github.com/InftyAI/Manta); still under development while the architecture is being reworked.

 ## Quick Start

 ### Installation

-Read the [Installation](./docs/installation.md) for guidance.
+Read the [Installation](./site/content/en/docs/installation.md) for guidance.
@@ … @@
-Please refer to [examples](./docs/examples/README.md) for more tutorials or read [develop.md](./docs/develop.md) to learn more about the project.
+Please refer to [examples](./docs/examples/README.md) for more tutorials or read [develop.md](./site/content/en/docs/develop.md) to learn more about the project.
```
### site/content/en/docs/installation.md (+1 −1)
```diff
@@ -12,7 +12,7 @@ description: >

 - Kubernetes version >= 1.26. LWS requires Kubernetes **v1.26 or higher**. If you are using a lower Kubernetes version and most of your workloads rely on single-node inference, we may consider replacing LWS with a Deployment-based approach. This fallback plan would involve using Kubernetes Deployments to manage single-node inference workloads efficiently. See [#32](https://github.com/InftyAI/llmaz/issues/32) for more details and updates.
 - Helm 3, see [installation](https://helm.sh/docs/intro/install/).
-- Prometheus, see [installation](https://github.com/InftyAI/llmaz/tree/main/docs/prometheus-operator#install-the-prometheus-operator).
+- Prometheus, see [installation](https://github.com/InftyAI/llmaz/blob/main/site/content/en/docs/integrations/prometheus-operator.md#install-the-prometheus-operator).

 Note: the llmaz helm chart will by default install
 [Envoy Gateway](https://github.com/envoyproxy/gateway) and [Envoy AI Gateway](https://github.com/envoyproxy/ai-gateway) as the front end in the llmaz-system namespace. If you *already installed these two components* or *want to deploy them in other namespaces*, append `--set envoy-gateway.enabled=false --set envoy-ai-gateway.enabled=false` to the command below.
```