# Available Models
More profiling metrics coming soon!

## [Cohere for AI: Command R](https://huggingface.co/collections/CohereForAI/c4ai-command-r-plus-660ec4c34f7a69c50ce7f7b9)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`c4ai-command-r-plus`](https://huggingface.co/CohereForAI/c4ai-command-r-plus)| 8x a40 (2 nodes, 4 a40/node) | 412 tokens/s | 541 tokens/s |
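
The multi-node allocations in these tables (e.g. "8x a40 (2 nodes, 4 a40/node)") map naturally onto tensor parallelism within a node and pipeline parallelism across nodes. The sketch below is illustrative only: it assumes the model is loaded with vLLM (this page does not name a serving engine) and that a Ray cluster already spans both nodes; multi-node support and exact arguments depend on the vLLM version.

```python
# A minimal sketch, assuming vLLM and a pre-existing Ray cluster spanning two nodes.
# tensor_parallel_size=4 mirrors "4 a40/node"; pipeline_parallel_size=2 mirrors "2 nodes".
from vllm import LLM, SamplingParams

llm = LLM(
    model="CohereForAI/c4ai-command-r-plus",
    tensor_parallel_size=4,             # shard each layer across the 4 GPUs of a node
    pipeline_parallel_size=2,           # one pipeline stage per node
    distributed_executor_backend="ray",
)

outputs = llm.generate(
    ["Summarize the benefits of multi-node serving in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

Keeping tensor parallelism inside a node and pipeline stages across nodes puts the heaviest communication on the faster intra-node links, which is why the two-node allocations are written as "4 a40/node" rather than a flat GPU count.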

## [Code Llama](https://huggingface.co/collections/meta-llama/code-llama-family-661da32d0a9d678b6f55b933)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`CodeLlama-70b-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-hf) | 4x a40 | - tokens/s | - tokens/s |
| [`CodeLlama-70b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf) | 4x a40 | - tokens/s | - tokens/s |

## [Databricks: DBRX](https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`dbrx-instruct`](https://huggingface.co/databricks/dbrx-instruct)| 8x a40 (2 nodes, 4 a40/node) | 107 tokens/s | 904 tokens/s |

## [Google: Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`gemma-2-27b`](https://huggingface.co/google/gemma-2-27b) | 2x a40 | - tokens/s | - tokens/s |
| [`gemma-2-27b-it`](https://huggingface.co/google/gemma-2-27b-it) | 2x a40 | - tokens/s | - tokens/s |

## [LLaVa-1.5](https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`llava-1.5-7b-hf`](https://huggingface.co/llava-hf/llava-1.5-7b-hf)| 1x a40 | - tokens/s | - tokens/s |
|[`llava-1.5-13b-hf`](https://huggingface.co/llava-hf/llava-1.5-13b-hf)| 1x a40 | - tokens/s | - tokens/s |
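
The LLaVa variants (this section and the next) are vision-language models, so a request pairs an image with the text prompt. Below is a minimal sketch using Hugging Face `transformers`; the serving stack is not specified on this page, and the image URL and prompt are placeholders.

```python
# A minimal sketch for the vision-language variants; the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"  # LLaVa-1.5 chat format

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```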

## [LLaVa-NeXT](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`llava-v1.6-mistral-7b-hf`](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)| 1x a40 | - tokens/s | - tokens/s |
|[`llava-v1.6-34b-hf`](https://huggingface.co/llava-hf/llava-v1.6-34b-hf)| 2x a40 | - tokens/s | - tokens/s |

## [Meta: Llama 2](https://huggingface.co/collections/meta-llama/llama-2-family-661da1f90a9d678b6f55773b)

| Variant | Suggested resource allocation |
|:----------:|:----------:|
| [`Llama-2-70b-hf`](https://huggingface.co/meta-llama/Llama-2-70b-hf) | 4x a40 |
| [`Llama-2-70b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4x a40 |

## [Meta: Llama 3](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`Meta-Llama-3-70B`](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | 4x a40 | 81 tokens/s | 618 tokens/s |
| [`Meta-Llama-3-70B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 4x a40 | 301 tokens/s | 660 tokens/s |
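
For single-node entries such as the 70B variants above, the suggested allocation translates directly into a tensor-parallel degree. The sketch below again assumes a vLLM backend (an assumption; this page does not say how the throughput numbers were collected) and prints a rough end-to-end generation rate for one request; the table's columns are averages from a profiling run, so treat this only as a sanity check.

```python
# A rough, single-request throughput check, assuming vLLM on a node with 4 GPUs.
import time

from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-70B-Instruct",
    tensor_parallel_size=4,  # mirrors the "4x a40" suggestion
)
params = SamplingParams(max_tokens=256, temperature=0.0)

start = time.perf_counter()
out = llm.generate(["Explain tensor parallelism in two sentences."], params)[0]
elapsed = time.perf_counter() - start

generated = len(out.outputs[0].token_ids)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tokens/s")
```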

## [Meta: Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
| [`Meta-Llama-3.1-8B`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 1x a40 | - tokens/s | - tokens/s |
| [`Meta-Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 1x a40 | - tokens/s | - tokens/s |
| [`Meta-Llama-3.1-70B`](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 4x a40 | - tokens/s | - tokens/s |
| [`Meta-Llama-3.1-70B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 4x a40 | - tokens/s | - tokens/s |
| [`Meta-Llama-3.1-405B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 32x a40 (8 nodes, 4 a40/node) | - tokens/s | - tokens/s |

## [Mistral AI: Mistral](https://huggingface.co/mistralai)

| Variant (Mistral) | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)| 1x a40 | - tokens/s | - tokens/s|
|[`Mistral-Large-Instruct-2407`](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)| 4x a40 | - tokens/s | - tokens/s|

## [Mistral AI: Mixtral](https://huggingface.co/mistralai)

| Variant (Mixtral) | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|
|[`Mixtral-8x7B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)| 4x a40 | 222 tokens/s | 1543 tokens/s |
|[`Mixtral-8x22B-v0.1`](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1)| 8x a40 (2 nodes, 4 a40/node) | 145 tokens/s | 827 tokens/s|
|[`Mixtral-8x22B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)| 8x a40 (2 nodes, 4 a40/node) | 95 tokens/s | 803 tokens/s|

## [Microsoft: Phi 3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3)

| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
|:----------:|:----------:|:----------:|:----------:|