Commit b426e7e

committed
Update models README with a title
1 parent 268b3a5 commit b426e7e

File tree

1 file changed: +16 −13 lines changed

vec_inf/models/README.md

Lines changed: 16 additions & 13 deletions
```diff
@@ -1,10 +1,13 @@
-# [Cohere for AI: Command R](https://huggingface.co/collections/CohereForAI/c4ai-command-r-plus-660ec4c34f7a69c50ce7f7b9)
+# Available Models
+More profiling metrics coming soon!
+
+## [Cohere for AI: Command R](https://huggingface.co/collections/CohereForAI/c4ai-command-r-plus-660ec4c34f7a69c50ce7f7b9)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 |[`c4ai-command-r-plus`](https://huggingface.co/CohereForAI/c4ai-command-r-plus)| 8x a40 (2 nodes, 4 a40/node) | 412 tokens/s | 541 tokens/s |
 
-# [Code Llama](https://huggingface.co/collections/meta-llama/code-llama-family-661da32d0a9d678b6f55b933)
+## [Code Llama](https://huggingface.co/collections/meta-llama/code-llama-family-661da32d0a9d678b6f55b933)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
@@ -17,13 +20,13 @@
 | [`CodeLlama-70b-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-hf) | 4x a40 | - tokens/s | - tokens/s |
 | [`CodeLlama-70b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf) | 4x a40 | - tokens/s | - tokens/s |
 
-# [Databricks: DBRX](https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962)
+## [Databricks: DBRX](https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 |[`dbrx-instruct`](https://huggingface.co/databricks/dbrx-instruct)| 8x a40 (2 nodes, 4 a40/node) | 107 tokens/s | 904 tokens/s |
 
-# [Google: Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
+## [Google: Gemma 2](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
@@ -32,21 +35,21 @@
 | [`gemma-2-27b`](https://huggingface.co/google/gemma-2-27b) | 2x a40 | - tokens/s | - tokens/s |
 | [`gemma-2-27b-it`](https://huggingface.co/google/gemma-2-27b-it) | 2x a40 | - tokens/s | - tokens/s |
 
-# [LLaVa-1.5](https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0)
+## [LLaVa-1.5](https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 |[`llava-1.5-7b-hf`](https://huggingface.co/llava-hf/llava-1.5-7b-hf)| 1x a40 | - tokens/s | - tokens/s |
 |[`llava-1.5-13b-hf`](https://huggingface.co/llava-hf/llava-1.5-13b-hf)| 1x a40 | - tokens/s | - tokens/s |
 
-# [LLaVa-NeXT](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf)
+## [LLaVa-NeXT](https://huggingface.co/collections/llava-hf/llava-next-65f75c4afac77fd37dbbe6cf)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 |[`llava-v1.6-mistral-7b-hf`](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf)| 1x a40 | - tokens/s | - tokens/s |
 |[`llava-v1.6-34b-hf`](https://huggingface.co/llava-hf/llava-v1.6-34b-hf)| 2x a40 | - tokens/s | - tokens/s |
 
-# [Meta: Llama 2](https://huggingface.co/collections/meta-llama/llama-2-family-661da1f90a9d678b6f55773b)
+## [Meta: Llama 2](https://huggingface.co/collections/meta-llama/llama-2-family-661da1f90a9d678b6f55773b)
 
 | Variant | Suggested resource allocation |
 |:----------:|:----------:|
@@ -57,7 +60,7 @@
 | [`Llama-2-70b-hf`](https://huggingface.co/meta-llama/Llama-2-70b-hf) | 4x a40 |
 | [`Llama-2-70b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4x a40 |
 
-# [Meta: Llama 3](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)
+## [Meta: Llama 3](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
@@ -66,17 +69,17 @@
 | [`Meta-Llama-3-70B`](https://huggingface.co/meta-llama/Meta-Llama-3-70B) | 4x a40 | 81 tokens/s | 618 tokens/s |
 | [`Meta-Llama-3-70B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | 4x a40 | 301 tokens/s | 660 tokens/s |
 
-# [Meta: Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
+## [Meta: Llama 3.1](https://huggingface.co/collections/meta-llama/llama-31-669fc079a0c406a149a5738f)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 | [`Meta-Llama-3.1-8B`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) | 1x a40 | - tokens/s | - tokens/s |
 | [`Meta-Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) | 1x a40 | - tokens/s | - tokens/s |
 | [`Meta-Llama-3.1-70B`](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B) | 4x a40 | - tokens/s | - tokens/s |
 | [`Meta-Llama-3.1-70B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) | 4x a40 | - tokens/s | - tokens/s |
-| [`Meta-Llama-3.1-405B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 32x a40 | - tokens/s | - tokens/s |
+| [`Meta-Llama-3.1-405B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) | 32x a40 (8 nodes, 4 a40/node) | - tokens/s | - tokens/s |
 
-# [Mistral AI: Mistral](https://huggingface.co/mistralai)
+## [Mistral AI: Mistral](https://huggingface.co/mistralai)
 
 | Variant (Mistral) | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
@@ -87,15 +90,15 @@
 |[`Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)| 1x a40 | - tokens/s | - tokens/s|
 |[`Mistral-Large-Instruct-2407`](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)| 4x a40 | - tokens/s | - tokens/s|
 
-# [Mistral AI: Mixtral](https://huggingface.co/mistralai)
+## [Mistral AI: Mixtral](https://huggingface.co/mistralai)
 
 | Variant (Mixtral) | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
 |[`Mixtral-8x7B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)| 4x a40 | 222 tokens/s | 1543 tokens/s |
 |[`Mixtral-8x22B-v0.1`](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1)| 8x a40 (2 nodes, 4 a40/node) | 145 tokens/s | 827 tokens/s|
 |[`Mixtral-8x22B-Instruct-v0.1`](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1)| 8x a40 (2 nodes, 4 a40/node) | 95 tokens/s | 803 tokens/s|
 
-# [Microsoft: Phi 3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3)
+## [Microsoft: Phi 3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3)
 
 | Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
 |:----------:|:----------:|:----------:|:----------:|
```
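The "Suggested resource allocation" entries in the diff above translate directly into a cluster job request. As a minimal sketch, an allocation like "8x a40 (2 nodes, 4 a40/node)" might be expressed as the following Slurm batch header — note the partition name, GRES type, and launch command are illustrative assumptions, not taken from this repository:

```shell
#!/bin/bash
# Hypothetical Slurm request matching "8x a40 (2 nodes, 4 a40/node)".
# Partition and GRES names are assumptions; check your cluster's configuration.
#SBATCH --nodes=2                 # 2 nodes
#SBATCH --gres=gpu:a40:4          # 4 A40 GPUs per node -> 8 GPUs total
#SBATCH --ntasks-per-node=1       # one launcher process per node

# Placeholder launch command (hypothetical); multi-node serving needs a
# distributed backend (e.g. Ray for vLLM) spanning both nodes.
srun python serve_model.py --model-weights-parent-dir /path/to/models
```

Single-node rows (e.g. 1x or 4x a40) drop `--nodes` down to 1 and need no cross-node coordination, which is why the multi-node allocations are called out explicitly in the tables.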
