Commit 2c43a25

Merge pull request #7 from VectorInstitute/develop
Add CodeLlama
2 parents 635e13f + 17bfdb0 commit 2c43a25

4 files changed: +24 -7 lines changed

models/codellama/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# [Code Llama](https://huggingface.co/collections/meta-llama/code-llama-family-661da32d0a9d678b6f55b933)
+
+| Variant | Suggested resource allocation | Avg prompt throughput | Avg generation throughput |
+|:----------:|:----------:|:----------:|:----------:|
+| [**`7b-hf`**](https://huggingface.co/meta-llama/CodeLlama-7b-hf) | 1x a40 | - tokens/s | - tokens/s |
+| [`7b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf) | 1x a40 | - tokens/s | - tokens/s |
+| [`13b-hf`](https://huggingface.co/meta-llama/CodeLlama-13b-hf) | 1x a40 | - tokens/s | - tokens/s |
+| [`13b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf) | 1x a40 | - tokens/s | - tokens/s |
+| [`34b-hf`](https://huggingface.co/meta-llama/CodeLlama-34b-hf) | 2x a40 | - tokens/s | - tokens/s |
+| [`34b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) | 2x a40 | - tokens/s | - tokens/s |
+| [`70b-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-hf) | 4x a40 | - tokens/s | - tokens/s |
+| [`70b-Instruct-hf`](https://huggingface.co/meta-llama/CodeLlama-70b-Instruct-hf) | 4x a40 | - tokens/s | - tokens/s |

models/codellama/config.sh

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+export MODEL_NAME="CodeLlama"
+export MODEL_VARIANT="7b-hf"
+export NUM_NODES=1
+export NUM_GPUS=1
+export VLLM_MAX_LOGPROBS=32000
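
The committed defaults target the `7b-hf` variant on a single GPU. As a rough sketch of how the same variables might be set for a larger variant, the snippet below uses the `2x a40` allocation that the Code Llama README suggests for `34b-hf`. It assumes both GPUs sit on one node and that the launch scripts read these variables as exported; neither is confirmed by this commit.

```shell
# Hypothetical override for the 34b-hf variant (not part of this commit).
# NUM_GPUS=2 follows the "2x a40" suggestion in models/codellama/README.md;
# NUM_NODES=1 assumes both GPUs are on the same node.
export MODEL_NAME="CodeLlama"
export MODEL_VARIANT="34b-hf"
export NUM_NODES=1
export NUM_GPUS=2
export VLLM_MAX_LOGPROBS=32000
```

The other variants would presumably follow the same pattern, with `NUM_GPUS` taken from the README table.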

models/llama2/README.md

Lines changed: 6 additions & 6 deletions
@@ -2,9 +2,9 @@
 
 | Variant | Suggested resource allocation |
 |:----------:|:----------:|
-| [**`7b`**](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1x a40 |
-| [`7b-chat`](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 1x a40 |
-| [`13b`](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1x a40 |
-| [`13b-chat`](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | 1x a40 |
-| [`70b`](https://huggingface.co/meta-llama/Llama-2-70b-hf) | 4x a40 |
-| [`70b-chat`](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4x a40 |
+| [**`7b-hf`**](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 1x a40 |
+| [`7b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) | 1x a40 |
+| [`13b-hf`](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 1x a40 |
+| [`13b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) | 1x a40 |
+| [`70b-hf`](https://huggingface.co/meta-llama/Llama-2-70b-hf) | 4x a40 |
+| [`70b-chat-hf`](https://huggingface.co/meta-llama/Llama-2-70b-chat-hf) | 4x a40 |
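
With the `-hf` suffix added, each variant label in both README tables matches the tail of the Hugging Face repository name it links to. The sketch below illustrates that correspondence. `hf_repo_id` is a hypothetical helper written for this example and is not part of the repository; the commit does not state whether the launch scripts build repository names this way.

```shell
# Hypothetical helper (not in the repository): joins a model family prefix
# and a variant label into the Hugging Face repo ID used in the README links.
hf_repo_id() {
  family_prefix="$1"   # e.g. "Llama-2" or "CodeLlama"
  variant="$2"         # e.g. "7b-chat-hf" or "34b-Instruct-hf"
  printf 'meta-llama/%s-%s\n' "$family_prefix" "$variant"
}

hf_repo_id "Llama-2" "7b-chat-hf"
hf_repo_id "CodeLlama" "34b-Instruct-hf"
```

Under the old labels (`7b`, `7b-chat`), the same join would have needed an extra `-hf` appended to reach the linked repositories.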

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "vector-inference"
-version = "0.2.0"
+version = "0.2.1"
 description = "Efficient LLM inference on Slurm clusters using vLLM."
 authors = ["XkunW <marshall.wang@vectorinstitute.ai>"]
 license = "MIT license"
