
adding trust_remote_code and batch_size #21


Open · wants to merge 1 commit into main

Conversation

abdelkareemkobo
I added trust_remote_code with a default of True, and added a batch_size argument with a default of 32.
Member

@Pringled Pringled left a comment


Thanks for making this PR! Almost good to go. Could you run `pre-commit run --all-files` in the virtual environment (after running `make install`)? That should automatically fix the styling issues.

parser.add_argument(
    "--trust-remote-code",
    type=bool,
    default=True,
)
Member

I think False is a safer default to choose here; could you update it to that?
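Beyond the default value, `type=bool` in the diff above is a common argparse pitfall: `bool("False")` evaluates to `True`, because any non-empty string is truthy, so the flag can never actually be switched off from the command line. A minimal sketch of the usual alternative (not the PR's actual code) uses `action="store_true"`, which also yields the `False` default requested here:

```python
import argparse

parser = argparse.ArgumentParser()
# store_true defaults to False and becomes True only when the flag is passed;
# type=bool would instead turn any non-empty string ("False" included) into True.
parser.add_argument(
    "--trust-remote-code",
    action="store_true",
    help="Allow loading custom code shipped with the model repository.",
)

args = parser.parse_args([])
print(args.trust_remote_code)  # False unless --trust-remote-code is passed
```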

@@ -18,7 +18,7 @@


def train_model(
model_name: str, train_txt: list[str], train_vec: np.ndarray, device: str = "cpu", vocab_size: int | None = None
model_name: str, train_txt: list[str], train_vec: np.ndarray, device: str = "cpu",batch_size: int= 32, vocab_size: int | None = None,trust_remote_code: bool=True
Member

Small styling issue:

Suggested change
model_name: str, train_txt: list[str], train_vec: np.ndarray, device: str = "cpu",batch_size: int= 32, vocab_size: int | None = None,trust_remote_code: bool=True
model_name: str, train_txt: list[str], train_vec: np.ndarray, device: str = "cpu", batch_size: int= 32, vocab_size: int | None = None, trust_remote_code: bool=True

Member

Also on a couple of other lines, see general PR comment.

parser.add_argument(
    "--batch-size",
    type=int,
    default=32,
)
Member

This default doesn't match the default in https://github.com/MinishLab/tokenlearn/blob/main/tokenlearn/pretrain.py#L127; could you change it to 256 so they match? Same for the default in the train_model function.
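One way to keep the CLI default and the `train_model` default from drifting apart again is to define the value once and reference it in both places. This is a sketch under that idea, not the tokenlearn code; `DEFAULT_BATCH_SIZE` is a hypothetical name:

```python
from __future__ import annotations  # keeps annotations lazy, so numpy isn't imported in this sketch

import argparse

DEFAULT_BATCH_SIZE = 256  # single source of truth, matching pretrain.py per the review


def train_model(
    model_name: str,
    train_txt: list[str],
    train_vec: np.ndarray,
    device: str = "cpu",
    batch_size: int = DEFAULT_BATCH_SIZE,
    vocab_size: int | None = None,
    trust_remote_code: bool = False,
) -> None:
    ...  # training logic omitted


parser = argparse.ArgumentParser()
parser.add_argument("--batch-size", type=int, default=DEFAULT_BATCH_SIZE)
```

If either default ever needs to change, it changes in exactly one place, and the CLI help text and the function signature can never disagree.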
