Could you please share the evaluation scripts and prompts that were used to generate the reported results in the paper?
Various parameters are involved in generating outputs, and it is crucial to get these prompts correct, as large language models (LLMs) are highly sensitive to even minor changes in input.
Having access to these scripts and prompts would be invaluable for replicating the experiments accurately and for exploring variations in the evaluation process. This would enable more precise fine-tuning of models and methodologies, leading to a deeper understanding and potentially novel insights.