Thanks for your excellent work on Puppeteer! I do believe this is the correct direction for future multi-agent systems.
However, after training the orchestrator on the MMLU-Pro task with the provided configuration, I found that it always produces the same topology with the same agent choices: arxiv -> concluder -> terminator. It seems to overfit to this selection regardless of the intermediate output state, even when I explicitly change that state. I suspect the task is too simple for the agents, so training never encounters a more complex scenario (e.g., one where an intermediate agent needs help from another agent).
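For reference, this is roughly how the collapse can be quantified. It is a minimal sketch, assuming each evaluation run is logged as a list of agent names; `topology_entropy` and the sample trajectories are hypothetical and not part of the Puppeteer codebase.

```python
import math
from collections import Counter


def topology_entropy(trajectories):
    """Count distinct agent sequences and return (counts, entropy in bits).

    An entropy of 0 means every run used exactly the same agent sequence.
    """
    counts = Counter(tuple(t) for t in trajectories)
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        p = c / total
        entropy += p * math.log2(1 / p)
    return counts, entropy


# Hypothetical logs: every run picks the same three agents.
collapsed = [["arxiv", "concluder", "terminator"]] * 4
collapsed_counts, collapsed_entropy = topology_entropy(collapsed)

# Hypothetical logs: two different sequences, equally frequent.
mixed = [
    ["arxiv", "concluder", "terminator"],
    ["reasoner", "critic", "concluder", "terminator"],
]
mixed_counts, mixed_entropy = topology_entropy(mixed)
```

In my runs the situation matches the `collapsed` case: a single distinct sequence, so the entropy over topologies is zero.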
I really like the idea behind Puppeteer, and I think it is one of the few studies that considers dynamic workflow design based on task state. Please let me know whether you observed a similar pattern when you trained the model and, if so, whether you have any plans to improve the orchestrator.