* add repetition early stop cases
* add repetition early stop cases
* add bad cases
* add bad cases
* add evil cases
* add benchmark gsm8k