Since it’s an open-source model, you’d expect training code to exist somewhere online, but it turns out there isn’t really any. LLaMA-Factory + KTransformers is supposed to support it, but I ran into a bunch of bugs. It’s also designed around CPU offloading combined with GPU training, which adds unnecessary complexity and is inefficient.
Luckily the Europeans have been at it yet again. The International
The blank token ID is 1024 for the 110M model and 8192 for the 600M model.
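To make that concrete, here is a minimal sketch of wiring the right blank ID into the loss, assuming a PyTorch training loop and a CTC-style objective; only the two IDs come from the note above, while the size keys, the helper name, and the choice of `torch.nn.CTCLoss` are assumptions (a transducer model would pass the same ID to its transducer loss instead).

```python
import torch

# Blank token IDs from the note above; the size labels are just dict keys.
BLANK_ID_BY_SIZE = {"110M": 1024, "600M": 8192}

def make_ctc_loss(model_size: str) -> torch.nn.CTCLoss:
    """Hypothetical helper: build a CTC loss whose blank ID
    matches the chosen model variant."""
    blank_id = BLANK_ID_BY_SIZE[model_size]
    # zero_infinity guards against inf losses on degenerate alignments.
    return torch.nn.CTCLoss(blank=blank_id, zero_infinity=True)

# Usage:
# ctc = make_ctc_loss("600M")  # blank=8192
```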