simpo-training
[ SKILL_DOCUMENTATION ]
# SimPO - Simple Preference Optimization
## Quick Start
SimPO is a reference-free preference optimization method: it outperforms DPO while eliminating the reference model entirely.
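The idea can be sketched in plain Python: SimPO scores each response by its length-normalized log-probability under the policy (no reference model needed) and penalizes pairs where the chosen response does not beat the rejected one by a target margin. This is an illustrative sketch, not code from the alignment-handbook repo; `simpo_loss` and its per-token log-probability inputs are hypothetical names.

```python
import math

def simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma_beta_ratio=0.5):
    """Sigmoid SimPO loss for one preference pair.

    chosen_logps / rejected_logps: per-token log-probabilities of the
    chosen and rejected responses under the policy being trained.
    """
    # Length-normalized implicit rewards -- no reference model involved
    r_chosen = beta * sum(chosen_logps) / len(chosen_logps)
    r_rejected = beta * sum(rejected_logps) / len(rejected_logps)
    gamma = beta * gamma_beta_ratio  # target reward margin
    margin = r_chosen - r_rejected - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the chosen response is clearly more likely, the loss is small
loss = simpo_loss([-0.1, -0.2, -0.1], [-1.5, -2.0])
```

Note the length normalization: dividing by the response length keeps long and short responses comparable, which is what lets SimPO drop the reference model that DPO uses for the same purpose.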
**Installation**:
```bash
# Create the environment
conda create -n simpo python=3.10 && conda activate simpo

# Install PyTorch 2.2.2
# See: https://pytorch.org/get-started/locally/

# Install alignment-handbook
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .

# Install Flash Attention 2
python -m pip install flash-attn --no-build-isolation
```
**Training** (Mistral 7B):
```bash
ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file accelerate_configs/deepspeed_zero3.yaml \
  scripts/run_simpo.py \
  training_configs/mistral-7b-base-simpo.yaml
```
## Common Workflows
### Workflow 1: Training from a Base Model (Mistral 7B)
**Config** (`mistral-7b-base-simpo.yaml`):
```yaml
# Model
model_name_or_path: mistralai/Mistral-7B-v0.1
torch_dtype: bfloat16

# Dataset
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
dataset_splits:
  - train_prefs
  - test_prefs

# SimPO hyperparameters
beta: 2.0                # reward scaling (2.0-10.0)
gamma_beta_ratio: 0.5    # target margin (0-1)
loss_type: sigmoid       # sigmoid or hinge
sft_weight: 0.0          # optional SFT regularization

# Training
learning_rate: 5e-7      # critical: 3e-7 to 1e-6
num_train_epochs: 1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8

# Output
output_dir: ./outputs/mistral-7b-simpo
```
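The batch settings above determine the effective global batch size only together with the GPU count; the 8 GPUs below are an assumption about the node, not something the config specifies:

```python
# Values from the config above
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 8  # assumption: a single 8-GPU node; adjust to your hardware

# Preference pairs seen per optimizer step across all devices
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # 64 with the assumed 8 GPUs
```

If you change the GPU count, adjust `gradient_accumulation_steps` to keep the effective batch size roughly constant, since the learning-rate recommendation assumes it.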
**Launch training**:
```bash
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
  scripts/run_simpo.py training_configs/mistral-7b-base-simpo.yaml
```
### Workflow 2: Fine-tuning an Instruct Model (Llama 3 8B)
**Config** (`llama3-8b-instruct-simpo.yaml`):
```yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
dataset_mixer:
  argilla/ultrafeedback-binarized-preferences-cleaned: 1.0
beta: 2.5
gamma_beta_ratio: 0.5
learning_rate: 5e-7
sft_weight: 0.1          # add an SFT loss to preserve capabilities
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
output_dir: ./outputs/llama3-8b-simpo
```
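Setting `sft_weight: 0.1` mixes a standard SFT term (cross-entropy on the chosen response) into the preference objective, which helps an instruct model keep its existing capabilities. A minimal sketch of the combined loss, assuming per-token log-probabilities as inputs; the function and argument names are illustrative, not from the training script:

```python
import math

def combined_loss(chosen_logps, rejected_logps,
                  beta=2.5, gamma_beta_ratio=0.5, sft_weight=0.1):
    """SimPO loss plus an SFT regularizer weighted by sft_weight."""
    # Length-normalized implicit rewards, as in plain SimPO
    r_chosen = beta * sum(chosen_logps) / len(chosen_logps)
    r_rejected = beta * sum(rejected_logps) / len(rejected_logps)
    margin = r_chosen - r_rejected - beta * gamma_beta_ratio
    simpo = -math.log(1.0 / (1.0 + math.exp(-margin)))

    # SFT term: mean negative log-likelihood of the chosen response
    sft_nll = -sum(chosen_logps) / len(chosen_logps)
    return simpo + sft_weight * sft_nll
```

With `sft_weight: 0.0` this reduces to the plain SimPO loss of Workflow 1; the 0.1 weight keeps the pull toward the chosen responses' token distribution small relative to the preference signal.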
**Launch**:
```bash
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml \
  scripts/run_simpo.py training_configs/llama3-8b-instruct-simpo.yaml
```
### Workflow 3: Reasoning-heavy Tasks (Lower Learning Rate)
**For math/code tasks**:
```yaml
model_name_or_path: deepseek-ai/d
```