
Fine Tuning PEFT Troubleshooting


# PEFT Troubleshooting Guide

## Installation Issues

### bitsandbytes CUDA Error

**Error**: `CUDA Setup failed despite GPU being available`

**Fix**:

```bash
# Check the CUDA version
nvcc --version

# Install a matching bitsandbytes build
pip uninstall bitsandbytes
pip install bitsandbytes --no-cache-dir

# Or compile from source for a specific CUDA version
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x  # Adjust to your CUDA version
pip install .
```

### Triton Import Error

**Error**: `ModuleNotFoundError: No module named 'triton'`

**Fix**:

```bash
# Install triton (Linux only)
pip install triton

# Windows: Triton is not supported; use the CUDA backend
# Pin the process to GPU 0 (this only selects the visible GPU;
# it does not by itself disable Triton)
export CUDA_VISIBLE_DEVICES=0
```

### PEFT Version Conflict

**Error**: `AttributeError: 'LoraConfig' object has no attribute 'use_dora'`

**Fix**:

```bash
# Upgrade to the latest PEFT
# (quote the specifier so the shell does not treat ">" as a redirect)
pip install --upgrade "peft>=0.13.0"

# Check the installed version
python -c "import peft; print(peft.__version__)"
```

## Training Issues

### CUDA Out of Memory (OOM)

**Error**: `torch.cuda.OutOfMemoryError: CUDA out of memory`

**Solutions**:

1. **Enable gradient checkpointing**:

   ```python
   from peft import prepare_model_for_kbit_training

   model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
   ```

2. **Reduce the batch size**:

   ```python
   from transformers import TrainingArguments

   training_args = TrainingArguments(
       per_device_train_batch_size=1,
       gradient_accumulation_steps=16,  # Keeps the effective batch size at 16
   )
   ```

3. **Use QLoRA**:

   ```python
   from transformers import AutoModelForCausalLM, BitsAndBytesConfig

   bnb_config = BitsAndBytesConfig(
       load_in_4bit=True,
       bnb_4bit_quant_type="nf4",
       bnb_4bit_use_double_quant=True,
   )
   model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
   ```

4. **Lower the LoRA rank**:

   ```python
   from peft import LoraConfig

   LoraConfig(r=8)  # Instead of r=16 or higher
   ```

5. **Target fewer modules**:

   ```python
   target_modules=["q_proj", "v_proj"]  # Instead of all-linear
   ```

### Loss Not Decreasing

**Problem**: Training loss stays flat or increases.

**Solutions**:

1. **Check the learning rate**:

   ```python
   from transformers import TrainingArguments

   # Lower the learning rate
   TrainingArguments(learning_rate=1e-4)  # Avoid 2e-4 or higher
   ```

2. **Verify that the adapter is active**:

   ```python
   model.print_trainable_parameters()  # Should report more than 0 trainable parameters

   # Check that the adapter has been applied
   print(model.peft_config)
   ```

3. **Check the data format**:

   ```python
   # Verify tokenization
   sample = dataset[0]
   decoded = tokenizer.decode(sample["input_ids"])
   print(decoded)  # Sh
   ```
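As a rough illustration of the OOM fixes listed earlier (a smaller batch with gradient accumulation, a lower LoRA rank, fewer target modules), the sketch below estimates how those settings change the effective batch size and the number of adapter parameters. It assumes the standard LoRA decomposition, where a linear layer of shape `in_features x out_features` gains two matrices with `r * (in_features + out_features)` parameters in total. The model dimensions (`HIDDEN`, `NUM_LAYERS`) and all function names are hypothetical, chosen only for illustration, and square projections are assumed for simplicity.

```python
# Hypothetical model dimensions, for illustration only.
HIDDEN = 4096      # assumed width of each targeted projection
NUM_LAYERS = 32    # assumed number of transformer blocks


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


def total_lora_params(r: int, modules_per_layer: int) -> int:
    """Adapter parameters across all layers, assuming square HIDDEN x HIDDEN projections."""
    return NUM_LAYERS * modules_per_layer * lora_param_count(HIDDEN, HIDDEN, r)


def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return per_device_batch_size * grad_accum_steps * num_devices


# Compare r=16 on four projections with r=8 on two projections (q_proj, v_proj).
larger = total_lora_params(r=16, modules_per_layer=4)
smaller = total_lora_params(r=8, modules_per_layer=2)
print(f"r=16, 4 modules: {larger:,} adapter parameters")
print(f"r=8,  2 modules: {smaller:,} adapter parameters")
print(f"effective batch size: {effective_batch_size(1, 16)}")
```

Adapter parameters are usually a small share of total GPU memory, since activations and the base model weights tend to dominate. This estimate is best read as a way to compare configurations, not as a prediction of peak memory.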

Source: claude-code-templates (MIT). The Chinese translation was generated by AI.

BAGUA AI