llms-txt

# Unsloth - Llms-Txt

**Pages:** 136

---

## !pip install huggingface_hub hf_transfer

**URL:** llms-txt#!pip-install-huggingface_hub-hf_transfer

    import os
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    from huggingface_hub import snapshot_download
    snapshot_download(
        repo_id = "unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF",
        local_dir = "unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF",
        allow_patterns = ["*IQ2_XXS*"],
    )

    ./llama.cpp/llama-cli \
        --model unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF/Llama-4-Scout-17B-16E-Instruct-UD-IQ2_XXS.gguf \
        --threads 32 \
        --ctx-size 16384 \
        --n-gpu-layers 99 \
        -ot ".ffn_.*_exps.=CPU" \
        --seed 3407 \
        --prio 3 \
        --temp 0.6 \
        --min-p 0.01 \
        --top-p 0.9 \
        -no-cnv \
        --prompt "user\n\nCreate a Flappy Bird game.assistant\n\n"

{% hint style="success" %}
Read more about running Llama 4 here:
{% endhint %}

**Examples:**

Example 1 (unknown):

    And let's do inference!

    {% code overflow="wrap" %}

---

## First uninstall xformers installed by previous libraries

**URL:** llms-txt#first-uninstall-xformers-installed-by-previous-libraries

    pip uninstall xformers -y

---

## (1) Saving to GGUF / merging to 16bit for vLLM

**URL:** llms-txt#(1)-saving-to-gguf-/-merging-to-16bit-for-vllm

---

## Qwen3-Coder: How to Run Locally

**URL:** llms-txt#qwen3-coder:-how-to-run-locally

**Contents:**
- 🖥️ **Running Qwen3-Coder**
- :gear: Recommended Settings
- Run Qwen3-Coder-30B-A3B-Instruct:

Run Qwen3-Coder-30B-A3B-Instruct and 480B-A35B locally with Unsloth Dynamic quants.

Qwen3-Coder is Qwen's new series of coding agent models, available in 30B (**Qwen3-Coder-Flash**) and 480B parameter versions. **Qwen3-480B-A35B-Instruct** scores 61.8% on Aider Polyglot, supports a 256K context window (extendable to 1M), and delivers coding performance comparable to Claude Sonnet-4, GPT-4.1, and [Kimi K2](https://docs.unsloth.ai/models/tutorials-how-to-fine-tune-and-run-llms/kimi-k2-how-to-run-locally). We also uploaded Qwen3-Coder extended to **1M context length** via YaRN, as well as full-precision 8bit and 16bit versions. [Unsloth](https://github.com/unslothai/unsloth) now also supports fine-tuning and [reinforcement learning (RL)](https://docs.unsloth.ai/get-started/reinforcement-learning-rl-guide) of Qwen3-Coder. {%
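Returning to the Llama 4 Scout download step earlier in this section: the `allow_patterns = ["*IQ2_XXS*"]` argument limits the download to files whose names match the glob. The sketch below illustrates that filtering with Python's standard `fnmatch` module, on the assumption that the patterns behave like ordinary shell-style globs. The helper `select_files` and the file listing are hypothetical. Only the first filename appears in the `llama-cli` command in this section.

```python
from fnmatch import fnmatch

# Pattern copied from the snapshot_download call in this section.
ALLOW_PATTERNS = ["*IQ2_XXS*"]

# Hypothetical repository listing, for illustration only. The first name
# is the one passed to llama-cli above; the other two are invented.
repo_files = [
    "Llama-4-Scout-17B-16E-Instruct-UD-IQ2_XXS.gguf",
    "Llama-4-Scout-17B-16E-Instruct-Q4_K_M.gguf",
    "README.md",
]


def select_files(files, patterns):
    """Return the files that match at least one shell-style glob pattern."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


selected = select_files(repo_files, ALLOW_PATTERNS)
print(selected)
```

Under these assumptions only the `IQ2_XXS` file is selected, so a single quantization is fetched rather than every file in the repository.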

Data source: claude-code-templates (MIT); the Chinese translation was generated by AI. See About Us for details.

BAGUA AI