The Qwen3.6 35B A3B "Heretic" uncensored variant has been released, marking a significant milestone in high-fidelity fine-tuning. By preserving all 19 native Multi-Token Prediction (MTP) modules and holding KL divergence (KLD) from the base model to a minimal 0.0015, the model offers unrestricted output without sacrificing the architectural advantages of the Qwen base. It is available in Safetensors, GGUF, and NVFP4 formats.
▶ Architectural Fidelity: Retaining all 19 native MTP modules preserves the inference acceleration and structural integrity often lost in aggressive fine-tunes, keeping hardware utilization high.
▶ Precision Alignment: A KLD of 0.0015 indicates that the model sheds safety filters without drifting from the base model's reasoning capabilities. The refusal rate has been cut to just 10 refusals out of 100 test prompts.
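To make the KLD figure concrete: KL divergence measures how far the modified model's next-token distribution drifts from the base model's. A minimal sketch (toy 4-token distributions, not the actual evaluation harness used for this release) shows how a divergence on the order of 1e-3 corresponds to nearly identical output distributions:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 4-token vocabulary:
base  = [0.70, 0.20, 0.05, 0.05]   # base model
tuned = [0.68, 0.21, 0.06, 0.05]   # lightly modified model

# Tiny probability shifts like these yield a KLD around 1e-3,
# comparable in magnitude to the 0.0015 reported for this release.
print(kl_divergence(base, tuned))
```

In practice the reported number would be averaged over many token positions on a held-out corpus; the point is that 0.0015 nats implies the two models assign almost identical probabilities everywhere.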
Bagua Insight
The release of the "Heretic" version highlights a shifting trend in the LocalLLaMA community: moving beyond simple "uncensoring" toward sophisticated "architectural preservation." MTP is a cornerstone of the Qwen architecture's efficiency, typically broken during standard fine-tuning. Preserving it while achieving such low KL Divergence suggests a masterclass in weight delta management. This release proves that high-performance inference and unrestricted, high-entropy output are no longer mutually exclusive in the 35B parameter class.
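Why does preserving MTP matter for speed? MTP heads cheaply draft several future tokens that the main model then verifies, accepting the agreeing prefix in one pass. A minimal toy sketch of that draft-then-verify loop (stub models standing in for real networks, not Qwen's actual implementation):

```python
def speculative_step(main_next, draft_next, prefix, k=3):
    """One draft-then-verify step, as in MTP-style speculative decoding.

    main_next(seq)  -> the main model's next token for seq (stub).
    draft_next(seq) -> the MTP head's cheap guess for seq (stub).
    Returns the tokens accepted this step (always at least one).
    """
    # Draft k tokens cheaply with the MTP head.
    draft, seq = [], list(prefix)
    for _ in range(k):
        t = draft_next(seq)
        draft.append(t)
        seq.append(t)

    # Verify drafts against the main model; keep the agreeing prefix,
    # then substitute the main model's token at the first mismatch.
    accepted, seq = [], list(prefix)
    for t in draft:
        guess = main_next(seq)
        if guess == t:
            accepted.append(t)
            seq.append(t)
        else:
            accepted.append(guess)  # main model overrides the bad draft
            break
    return accepted

# Toy models over integer "tokens": main predicts last+1; the draft
# head agrees after odd tokens but guesses wrong after even ones.
main_next = lambda s: s[-1] + 1
draft_next = lambda s: s[-1] + 1 if s[-1] % 2 else s[-1] + 2

print(speculative_step(main_next, draft_next, [1], k=3))  # → [2, 3]
```

When the draft head agrees with the main model, several tokens are emitted per verification pass; break the MTP heads during fine-tuning and that acceleration is lost, which is exactly what this release avoids.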
Actionable Advice
Deployment teams should prioritize the NVFP4 and GGUF formats to maximize throughput on consumer-grade hardware. For workflows requiring complex instruction following or creative generation where standard alignment typically triggers refusals, this 35B variant offers one of the strongest performance-to-size ratios currently available. Developers should benchmark the MTP-enabled inference speeds against standard fine-tunes to quantify the latency gains in production environments.
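The suggested benchmark can be as simple as a tokens-per-second harness wrapped around whichever backend you deploy. A rough sketch, where `generate` is a hypothetical stand-in for your actual inference call (llama.cpp server, vLLM, etc.) and `fake_generate` is a stub used purely for illustration:

```python
import time

def tokens_per_second(generate, prompt, n_tokens, warmup=1):
    """Rough decode-throughput benchmark: tokens generated per second.

    `generate(prompt, n_tokens)` is a hypothetical stand-in for a real
    inference call; swap in the backend you are evaluating.
    """
    for _ in range(warmup):              # warm caches / load weights
        generate(prompt, n_tokens)
    start = time.perf_counter()
    generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stub backend that "decodes" at a fixed per-token cost, for illustration.
def fake_generate(prompt, n_tokens, cost=0.001):
    for _ in range(n_tokens):
        time.sleep(cost)
    return "x" * n_tokens

print(f"{tokens_per_second(fake_generate, 'hello', 64):.1f} tok/s")
```

Run the same harness once with an MTP-enabled build and once against a standard fine-tune of the same size; the ratio of the two numbers is the speculative-decoding gain you are actually getting in production.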
SOURCE: REDDIT LOCALLLAMA // UPLINK_STABLE