[ INTEL_NODE_28546 ] · PRIORITY: 8.8/10

Qwen3.6 35B A3B Uncensored “Heretic” Released: Native MTP Preservation Sets New Standard for Local LLM Performance

  PUBLISHED: · SOURCE: Reddit LocalLLaMA
[ DATA_STREAM_START ]

The Qwen3.6 35B A3B “Heretic” uncensored variant has been released, marking a significant milestone in high-fidelity fine-tuning. By preserving all 19 native Multi-Token Prediction (MTP) modules and maintaining a minimal KL divergence (KLD) of 0.0015 from the base model, it offers unrestricted output without sacrificing the architectural advantages of the Qwen base. The model is available in Safetensors, GGUF, and NVFP4 formats.

  • Architectural Fidelity: By retaining 19 native MTP modules, this version maintains the inference acceleration and structural integrity often lost in aggressive fine-tunes, ensuring peak hardware utilization.
  • Precision Alignment: A KL divergence of 0.0015 from the base model indicates that the fine-tune sheds safety filters without drifting from the base model’s reasoning capabilities; the refusal rate drops to 10/100 (10%).
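The KLD figure above quantifies how far the fine-tune’s next-token probability distribution drifts from the base model’s. A minimal sketch of the underlying calculation (the distributions and function name here are illustrative, not from the release):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats between two discrete probability distributions.

    p: next-token probabilities from the base model
    q: next-token probabilities from the fine-tuned model
    eps guards against log(0) for tokens the fine-tune assigns zero mass.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions over a toy 4-token vocabulary.
base     = [0.70, 0.20, 0.05, 0.05]
finetune = [0.68, 0.21, 0.06, 0.05]

print(f"KLD = {kl_divergence(base, finetune):.4f}")
```

In practice this would be averaged over many tokens and prompts; a value on the order of 0.0015 means the fine-tune almost never reranks the base model’s token preferences.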

Bagua Insight

The release of the “Heretic” version highlights a shifting trend in the LocalLLaMA community: moving beyond simple “uncensoring” toward sophisticated “architectural preservation.” MTP is a cornerstone of the Qwen architecture’s efficiency, typically broken during standard fine-tuning. Preserving it while achieving such low KL Divergence suggests a masterclass in weight delta management. This release proves that high-performance inference and unrestricted, high-entropy output are no longer mutually exclusive in the 35B parameter class.

Actionable Advice

Deployment teams should prioritize the NVFP4 and GGUF formats to maximize throughput on consumer-grade hardware. For workflows requiring complex instruction following or creative generation where standard alignment typically triggers refusals, this 35B variant offers the best performance-to-size ratio currently available. Developers should benchmark the MTP-enabled inference speeds against standard fine-tunes to quantify the latency gains in production environments.
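A throughput comparison like the one suggested above can be as simple as timing a generation call with and without MTP enabled. A minimal sketch, assuming a hypothetical `generate(n)` wrapper around your inference backend (the dummy generator here stands in for a real llama.cpp or vLLM call):

```python
import time

def tokens_per_second(generate, n_tokens, runs=3):
    """Time a generation callable and return the best tokens/sec across runs.

    generate(n) must produce n tokens; best-of-N reduces noise from
    warm-up, caching, and OS scheduling.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        generate(n_tokens)
        best = min(best, time.perf_counter() - start)
    return n_tokens / best

# Dummy generator standing in for a real model call.
def fake_generate(n):
    return ["tok"] * n

print(f"{tokens_per_second(fake_generate, 1000):.0f} tok/s")
```

Run the same prompt, sampler settings, and quantization in both configurations; the ratio of the two rates is the MTP speedup to report.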

[ DATA_STREAM_END ]