[ INTEL_NODE_28546 ] · PRIORITY: 8.8/10

Qwen3.6 35B A3B Uncensored “Heretic” Released: Native MTP Preservation Sets New Standard for Local LLM Performance

  PUBLISHED: · SOURCE: Reddit LocalLLaMA
[ DATA_STREAM_START ]

The Qwen3.6 35B A3B “Heretic” uncensored variant has been released, marking a significant milestone in high-fidelity fine-tuning. By preserving all 19 native Multi-Token Prediction (MTP) modules and maintaining a minimal KL divergence (KLD) of 0.0015 from the base model, it offers unrestricted output without sacrificing the architectural advantages of the Qwen base. The model is available in Safetensors, GGUF, and NVFP4 formats.

  • Architectural Fidelity: By retaining 19 native MTP modules, this version maintains the inference acceleration and structural integrity often lost in aggressive fine-tunes, ensuring peak hardware utilization.
  • Precision Alignment: A KL divergence of 0.0015 from the base model indicates that the fine-tune sheds safety filters without drifting from the base model’s reasoning capabilities; the refusal rate drops to 10/100 (10%).
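The KLD figure above quantifies how far the fine-tune’s next-token probability distribution drifts from the base model’s. A minimal sketch of the underlying calculation (the distributions and function name here are illustrative, not from the release):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats between two discrete probability distributions.

    p: next-token probabilities from the base model
    q: next-token probabilities from the fine-tuned model
    eps guards against log(0) for tokens the fine-tune assigns zero mass.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Illustrative distributions over a toy 4-token vocabulary.
base     = [0.70, 0.20, 0.05, 0.05]
finetune = [0.68, 0.21, 0.06, 0.05]

print(f"KLD = {kl_divergence(base, finetune):.4f}")
```

In practice this would be averaged over many tokens and prompts; a value on the order of 0.0015 means the fine-tune almost never reranks the base model’s token preferences.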

Bagua Insight

The release of the “Heretic” version highlights a shifting trend in the LocalLLaMA community: moving beyond simple “uncensoring” toward sophisticated “architectural preservation.” MTP is a cornerstone of the Qwen architecture’s efficiency, typically broken during standard fine-tuning. Preserving it while achieving such low KL Divergence suggests a masterclass in weight delta management. This release proves that high-performance inference and unrestricted, high-entropy output are no longer mutually exclusive in the 35B parameter class.

Actionable Advice

Deployment teams should prioritize the NVFP4 and GGUF formats to maximize throughput on consumer-grade hardware. For workflows requiring complex instruction following or creative generation where standard alignment typically triggers refusals, this 35B variant offers the best performance-to-size ratio currently available. Developers should benchmark the MTP-enabled inference speeds against standard fine-tunes to quantify the latency gains in production environments.
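A throughput comparison like the one suggested above can be as simple as timing a generation call with and without MTP enabled. A minimal sketch, assuming a hypothetical `generate(n)` wrapper around your inference backend (the dummy generator here stands in for a real llama.cpp or vLLM call):

```python
import time

def tokens_per_second(generate, n_tokens, runs=3):
    """Time a generation callable and return the best tokens/sec across runs.

    generate(n) must produce n tokens; best-of-N reduces noise from
    warm-up, caching, and OS scheduling.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        generate(n_tokens)
        best = min(best, time.perf_counter() - start)
    return n_tokens / best

# Dummy generator standing in for a real model call.
def fake_generate(n):
    return ["tok"] * n

print(f"{tokens_per_second(fake_generate, 1000):.0f} tok/s")
```

Run the same prompt, sampler settings, and quantization in both configurations; the ratio of the two rates is the MTP speedup to report.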

[ DATA_STREAM_END ]