Deep Note: agent/example/kernels/a2/flash_attn_full_pj_hif8.py

cannbot-skills: CANNBot is a family of agents aimed at improving CANN development efficiency; this repository provides reusable Skills modules for it. Project address: https://gitcode.com/cann/cannbot-skills

Open this file only after the short catalog entry has confirmed that the kernel is relevant.

What this kernel is really for

  • the scaled-hif8 probability variant of normalized a2 online softmax
  • a contract that intentionally changes the delayed value path while keeping row_sum in float
  • a kernel that exports final rowmax / rowsum as part of the visible contract
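To make the contract above concrete, here is a minimal single-row sketch in plain Python. It is not the kernel's code: `quantize_stub` is a hypothetical stand-in for the low-precision cast (it is not the real hif8 format, and the probability scaling step is omitted), and `online_softmax_row` is an illustrative name. It only shows the shape of the idea: the value path consumes the quantized probabilities, row_sum accumulates the float ones, and the final rowmax / rowsum are returned alongside the output.

```python
import math


def quantize_stub(x, mantissa_bits=3):
    """Hypothetical low-precision cast: keep a few mantissa bits.

    This is an assumption for illustration only, NOT the real hif8 encoding.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)


def online_softmax_row(score_tiles, value_tiles):
    """Online softmax for one query row, one scalar value per key.

    Returns (output, row_max, row_sum) so the caller can see the exported stats.
    """
    row_max = -math.inf
    row_sum = 0.0
    acc = 0.0
    for scores, values in zip(score_tiles, value_tiles):
        new_max = max(row_max, max(scores))
        rescale = math.exp(row_max - new_max)   # 0.0 on the first tile
        p = [math.exp(s - new_max) for s in scores]
        # row_sum is updated from the float probabilities, before any cast.
        row_sum = row_sum * rescale + sum(p)
        # Only the value path sees the quantized probabilities.
        p_q = [quantize_stub(x) for x in p]
        acc = acc * rescale + sum(pq * v for pq, v in zip(p_q, values))
        row_max = new_max
    return acc / row_sum, row_max, row_sum


if __name__ == "__main__":
    out, final_max, final_sum = online_softmax_row(
        [[1.0, 2.0], [3.0, 0.5]],
        [[1.0, 2.0], [3.0, 4.0]],
    )
    print(out, final_max, final_sum)
```

In this sketch the denominator stays exact up to float rounding, so any deviation from a float reference comes from the quantized numerators alone.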

Decisions worth copying

  • update row_sum from the float p_j tile before any half / hif8 cast
  • keep stage-1 score scratch and stage-2 pv scratch separate in the readable baseline
  • implement the non-negative hif8 simulation without relying on unsupported uint8 -> float shortcuts
  • copy [M,64] score slices into contiguous scratch before reinterpret(...) when the quantized helper needs contiguous lanes
  • handle non-aligned S2 in score space with suffix invalidation and a sufficiently negative finite sentinel
  • handle non-aligned S1 separately from S2; invalid rows should become zero contribution to delayed p @ v while GM still writes only valid rows
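The two tail-handling decisions can be sketched in plain Python as follows. Everything here is an assumption made for illustration: the sentinel value `NEG_SENTINEL`, the helper names, and the list-of-lists tile layout are not taken from the kernel. The point is only that padded S2 columns are invalidated in score space with a finite value, while padded S1 rows are zeroed separately so they contribute nothing to the later p @ v.

```python
import math

# Assumed sentinel: very negative but finite, so that (score - max) never
# becomes inf - inf, which would produce NaN.
NEG_SENTINEL = -1.0e30


def mask_s2_tail(score_tile, valid_cols):
    """Overwrite the column suffix beyond valid_cols with the sentinel."""
    return [
        row[:valid_cols] + [NEG_SENTINEL] * (len(row) - valid_cols)
        for row in score_tile
    ]


def softmax_numerators(score_tile, valid_rows):
    """Return exp(score - rowmax) per row; rows beyond valid_rows become zeros."""
    out = []
    for r, row in enumerate(score_tile):
        if r >= valid_rows:
            # Invalid S1 row: zero contribution to the delayed p @ v.
            out.append([0.0] * len(row))
            continue
        m = max(row)
        out.append([math.exp(s - m) for s in row])
    return out


if __name__ == "__main__":
    tile = [
        [0.5, 1.5, 9.0, 9.0],   # last two columns are S2 padding
        [2.0, 1.0, 9.0, 9.0],
        [7.0, 7.0, 7.0, 7.0],   # S1 padding row
    ]
    masked = mask_s2_tail(tile, valid_cols=2)
    p = softmax_numerators(masked, valid_rows=2)
    for row in p:
        print(row)
    # Only the first valid_rows rows would be written back to global memory.
    print(p[:2])
```

A finite sentinel is used in the sketch because an infinite one makes `-inf - (-inf)` evaluate to NaN whenever a whole row of a tile is padding.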

Prefer another kernel when

  • you still want the plain p.half().float() value path
  • you are debugging the normalized float/half baseline before introducing hif8 behavior
