Transformer Self-Attention Mechanism Diagram

Chinese full prompt

一张 NLP 会议论文风格的 Transformer 缩放点积自注意力机制图。左侧六个词元嵌入竖条（「The」/「cat」/「sat」/「on」/「the」/「mat」，深蓝到金色渐变色谱），各词元分叉三条线性投影箭头到 Q（钴蓝）、K（青绿）、V（珊瑚）矩阵网格。中央 6×6 注意力分数矩阵（白到深海军蓝渐变填充），标注 "Attention Scores QKᵀ / √dₖ"；Softmax 小柱状图标；输出为 V 向量加权和，右侧渐变色竖条表示输出词元表示。Q 到每个 K 的箭头线宽反映注意力权重。右上角 Encoder 块结构小插图（Add & Norm、Multi-Head Attention、Feed Forward 层叠）。白色背景，学术论文版式，9pt 标注。

English full prompt

A detailed academic diagram illustrating the scaled dot-product self-attention mechanism in a Transformer model, styled for an NLP conference proceedings paper. Left side: a sequence of 6 token embeddings shown as vertical colour-coded rectangles ("The", "cat", "sat", "on", "the", "mat", coloured along a spectrum from deep blue to gold), each 40 × 200 px. Three linear projection arrows branch from each token rectangle to the Q (query), K (key), and V (value) matrices, depicted as three stacked grids of cells in cobalt, teal, and coral respectively, all labelled. A dot-product attention score matrix in the centre (6 × 6 grid with cell fill intensity varying from white to deep navy) is labelled "Attention Scores QKᵀ / √dₖ". A softmax normalisation step is shown as a small bar-chart icon, after which the output is computed as a weighted sum of the V vectors. The final output token representations are shown as gradient-filled vertical bars at the right. Arrows connect every Q to every K, with line widths indicating attention weight. An inset at top-right shows the full Encoder block structure (Add & Norm, Multi-Head Attention, Feed Forward) as a compact stack diagram. White background, academic-paper layout, 9 pt labels.
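
For anyone checking a generated figure against the mechanism the prompt describes, the pipeline (Q/K/V projections, the 6 × 6 score matrix QKᵀ / √dₖ, row-wise softmax, weighted sum of V) can be sketched in plain Python. This is only a minimal illustration under assumptions: the sizes `d_model = 8` and `d_k = 4`, the random embeddings and projection weights, and the helper names `matmul`, `softmax`, `self_attention`, and `rand_matrix` are hypothetical and not part of the prompt.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Return (output, weights) for single-head scaled dot-product self-attention."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose K
    # Attention scores: Q K^T / sqrt(d_k), one row per query token.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row of weights sums to 1; these are what the line widths would encode.
    weights = [softmax(row) for row in scores]
    # Output: weighted sum of V vectors for every token.
    return matmul(weights, v), weights


tokens = ["The", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
d_model, d_k = 8, 4     # hypothetical sizes, chosen only for illustration


def rand_matrix(rows, cols):
    """Random placeholder values standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


x = rand_matrix(len(tokens), d_model)  # stand-in token embeddings
w_q, w_k, w_v = (rand_matrix(d_model, d_k) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
```

Here `weights` is the 6 × 6 matrix the centre grid depicts after softmax, and `output` holds one `d_k`-dimensional vector per token, corresponding to the bars at the right of the figure.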
