While writing code today I ran into an error.

The model is defined as follows:

from tensorflow import keras   # assuming the tf.keras API; a standalone keras install also works
from nezha import Bert

batch_size = 48
# max_seq_len and config are assumed to be defined elsewhere (pretrained-model settings)
bertmodel = Bert(maxlen=max_seq_len, batch_size=batch_size, **config)

input_ids1 = keras.layers.Input(shape=(None,), dtype='int32', name="token_ids1")
input_ids2 = keras.layers.Input(shape=(None,), dtype='int32', name="segment_ids1")

# first head: the same BERT is called twice so that, at training time,
# dropout produces two different stochastic forward passes (the R-Drop setup)
output1 = bertmodel([input_ids1, input_ids2])
output1 = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output1)  # take the [CLS] vector
output1 = keras.layers.Dropout(0.2)(output1)
output1 = keras.layers.Dense(units=768, activation="tanh")(output1)
output1 = keras.layers.Dropout(0.2)(output1)
output1 = keras.layers.Dense(units=3, activation="softmax")(output1)

# second head: identical structure, second stochastic pass
output2 = bertmodel([input_ids1, input_ids2])
output2 = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output2)
output2 = keras.layers.Dropout(0.2)(output2)
output2 = keras.layers.Dense(units=768, activation="tanh")(output2)
output2 = keras.layers.Dropout(0.2)(output2)
output2 = keras.layers.Dense(units=3, activation="softmax")(output2)

model = keras.Model(inputs=[input_ids1, input_ids2], outputs=[output1, output2])
model.summary()

The loss function is defined as follows:

from tensorflow.keras import backend as K   # assuming tf.keras, matching the imports above

def crossentropy_with_rdrop(y_true, y_pred, alpha=4):
    """Cross-entropy loss meant to be used together with R-Drop.
    """
    y_pred1 = y_pred[0]   # intended to grab the first output -- this indexing is the bug
    loss = K.sparse_categorical_crossentropy(y_true, y_pred1)
    return loss
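
For context, training was kicked off roughly like this (a minimal sketch of my own; the optimizer settings and the train_token_ids / train_segment_ids / labels arrays are assumptions, not the original script):

model.compile(optimizer=keras.optimizers.Adam(2e-5),
              loss=crossentropy_with_rdrop)
# one label array per output; the crash below happens during this fit
model.fit([train_token_ids, train_segment_ids], [labels, labels],
          batch_size=batch_size, epochs=1)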

Running this produced the following error:

logits and labels must have the same first dimension, got logits shape [1,3] and labels shape [48]
	 [[node crossentropy_with_rdrop_1/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at <ipython-input-14-0974e68970c0>:6) ]]
	 [[Adam/gradients/concat_2/_1278]]
  (1) Invalid argument:  logits and labels must have the same first dimension, got logits shape [1,3] and labels shape [48]

The cause of the error: for reasons that weren't obvious to me at first, even though the outputs are defined as

outputs = [output1,output2]

the y_pred handed to the loss function actually contains only output1, without output2. As a result, the call

y_pred1 = y_pred[0]

does not pick out the first output; it pulls a row out of a single tensor. Effectively y_pred here is (48, 3), and taking y_pred[0] leaves y_pred1 as (1, 3). The next call,

loss = K.sparse_categorical_crossentropy(y_true,y_pred1)

then raises the error, because y_true = (48, 1) cannot be matched against y_pred1 = (1, 3); that is exactly the mismatch K.sparse_categorical_crossentropy reports above.
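
The indexing behaviour is easy to confirm in isolation (a throwaway NumPy check of my own, not from the original run):

import numpy as np

y_pred = np.random.rand(48, 3)   # what the loss actually receives: ONE output
print(y_pred[0].shape)           # (3,): a single row of that output,
                                 # reshaped to [1, 3] inside the CE op's error
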
What's strange, though, is that when the model is called directly for prediction,

current_results = model([np.array(batch_token_ids),np.array(batch_segment_ids)])

the resulting current_results really is a list of the two corresponding arrays:

current_results =
[<tf.Tensor: shape=(48, 3), dtype=float32, numpy=
array([[0.22348379, 0.56881523, 0.20770097],
       [0.24022636, 0.53234166, 0.22743203],
       [0.25963855, 0.53202915, 0.20833229],
       [0.24430801, 0.5581201 , 0.1975719 ],
       [0.22885586, 0.5378479 , 0.23329626],
       [0.23131241, 0.54680216, 0.22188547],
       [0.24941131, 0.5467395 , 0.20384923],
       [0.2281745 , 0.5484533 , 0.22337219],
       [0.2479245 , 0.53815055, 0.2139249 ],
       [0.25504684, 0.5443265 , 0.20062669],
       [0.25637898, 0.52441853, 0.2192025 ],
       [0.24482411, 0.542099  , 0.21307687],
       [0.27194723, 0.53812975, 0.18992302],
       [0.23809686, 0.5681701 , 0.19373308],
       [0.24254994, 0.5070763 , 0.25037372],
       [0.25669464, 0.5467163 , 0.19658907],
       [0.25775704, 0.53852123, 0.20372164],
       [0.2384669 , 0.49613994, 0.2653931 ],
       [0.25791478, 0.5394374 , 0.20264779],
       [0.26459333, 0.53890723, 0.1964994 ],
       [0.24296562, 0.5577929 , 0.19924152],
       [0.25157478, 0.5200511 , 0.22837408],
       [0.2633939 , 0.5526742 , 0.18393186],
       [0.25570095, 0.5487229 , 0.19557613],
       [0.2330663 , 0.56324214, 0.20369154],
       [0.25796923, 0.5566103 , 0.18542047],
       [0.23153353, 0.56034464, 0.20812184],
       [0.24306743, 0.551874  , 0.20505862],
       [0.21740852, 0.55769265, 0.22489879],
       [0.26569477, 0.55175453, 0.18255074],
       [0.26401183, 0.5580658 , 0.17792237],
       [0.265182  , 0.5257538 , 0.20906426],
       [0.25043765, 0.5523196 , 0.19724278],
       [0.26271033, 0.5353176 , 0.20197207],
       [0.25589937, 0.53950065, 0.20460002],
       [0.25624436, 0.5359671 , 0.20778848],
       [0.25433272, 0.5308562 , 0.21481112],
       [0.22949585, 0.5428018 , 0.22770236],
       [0.26136097, 0.55955845, 0.17908058],
       [0.24228041, 0.5394519 , 0.21826772],
       [0.24138369, 0.5557943 , 0.20282204],
       [0.25254685, 0.5515779 , 0.19587524],
       [0.27394438, 0.522566  , 0.20348959],
       [0.26305544, 0.54159886, 0.19534567],
       [0.27065644, 0.53004247, 0.19930108],
       [0.24923316, 0.55526716, 0.1954997 ],
       [0.25400022, 0.5450946 , 0.20090513],
       [0.24974696, 0.53991544, 0.21033762]], dtype=float32)>, 
<tf.Tensor: shape=(48, 3), dtype=float32, numpy=
array([[0.29300308, 0.5401257 , 0.16687119],
       [0.3298691 , 0.5311124 , 0.1390186 ],
       [0.3873792 , 0.48128107, 0.13133973],
       [0.34917748, 0.48370904, 0.1671135 ],
       [0.27951905, 0.5489006 , 0.17158031],
       [0.3439213 , 0.50909376, 0.14698489],
       [0.38596642, 0.48221216, 0.13182144],
       [0.29546183, 0.5401946 , 0.16434367],
       [0.34651926, 0.5181145 , 0.1353662 ],
       [0.43990678, 0.41554394, 0.14454924],
       [0.37591207, 0.48171023, 0.14237764],
       [0.3223162 , 0.5318312 , 0.14585261],
       [0.45775193, 0.40278044, 0.13946763],
       [0.32722893, 0.5107374 , 0.16203372],
       [0.29620585, 0.52684844, 0.17694569],
       [0.43269792, 0.42045838, 0.14684375],
       [0.385755  , 0.49197266, 0.12227238],
       [0.26434082, 0.52908283, 0.20657633],
       [0.39530444, 0.46878868, 0.13590695],
       [0.44480005, 0.4267176 , 0.12848231],
       [0.360057  , 0.48701972, 0.15292326],
       [0.3362542 , 0.5224165 , 0.14132933],
       [0.4037658 , 0.45516342, 0.14107078],
       [0.40773708, 0.45386332, 0.13839953],
       [0.33756983, 0.5143009 , 0.14812933],
       [0.4297622 , 0.43651512, 0.13372268],
       [0.3273759 , 0.51048106, 0.16214305],
       [0.32283273, 0.51227605, 0.16489123],
       [0.25585884, 0.55587214, 0.18826896],
       [0.39580822, 0.4599473 , 0.14424445],
       [0.3881661 , 0.46235946, 0.14947443],
       [0.4435509 , 0.4265573 , 0.12989175],
       [0.36100864, 0.49833503, 0.1406563 ],
       [0.40100974, 0.438785  , 0.16020529],
       [0.42775044, 0.44003108, 0.13221851],
       [0.37748566, 0.47884828, 0.14366603],
       [0.3617967 , 0.5031131 , 0.1350902 ],
       [0.30262244, 0.54520833, 0.15216923],
       [0.4076588 , 0.4317662 , 0.16057503],
       [0.33263358, 0.50482607, 0.16254033],
       [0.34378305, 0.5014613 , 0.15475567],
       [0.37794736, 0.48685223, 0.13520035],
       [0.46228144, 0.41022223, 0.12749635],
       [0.4010705 , 0.4672875 , 0.13164194],
       [0.44229177, 0.4237102 , 0.13399802],
       [0.3427251 , 0.49487054, 0.16240436],
       [0.34575745, 0.5135948 , 0.14064772],
       [0.33513075, 0.52731526, 0.137554  ]], dtype=float32)>]
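
So the direct call does return both heads; the list can be unpacked as usual (my own line, just to make the structure explicit):

probs1, probs2 = current_results   # two tensors, each of shape (48, 3)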

The reason lies in how Keras compiles multi-output models. The Keras functional API guide describes it with an example: the model is compiled with a weight of 0.2 assigned to the auxiliary loss. A list or a dict can be used to specify different losses or loss weights per output; if a single loss is passed, it is applied to all of the outputs:

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
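
The same guide also shows the dictionary form, keyed by output-layer name ('main_output' and 'aux_output' are the guide's example names, not layers from the model above):

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy',
                    'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})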

In other words, when there are multiple outputs but only one loss function, Keras applies that loss to output1 and output2 separately, one call per output, and each call sees only that single output as y_pred. This per-output dispatch is easy to observe with a throwaway debugging loss, as sketched below.
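
A minimal sketch (debug_loss and the print are mine, purely illustrative):

def debug_loss(y_true, y_pred):
    # fires once per output when the loss graph is traced: first with the
    # tensor for output1, then with the tensor for output2, each (batch, 3)
    print("loss invoked, y_pred:", y_pred)
    return K.sparse_categorical_crossentropy(y_true, y_pred)

model.compile(optimizer='adam', loss=debug_loss)
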
But what if, as with R-Drop, our loss function needs output1 and output2 to interact with each other?
Idea 1: concatenate, then slice.
At the final layer, merge the branches into one tensor; the line below is the concatenate example from the Keras functional guide (lstm_out and auxiliary_input are the guide's variable names):

x = keras.layers.concatenate([lstm_out, auxiliary_input])
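
Adapted to the model above, that would look something like the following (my sketch; the merged model replaces the two-output one):

# concatenate the two (batch, 3) heads into one (batch, 6) tensor,
# so that a single loss call can see both predictions at once
merged = keras.layers.concatenate([output1, output2], axis=-1)
model = keras.Model(inputs=[input_ids1, input_ids2], outputs=merged)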

Once that is done, compute the loss inside a single function by slicing the concatenated prediction back apart:

def crossentropy_with_rdrop(y_true, y_pred, alpha=4):
    """Cross-entropy loss meant to be used together with R-Drop.
    y_pred is the concatenation of the two heads along the last axis.
    """
    y_pred1 = y_pred[:, 0:3]   # first head
    y_pred2 = y_pred[:, 3:6]   # second head (columns 3..5, i.e. 3:6, not 4:6)
    # (loss computation elided in the original; one possible completion follows)
    ........................
    return loss
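
One way the elided body could be filled in, following the usual R-Drop recipe of cross-entropy on both passes plus a symmetric KL consistency term (this completion is my sketch, not the original code; the alpha/4 scaling mirrors common R-Drop implementations):

from tensorflow.keras.losses import kullback_leibler_divergence as kld

def crossentropy_with_rdrop(y_true, y_pred, alpha=4):
    """Cross-entropy plus symmetric KL between the two dropout passes."""
    y_pred1 = y_pred[:, 0:3]   # first head's probabilities
    y_pred2 = y_pred[:, 3:6]   # second head's probabilities
    # supervised term: cross-entropy against the labels for both passes
    loss1 = K.sparse_categorical_crossentropy(y_true, y_pred1)
    loss2 = K.sparse_categorical_crossentropy(y_true, y_pred2)
    # consistency term: symmetric KL divergence between the two passes
    kl = kld(y_pred1, y_pred2) + kld(y_pred2, y_pred1)
    return loss1 + loss2 + alpha * kl / 4

With the single concatenated output, Keras now calls this loss exactly once per batch, and both heads are visible inside y_pred.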