DMNet for Semi-supervised Segmentation 535

it can learn better features from the difference between the soft masks generated by the two decoders, which can lead to better performance. This will be verified by our experimental results in Sect. 4. The two decoders in DMNet use different architectures to introduce diversity. By adopting different architectures, the two decoders will typically not output exactly the same segmentation masks, and hence they can learn from each other. By using labeled and unlabeled data in turn, DMNet can utilize unlabeled data adequately to improve segmentation performance. DMNet is a general framework, and any segmentation network with an encoder-decoder architecture, such as UNet [16], VNet [13], SegNet [1] and DeepLab v3+ [7], can be used in DMNet. In this paper, we adopt UNet [16] and DeepLab v3+ [7] for illustration. The shared encoder extracts a latent representation with high-level semantic information from the input image. For labeled data, we then use the ground truth to supervise the learning of the segmentation network; for unlabeled data, we minimize the difference between the masks generated by the two decoders so that they learn from each other.

We use the Dice loss [13] to train our segmentation network on labeled data, which is defined as follows:

\[
L_{dice}\bigl(\hat{Y}^{(1)}, \hat{Y}^{(2)}, Y; \theta_s\bigr) = \sum_{i=1}^{2}\left(1 - \frac{1}{K}\sum_{k=1}^{K}\frac{2\sum_{h=1}^{H}\sum_{w=1}^{W} Y_{h,w,k}\,\hat{Y}^{(i)}_{h,w,k}}{\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(Y_{h,w,k} + \hat{Y}^{(i)}_{h,w,k}\bigr)}\right),
\]

where \(Y_{h,w,k} = 1\) when the pixel at position \((h, w)\) belongs to class \(k\), and all other entries of \(Y\) are set to 0. \(\hat{Y}^{(i)}_{h,w,k}\) is the probability that the pixel at position \((h, w)\) belongs to class \(k\), as predicted by segmentation branch \(i\). \(\theta_s\) denotes the parameters of the segmentation network. The loss function used for unlabeled data is described in Sect. 3.3.
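The Dice loss above can be sketched as a small NumPy function. This is a minimal illustration, not the paper's implementation: the one-hot ground truth of shape (H, W, K) and the `eps` smoothing term added to the denominator are our assumptions.

```python
import numpy as np

def dice_loss(y_true, y_preds, eps=1e-8):
    """Soft Dice loss summed over the two decoder branches.

    y_true:  one-hot ground truth Y, shape (H, W, K)           [assumed layout]
    y_preds: list of soft masks [Y_hat_1, Y_hat_2], each (H, W, K)
    eps:     small smoothing constant (illustrative, not in the paper's formula)
    """
    total = 0.0
    for y_pred in y_preds:                               # sum over branches i = 1, 2
        inter = (y_true * y_pred).sum(axis=(0, 1))       # per-class overlap, length K
        denom = (y_true + y_pred).sum(axis=(0, 1))       # per-class denominator
        dice_per_class = 2.0 * inter / (denom + eps)
        total += 1.0 - dice_per_class.mean()             # average over the K classes
    return total
```

A perfect prediction drives each branch's term to zero, so the total loss approaches 0; a maximally uncertain (uniform) prediction leaves a positive loss for each branch.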
3.2 Sharpen Operation

Given an unlabeled image U, our segmentation network generates two soft masks \(\hat{Y}^{(1)}\) and \(\hat{Y}^{(2)}\). To make the predictions of the segmentation network have low entropy (i.e., high confidence), we adopt the sharpen operation [3] to reduce the entropy of the predictions on unlabeled data, defined as follows:

\[
\mathrm{Sharpen}\bigl(\hat{Y}^{(i)}_{h,w,c}, T\bigr) = \frac{\bigl(\hat{Y}^{(i)}_{h,w,c}\bigr)^{1/T}}{\sum_{k=1}^{K}\bigl(\hat{Y}^{(i)}_{h,w,k}\bigr)^{1/T}}, \quad \forall h \in [1:H],\; w \in [1:W],\; T \in (0, 1),
\]

where \(\hat{Y}^{(i)}\) is the soft mask predicted by decoder branch \(i\) and the temperature \(T\) is a hyperparameter.
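The sharpen operation can be sketched in NumPy as follows; the (H, W, K) mask layout is an assumed convention for illustration.

```python
import numpy as np

def sharpen(y_soft, T=0.5):
    """Temperature sharpening of a soft mask (Sect. 3.2).

    y_soft: soft mask of shape (H, W, K), class probabilities summing to 1
    T:      temperature in (0, 1); lower T gives a lower-entropy output
    """
    p = np.power(y_soft, 1.0 / T)                # raise each class probability to 1/T
    return p / p.sum(axis=-1, keepdims=True)     # renormalize over the K classes
```

For example, with T = 0.5 a per-pixel distribution (0.6, 0.4) becomes (0.36, 0.16) before renormalization, i.e. (9/13, 4/13) ≈ (0.69, 0.31): the dominant class is amplified while the output remains a valid distribution.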
3.3 Difference Minimization for Semi-supervised Segmentation

As described in Sect. 3.1, the two decoders generate two masks on unlabeled data. If the two masks differ from each other, the model is unsure about its predictions and thus cannot generalize well. Therefore,