It remained unclear whether the human-like expression perception depended on the domain-specific experience (e.g., face-related visual experience), a general natural object recognition experience, or even only the architecture of the DCNN. To address this question, we introduced two additional DCNNs: VGG-16 and untrained VGG-Face. The architecture of the VGG-16 is almost identical to the pretrained VGG-Face except that the last FC layer includes 1000 units rather than 2622 units. The VGG-16 was trained to classify 1000 object categories using natural object images from ImageNet (41); thus, it only had object-related visual experience. The untrained VGG-Face preserved the identical architecture of the VGG-Face while randomly assigning the connective weights (Xavier normal initialization) (18, 42), and had no training experience.

Images from stimulus set 1 were also presented to the pretrained VGG-16 and untrained VGG-Face, respectively, and the responses of the units in the conv5-3 layer were extracted. Then, the same two-way nonrepeated ANOVA was performed to detect the expression-selective units in these two DCNNs: 835 (0.83%) and 644 (0.64%) of the 100,352 total units were found to be expression-selective in the pretrained VGG-16 and untrained VGG-Face, respectively.
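As a rough illustration of this setup, the sketch below builds a pretrained object-recognition network and an untrained, Xavier-initialized control, reads out the 100,352 conv5-3 units, and screens each unit with a two-way ANOVA (expression × identity, no repetition). This is a minimal sketch under stated assumptions, not the authors' pipeline: torchvision's VGG-16 stands in for both architectures (the untrained VGG-Face differs only in its 2622-unit final FC layer, which does not affect conv5-3), the `images`, `expr_labels`, and `id_labels` variables are assumed to hold stimulus set 1, and the selectivity criterion is simplified to a significant main effect of expression.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import torch
import torch.nn as nn
from statsmodels.formula.api import ols
from torchvision.models import vgg16

# Pretrained VGG-16: object-related visual experience only (ImageNet).
pretrained = vgg16(weights="IMAGENET1K_V1").eval()

# Untrained control: identical convolutional architecture, random
# Xavier-normal weights, no visual experience at all.
untrained = vgg16(weights=None)
for m in untrained.modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)
untrained.eval()

def conv5_3_responses(net, images):
    """Return conv5-3 activations, flattened to (n_images, 100352)."""
    acts = {}
    # features[28] is the third conv layer of block 5 in torchvision's VGG-16;
    # its 512 x 14 x 14 output gives the 100,352 units analyzed here.
    handle = net.features[28].register_forward_hook(
        lambda mod, inp, out: acts.update(r=out.detach().flatten(1)))
    with torch.no_grad():
        net(images)  # images: (n, 3, 224, 224) tensor, assumed preprocessed
    handle.remove()
    return acts["r"].numpy()

def expression_selective(responses, expr_labels, id_labels, alpha=0.05):
    """Flag units with a significant main effect of expression in a
    two-way ANOVA without replication: r ~ C(expression) + C(identity)."""
    selective = []
    for u in range(responses.shape[1]):
        df = pd.DataFrame({"r": responses[:, u],
                           "expr": expr_labels, "ident": id_labels})
        fit = ols("r ~ C(expr) + C(ident)", data=df).fit()
        p = sm.stats.anova_lm(fit, typ=2).loc["C(expr)", "PR(>F)"]
        selective.append(p < alpha)
    return np.asarray(selective)
```

Because each expression × identity cell contains a single image (a nonrepeated design), the model includes only the two main effects, with the interaction absorbed into the error term.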
It seemed that expression-selective units also spontaneously emerged in the pretrained VGG-16 with the experience of natural visual objects and even in the untrained VGG-Face without any visual experience. Then, for each of the two DCNNs, images from stimulus set 3 were applied to test the reliability and generality of the expression recognition ability of the expression-selective units. The classification accuracies in these two DCNNs were also higher than the chance level (pretrained VGG-16: accuracy = 23.33%; 95% CI, 22.13 to 24.54%; untrained VGG-Face: accuracy = 21.60%; 95% CI, 20.44 to 22.79%; bootstrap with 10,000 replications) (Fig. 3A). Crucially, the classification accuracy of the expression-selective units in the pretrained VGG-Face was significantly higher than that in the pretrained VGG-16 (P < 0.001, Mann-Whitney U test) and that in the untrained VGG-Face (P < 0.001), and the accuracy in the pretrained VGG-16 was in turn higher than that in the untrained VGG-Face (P < 0.001). These results were relatively stable when changing the number of PCs in the SVC model (fig. S2B). The results revealed that expression-selective units in the DCNNs, whether with face identity recognition experience or not, could classify facial expressions; face identity recognition experience, however, was more beneficial than general object classification experience for enhancing the units' expression recognition ability.

Furthermore, for both DCNNs, the similarity of the expression confusion effect between the expression-selective units and humans was tested by correlating their error patterns with those of human participants. The error patterns of the expression-selective units in neither of the two DCNNs resembled those of humans (Fig. 3D, VGG-16: Kendall's τ = −2.3 × 10^−3, P = 0.986; Fig. 3E, untrained VGG-Face: Kendall's τ = 0.02, P = 0.872). Collectively, only the expression-selective units in the pretrained VGG-Face presented a human-like expression confusion effect. These results implied that, at least for facial expression recognition, domain-specific training experience was necessary for a DCNN to develop human-like perception.

As the expression-selective units in the pretrained VGG-16 and untrained VGG-Face showed no similarity to human expression recognition, we hypothesized that they may not perceive expressions like humans. To verify this hypothesis, we further investigated whether the expression-selective units in these two DCNNs would show a categorical perception effect by performing the same ABX discrimination task as that used in the pretrained VGG-Face. As shown in Fig. 3G, the expression-selective units from both the pretrained VGG-16 and the untrained VGG-Face presented only a weak S-shaped trend in very few continua. Comparing their goodness of fit with that of the pretrained VGG-Face showed that the identification curves of the expression-selective units in the pretrained VGG-16 and untrained VGG-Face followed a more obviously linear trend than those in the pretrained VGG-Face (linear: pretrained VGG-Face versus pretrained VGG-16: P = 0.003; pretrained VGG-Face versus untrained VGG-Face: P = 0.005; pretrained VGG-16 versus untrained VGG-Face: P = 0.609; Mann-Whitney U test) (Fig. 3F).
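The linear-versus-logistic comparison can be made concrete with a short curve-fitting sketch. This is a hypothetical Python example rather than the authors' code: the morph positions and identification proportions below are invented for illustration, and the paper aggregates such fits across many continua and compares models with Mann-Whitney U tests rather than judging a single continuum.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Sigmoid identification curve; x0 = category boundary, k = slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

def linear(x, a, b):
    return a * x + b

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Positions along one expression continuum (e.g., happy -> sad) and the
# proportion of ABX trials classified as the first expression at each step.
# These values are invented purely for illustration.
morph_steps = np.linspace(0.0, 1.0, 7)
p_first = np.array([0.97, 0.95, 0.90, 0.55, 0.12, 0.06, 0.03])

p_log, _ = curve_fit(logistic, morph_steps, p_first,
                     p0=[0.5, -10.0], maxfev=10000)
p_lin, _ = curve_fit(linear, morph_steps, p_first)

r2_log = r_squared(p_first, logistic(morph_steps, *p_log))
r2_lin = r_squared(p_first, linear(morph_steps, *p_lin))
print(f"logistic R^2 = {r2_log:.3f}, linear R^2 = {r2_lin:.3f}")
```

A clearly better logistic fit (a steep S-shaped curve with a sharp boundary) is the signature of categorical perception; when the linear fit does about as well, the units are tracking the physical morph level instead.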
Correspondingly, they presented a significantly weaker logistic trend than the pretrained VGG-Face (logistic: pretrained VGG-Face versus pretrained VGG-16: P = 0.002; pretrained VGG-Face versus untrained VGG-Face: P = 0.002; pretrained VGG-16 versus untrained VGG-Face: P = 0.307) (Fig. 3F). Together, face identity recognition experience, which was domain specific, helped the expression-selective units in the DCNN achieve a human-like categorical perception of facial expressions, whereas general object classification experience and the architecture itself may only help capture physical features of facial expressions.

In addition, we generated three new stimulus sets, including the scrambled (fig. S3B), contrast-negated (fig. S3C), and inverted (fig. S3D) versions of the face images in stimulus set 3 (fig. S3A), and conducted further control analyses to explore the possible contribution of low-level features (e.g., texture, brightness, edge, and gradient) to the expression recognition of the expression-selective units. For example, the inverted face retains all low-level features of the upright face. We tested whether the expression-selective units of the three DCNNs could reliably classify the expressions of the three new stimulus sets. As shown in table S1 and fig. S3 (E to G), for all three stimulus sets, the classification accuracies of the expression-selective units in all three DCNNs decreased significantly, to near chance level (all accuracies < 20.20%; chance level: 16.67%). Therefore, it is unlikely that low-level features in the face images were simply the determining factors for the emergence of the expression-selective units.

DISCUSSION

The purpose of the current study was to evaluate whether the spontaneously emerged human-like expression-selective units in DCNNs would depend on domain-specific visual experience. We found that the pretrained VGG-Face, a DCNN with visual experience of face identity, could spontaneously generate expression-selective units. In addition, these units allowed reliable human-like expression perception, including the expression confusion effect and the categorical perception effect. By further comparing the pretrained VGG-Face with VGG-16 and the untrained VGG-Face, we found that, although all three DCNNs could generate expression-selective units, their expression classification performance differed. The classification accuracy of the expression-selective units in the pretrained VGG-Face was the highest, whereas that in the untrained VGG-Face was the lowest. More critically, only the expression-selective units in the pretrained VGG-Face showed an apparent human-like expression confusion effect and categorical perception effect. Expression-selective units in both the VGG-16 and the untrained VGG-Face did not perform human-like expression perception.