expressions). All images were randomly divided into eight groups of 600 images, of which each expression included 100 images. Each participant completed the expression discrimination task for one to eight groups of images. Participants were instructed to classify each image into one of the six facial expressions. Last, each image was classified by eight participants.

Analysis. There were 38,400 trials in total (4800 different images, each repeated eight times). As each participant only completed the expression classification of several groups of images, we pooled the data of all participants to calculate the confusion matrix. The confusion matrix was a 6-by-6 matrix with the rows representing the true expressions (ground truth) and the columns representing the expressions discriminated by participants. The element (i, j) of the confusion matrix indicated the ratio of how many times expression i was recognized as expression j across participants.

Analysis of network units
Each DCNN was presented with stimulus set 1, and the responses of the units in the final layer of the feature extraction network (conv5-3) were extracted to be analyzed.
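The layer-response extraction can be illustrated in a few lines of PyTorch. This is a minimal sketch, not the authors' code: it assumes torchvision's VGG-16 architecture as a stand-in for VGG-Face (which shares the same convolutional backbone), and a random tensor stands in for a preprocessed face image from stimulus set 1.

```python
# Minimal sketch (not the authors' code) of pulling unit responses from the
# last convolutional layer (conv5-3) of a VGG-16-style network with PyTorch.
import torch
from torchvision import models

model = models.vgg16(weights=None).eval()  # the study used pretrained VGG-Face weights (same architecture)

activations = {}
def save_conv5_3(module, inputs, output):
    # Flatten the conv5-3 feature map so that each element is one "unit" response.
    activations["conv5_3"] = output.detach().flatten(start_dim=1)

# In torchvision's VGG-16, features[28] is the conv5-3 layer.
model.features[28].register_forward_hook(save_conv5_3)

image = torch.rand(1, 3, 224, 224)  # stand-in for one preprocessed image from stimulus set 1
with torch.no_grad():
    model(image)

unit_responses = activations["conv5_3"]  # shape: (1, 512 * 14 * 14), i.e., 100,352 units
```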
Similar to Nasr et al. (27), a two-way nonrepeated ANOVA with expression (six facial expressions) and identity (104 identities) as factors was conducted to identify the expression-selective units. The "expression-selective units" were defined as those that exhibited a significant main effect of expression (P ≤ 0.01) but no significant effect of identity (P > 0.01). For each expression-selective unit, the responses were normalized across all images in stimulus set 1. After that, its tuning value for each expression was calculated by taking the difference between the average response to all images within the same expression and the average response to all images in the stimulus set and then dividing the difference by the SD of the responses across all images in the stimulus set (32):

$$TV_i^k = \frac{\frac{1}{P_k}\sum_{p \in k} A_i^p - \frac{1}{P}\sum_{p=1}^{P} A_i^p}{\sqrt{\frac{1}{P}\sum_{p=1}^{P}\left(A_i^p - \frac{1}{P}\sum_{p=1}^{P} A_i^p\right)^2}}$$

where $TV_i^k$ is the tuning value of unit i to expression k, $A_i^p$ is the normalized response of unit i to image p, $P_k$ is the number of images that are labeled as expression k, and $P$ is the number of all images in the database. The tuning value reflects the extent to which a unit activates preferentially to images of a specific expression. For each unit, the expression with the highest tuning value is defined as its preferred expression.

To test the reliability of the expression recognition ability of expression-selective units in the pretrained VGG-Face, the SVC model trained on stimulus set 1 was used to predict the expressions of images from stimulus set 2. To further test the generality of the expression recognition ability of expression-selective units and the necessity of domain-specific experience, for each DCNN, the SVC model trained on stimulus set 1 was used to predict the expressions of images from stimulus set 3. In the SVC model, the first 600 PCs of the responses of expression-selective units were used. The reasons for choosing the first 600 PCs were as follows: (i) the number of PCs should be less than the number of images to avoid overfitting the training data, and (ii) to unify the DCNNs' component number, the number of PCs should also be no larger than the smallest number of expression-selective units among the DCNNs (i.e., 644 units in the untrained VGG-Face). Meanwhile, the first 600 PCs could explain nearly 100% of the variance of the expression-selective features (fig. S1). The prediction accuracy of the SVC model indicated the extent to which expression-selective units could correctly classify facial expressions. Further, the predicted expressions and true expressions of the images were used to construct the confusion matrix. We then quantified the similarity of the error patterns between the expression-selective units and humans by calculating the pairwise Kendall rank correlation of the error rates (i.e., the vectorized off-diagonal misclassification rates of the confusion matrices).
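For concreteness, a minimal NumPy sketch of the tuning-value computation above, using toy data in place of the conv5-3 responses; array names and sizes are illustrative and not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins: 600 images x 1000 units, 100 images per expression.
responses = rng.standard_normal((600, 1000))
labels = np.repeat(np.array(["anger", "disgust", "fear", "happiness", "sadness", "surprise"]), 100)

def tuning_values(responses, labels):
    """TV_i^k: (mean response to expression k - grand mean) / SD over all images."""
    expressions = np.unique(labels)
    grand_mean = responses.mean(axis=0)   # (1/P) * sum_p A_i^p, one value per unit
    grand_sd = responses.std(axis=0)      # SD of responses across all P images
    tv = np.stack([(responses[labels == k].mean(axis=0) - grand_mean) / grand_sd
                   for k in expressions])
    return expressions, tv                # tv has shape (n_expressions, n_units)

expressions, tv = tuning_values(responses, labels)
preferred = expressions[tv.argmax(axis=0)]  # preferred expression of each unit
```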
Morphed expression discrimination task
To test whether expression-selective units exhibit a human-like categorical perception of morphed facial expressions, we designed a morphed expression discrimination task that was comparable to the ABX discrimination task designed for human beings (36, 39, 40). Taking the happiness-anger continuum as an example, the expression-selective units whose preferred expression was happiness or anger were selected to perform the task. At first, a binary SVC model was trained on the prototypic expressions (happy and angry expressions of all 104 identities in stimulus set 1), and then the trained SVC model was used to predict the expressions of the morphed expression images (the middle 199 morph levels, excluding the two prototypic expressions). For each morph level, the identification frequency of anger was defined as the network's identification rate at the current expression morph level.

To quantitatively characterize the shape of the identification curve, we fitted a linear function, a quadratic function (poly2), and a logistic function to the curve. If the network perceived the morphed expressions like a human, the identification curve should be nonlinear and should show an abrupt category boundary. Thus, the goodness of fit (R²) of the logistic (S-shaped) function to the identification curves should be the best.

Comparisons between different DCNNs
To test the dependence of the human-like expression perception of expression-selective units on face identity recognition experience, we also introduced VGG-16 and an untrained VGG-Face as controls. The VGG-16 is trained for natural object classification, and the untrained VGG-Face has no training experience. First, the expression classification performances of expression-selective units in the different DCNNs were compared to explore whether these units in the pretrained VGG-Face recognized expressions better than those in the VGG-16 and the untrained VGG-Face. Then, we assessed the differences in categorical perception of morphed expressions among the DCNNs by comparing the goodness of fit (R²) of the logistic function and that of the linear function, respectively. Last, the Mann-Whitney U test was used to statistically evaluate the differences among the DCNNs.
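The curve-fitting comparison can be sketched as follows. This is an illustrative example, not the authors' code: the identification curve is a toy S-shaped stand-in, the logistic parameterization is one plausible choice, and the R² values passed to the Mann-Whitney U test are hypothetical.

```python
# Sketch: fit linear and logistic functions to an identification curve and compare R^2.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import mannwhitneyu

morph = np.linspace(0, 1, 201)                      # 2 prototypes + 199 morph levels
curve = 1.0 / (1.0 + np.exp(-20 * (morph - 0.5)))   # toy anger-identification curve
curve += np.random.default_rng(1).normal(0, 0.02, curve.size)

def logistic(x, a, b, x0, k):
    return a + b / (1.0 + np.exp(-k * (x - x0)))

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Linear fit
lin_coef = np.polyfit(morph, curve, 1)
r2_linear = r_squared(curve, np.polyval(lin_coef, morph))

# Logistic fit
popt, _ = curve_fit(logistic, morph, curve, p0=[0.0, 1.0, 0.5, 10.0], maxfev=10000)
r2_logistic = r_squared(curve, logistic(morph, *popt))

print(f"R^2 linear = {r2_linear:.3f}, R^2 logistic = {r2_logistic:.3f}")

# Hypothetical R^2 distributions from two networks, compared with a Mann-Whitney U test.
r2_net_a = np.array([0.98, 0.97, 0.99, 0.96, 0.98])
r2_net_b = np.array([0.85, 0.88, 0.83, 0.90, 0.86])
stat, p = mannwhitneyu(r2_net_a, r2_net_b)
```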
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at https://science.org/doi/10.1126/sciadv.abj4383
View/request a protocol for this paper from Bio-protocol.

REFERENCES AND NOTES
1. V. Bruce, A. Young, Understanding face recognition. Br. J. Psychol. 77, 305–327 (1986).
2. V. Bruce, Influences of familiarity on the processing of faces. Perception 15, 387–397 (1986).
3. A. J. Calder, J. Keane, A. W. Young, M. Dean, Configural information in facial expression perception. J. Exp. Psychol. Hum. Percept. Perform. 26, 527–551 (2000).