G. Chen et al. / Computer Networks 190 (2021) 107952

forest is based on the global distribution of data instances while LOF is based on the local density of data instances. When the proportion of malicious recommenders increases, a bad recommendation will also be in a dense area with other bad recommendations surrounding it. Hence, it is difficult to judge whether the bad recommendation is an outlier or not based on its height in the isolation tree. Similarly, the LOF score of a bad recommendation will be close to 1 because its local density is almost the same as that of its neighbors. Obviously, neither isolation forest nor LOF can effectively detect bad recommendations when the proportion of malicious recommenders increases.

DBSCAN and k-means are both clustering-based outlier detection methods.
The former autonomously determines the number of clusters based on the density of data instances. The latter determines the number of clusters according to the user-specified parameter k and divides data instances into clusters based on the distance between them and the centroids. What the two have in common is that both need to determine which clusters of data instances are outliers according to user-specified rules. In the recommendation trust evaluation, we can take the direct trust of the trustor about the recommenders, together with the recommendations provided by the recommenders, as data instances. The reason is that the average value of the direct trust of the trustor about good recommenders is larger. The recommendations in the cluster whose centroid's first value is the largest can be regarded as good recommendations, and the others are deemed bad. The trustor can then filter out bad recommendations from all the recommendations it received based on the clustering results. Whether DBSCAN or k-means is used, good recommendations and bad recommendations will be divided into different clusters even if the proportion of malicious recommenders increases. Therefore, both are effective for detecting bad recommendations according to the rules we set. The space complexity of DBSCAN is O(m), where m is the number of data instances. The space complexity of k-means is O(m + k), where k is the number of clusters specified by the user. In the recommendation trust evaluation, we set k to 2. As a result, the space complexity of k-means is approximately equal to that of DBSCAN. The time complexity of DBSCAN is O(m log m). The time complexity of k-means is O(Ikm), where I is the number of iterations specified by the user. Because I and k are much smaller than m, the time complexity of k-means can be regarded as O(m). We can conclude that k-means is more efficient than DBSCAN in terms of time complexity.
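The labeling rule just described — treat the cluster whose centroid has the larger first value (the mean direct trust of the trustor about its recommenders) as the good one — can be sketched independently of any particular clustering algorithm. The following minimal Python illustration uses hypothetical names of our own (`pick_good_cluster`, `direct_trust`); it is not code from the paper:

```python
def pick_good_cluster(c1, c2):
    """Given two clusters of (direct_trust, recommendation) pairs,
    return the cluster whose centroid's first value (the mean direct
    trust of the trustor about its recommenders) is larger."""
    def mean_direct_trust(cluster):
        # First component of the centroid: average of the direct-trust values.
        return sum(v[0] for v in cluster) / len(cluster)

    return c1 if mean_direct_trust(c1) >= mean_direct_trust(c2) else c2


# High direct trust wins even though the other cluster reports inflated
# (ballot-stuffing-like) recommendations in its second component.
good = pick_good_cluster([(0.9, 0.8), (0.8, 0.7)], [(0.2, 0.9), (0.1, 0.95)])
# good == [(0.9, 0.8), (0.8, 0.7)]
```

The rule deliberately keys on the first component only: recommendation values themselves can be forged by attackers, whereas the trustor's own direct trust about each recommender cannot.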
Based on the above comparative analysis, we conclude, in terms of both effectiveness and efficiency, that k-means is a better choice for detecting bad recommendations than the other outlier detection methods. Consequently, we propose a recommendation filtering algorithm based on k-means.

5.2.2. Recommendation filtering algorithm based on k-means
We take the direct trust of the trustor about the recommenders and the recommendations provided by the recommenders as vectors, which are taken as inputs by the recommendation filtering algorithm. Through the filtering algorithm, the vectors are divided into two clusters. The cluster whose centroid's first value is larger is selected, and the corresponding recommenders are retained after filtering.

Algorithm 1 illustrates the recommendation filtering algorithm in detail. The inputs are a list of recommenders R = {r_1, r_2, ..., r_m} and the maximum number of iterations Iterations_max. The outputs are a list of recommenders R' = {r'_1, r'_2, ..., r'_n} that remain after filtering. For each recommender r_k, the algorithm constructs a vector of two values: the direct trust of the trustor about the recommender and the recommendation of the recommender about the trustee. At the beginning of filtering, the algorithm randomly selects two vectors as the initial centroids of the clusters. The core part of the filtering algorithm is based on the k-means clustering algorithm (Lines 7–29). It separately calculates the Euclidean distance between each vector and each of the two cluster centroids. Then, each vector is added to the cluster closer to it (Lines 10–18).
The centroid of each cluster is recalculated after all vectors have been added to the corresponding clusters, and this repeats until the centroids no longer change.

Algorithm 1 Recommendation Filtering Algorithm
Input: R = {r_1, r_2, ..., r_m}, Iterations_max
Output: R' = {r'_1, r'_2, ..., r'_n}
1: for each r_k ∈ R do
2:   Construct vector (DT_{ir_k}; DT_{r_k j})
3: end for
4: Initialize:
5:   randomly select two vectors (DT_{ir_x}; DT_{r_x j}), (DT_{ir_y}; DT_{r_y j})
6:   Iterations = 1, R' = Ø
7: repeat
8:   C_1 = Ø, C_2 = Ø
9:   Flag = false
10:  for each (DT_{ir_k}; DT_{r_k j}) do
11:    dist_1 = sqrt((DT_{ir_k} - DT_{ir_x})^2 + (DT_{r_k j} - DT_{r_x j})^2)
12:    dist_2 = sqrt((DT_{ir_k} - DT_{ir_y})^2 + (DT_{r_k j} - DT_{r_y j})^2)
13:    if dist_1 <= dist_2 then
14:      C_1 = C_1 ∪ (DT_{ir_k}, DT_{r_k j})
15:    else
16:      C_2 = C_2 ∪ (DT_{ir_k}, DT_{r_k j})
17:    end if
18:  end for
19:  (DT'_{ir_x}; DT'_{r_x j}) = (1/|C_1| Σ_{DT_{ir_k} ∈ C_1} DT_{ir_k}; 1/|C_1| Σ_{DT_{r_k j} ∈ C_1} DT_{r_k j})
20:  (DT'_{ir_y}; DT'_{r_y j}) = (1/|C_2| Σ_{DT_{ir_k} ∈ C_2} DT_{ir_k}; 1/|C_2| Σ_{DT_{r_k j} ∈ C_2} DT_{r_k j})
21:  if (DT'_{ir_x}; DT'_{r_x j}) ≠ (DT_{ir_x}; DT_{r_x j}) then
22:    (DT_{ir_x}; DT_{r_x j}) = (DT'_{ir_x}; DT'_{r_x j})
23:    Flag = true
24:  end if
25:  if (DT'_{ir_y}; DT'_{r_y j}) ≠ (DT_{ir_y}; DT_{r_y j}) then
26:    (DT_{ir_y}; DT_{r_y j}) = (DT'_{ir_y}; DT'_{r_y j})
27:    Flag = true
28:  end if
29: until Iterations > Iterations_max or Flag == false
30: if DT_{ir_x} >= DT_{ir_y} then
31:   C_filter = C_1
32: else
33:   C_filter = C_2
34: end if
35: for each (DT_{ir_k}; DT_{r_k j}) ∈ C_filter do
36:   R' = R' ∪ r_k
37: end for
38: return R'

After the clustering algorithm, the vectors are divided into two clusters. One of the two clusters includes the vectors corresponding to the good recommenders, and the first value of its centroid is larger because the average direct trust of the good recommenders is larger. We consider this cluster the trustworthy one. Finally, the recommenders corresponding to the vectors in the trustworthy cluster are subsequently used for the recommendation trust evaluation.
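The algorithm above can be sketched in a few lines of Python. This is our own minimal 2-means reading of it, not the authors' implementation: the function name, the dictionary-based input, the `seed` parameter, and the explicit per-iteration convergence check are all our assumptions, and unlike the pseudocode (whose `Iterations` counter is left implicit) the loop counter here is advanced by the `for` statement itself.

```python
import math
import random

def filter_recommenders(vectors, iterations_max=100, seed=None):
    """Sketch of the k-means (k = 2) recommendation filtering step.

    `vectors` maps each recommender id r_k to a pair
    (direct trust of the trustor about r_k,
     recommendation of r_k about the trustee), both assumed in [0, 1].
    Returns the ids in the cluster whose centroid has the larger
    first component, i.e. the presumed good recommenders.
    """
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    rng = random.Random(seed)
    # Lines 4-6 of Algorithm 1: pick two vectors as initial centroids.
    cx, cy = rng.sample(list(vectors.values()), 2)
    for _ in range(iterations_max):
        c1, c2 = [], []
        # Lines 10-18: assign each vector to the nearer centroid.
        for rk, v in vectors.items():
            d1 = math.dist(v, cx)  # Euclidean distance to centroid x
            d2 = math.dist(v, cy)
            (c1 if d1 <= d2 else c2).append((rk, v))
        # Lines 19-20: recompute each centroid as the mean of its cluster.
        new_cx = mean([v for _, v in c1]) if c1 else cx
        new_cy = mean([v for _, v in c2]) if c2 else cy
        # Lines 21-29: stop when neither centroid moved (Flag == false).
        if new_cx == cx and new_cy == cy:
            break
        cx, cy = new_cx, new_cy
    # Lines 30-37: keep the cluster with the larger mean direct trust.
    chosen = c1 if cx[0] >= cy[0] else c2
    return {rk for rk, _ in chosen}
```

For example, with two honest recommenders (high direct trust, consistent reports) and two bad-mouthing ones (low direct trust, deflated reports), the two groups are well separated, so any choice of initial centroids converges to the same split:

```python
recs = {"r1": (0.9, 0.8), "r2": (0.85, 0.82),
        "r3": (0.2, 0.1), "r4": (0.15, 0.05)}
filter_recommenders(recs, seed=1)  # returns {"r1", "r2"}
```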
Through the recommendation filtering algorithm, the impact of trust related attacks such as bad mouthing attacks and ballot stuffing attacks can be minimized by filtering out the malicious recommenders. However, the recommendations provided by the remaining recommenders may not all be used. We need to consider some important factors that affect the accuracy of the recommendation trust.

5.2.3. Evaluation of recommendation trust
Although the recommendation filtering algorithm can effectively resist some trust related attacks, it may not filter out all the malicious