No. 2    QI Xiaogang, et al.: A Parallel Anomaly Detection Algorithm Based on MapReduce    ·227·

  N: data block number
  θ: threshold for LOF
Output: Abnormal data and LOF values
  Get data set which lof_i > θ from Algorithm 2, add in XX
  Calculate lof_i > θ of x_j ∈ XX
  return Abnormal data and LOF

To address the shortcomings of the LOF algorithm, this paper redefines the concept of the k-distinct-distance. It then combines this with the MapReduce framework to propose a parallel anomaly detection algorithm. The redefined k-distinct-distance is as follows.

k-distinct-distance: for any positive integer $k$, the k-distinct-distance of point $p$ is defined as $k\text{-distance}(p) = d(p,o)$ if the following conditions hold:
1) in the sample space there exist at least $k$ points $q$ such that $d(p,q) \le d(p,o)$;
2) in the sample space there exist at most $k-1$ points $q$ such that $0 < d(p,q) < d(p,o)$.
Here $d(p,o)$ is the distance between point $p$ and point $o$.
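The Python sketch below makes the definition concrete. It computes, by brute force, the k-distinct neighbours of a point and its k-distinct-distance. It mirrors the FirstMapper steps of Algorithm 2: sort the distances, skip zero distances, and keep the first k.

Two things here are assumptions rather than the paper's code. First, k-dis(p) is taken as the distance to the k-th kept neighbour. Second, the function name `k_distinct_neighbours` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_distinct_neighbours(p: Point, data: Sequence[Point], k: int) -> Tuple[List[Tuple[int, float]], float]:
    """Return ([(index of o, d(p, o)), ...], k-distinct-distance of p).

    Points at distance 0 from p (p itself and exact duplicates of p) are skipped.
    """
    dists = sorted((math.dist(p, q), j) for j, q in enumerate(data))  # ascending d(p, q)
    neighbours: List[Tuple[int, float]] = []
    for d, j in dists:
        if d != 0 and len(neighbours) < k:  # the "dis != 0 & |k-neighbour| < k" test
            neighbours.append((j, d))
    if not neighbours:
        raise ValueError("p has no point at non-zero distance")
    # Assumed: k-distinct-distance = distance to the last (k-th) kept neighbour.
    return neighbours, neighbours[-1][1]
```

Take `p = (0.0, 0.0)` in a data set that also contains a duplicate `(0.0, 0.0)`, plus `(1.0, 0.0)`, `(0.0, 2.0)` and `(3.0, 0.0)`. With k = 2, the zero distances are skipped, so the neighbours are `(1.0, 0.0)` and `(0.0, 2.0)` and the k-distinct-distance is 2.0. The original LOF definition would instead count the duplicate as a neighbour and give a k-distance of 1.0.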
Algorithm 2  Compute lof_i > θ
Input: data set X = x_1, x_2, ···, x_N
  k: number of nearest neighbors
  θ: threshold for LOF
  N: data block number
Output: data set which lof_i > θ
Initialize a Hadoop Job
  Set TaskMapReduce class
  Logically divide X into multiple data blocks: D_1, D_2, ···, D_N.
In the j-th TaskMapReduce
  FirstMapper
    Input: D = d_1, d_2, ···, d_m
    Output: <key, value> = <d_i, [(o_k, dis(d_i, o_k)), k-dis(d_i)]>
    for each data d_i, i = 1, 2, ···, m do
      Calculate dis_ij = distance(d_i, d_j), j = 1, ···, m
      Sort dis_ij of d_i
      for each dis_ij of d_i do
        if dis_ij ≠ 0 & |k-neighbour| < k
          add d_j and dis_ij in k-distinct-neighbour, record (o_k, dis(d_i, o_k))
      end
      Calculate k-distinct-distance, record k-dis(d_i)
    end
  FirstReducer
    Input: <key, value> = <d_i, [(o_k, dis(d_i, o_k)), k-dis(d_i)]>
    Output: <key, value> = <d_i, [(o_k, dis(d_i, o_k)), k-dis(d_i)]>
  SecondMapper
    Input: <key, value> = <d_i, [(o_k, dis(d_i, o_k)), k-dis(d_i)]>
    Output: <key, value> = <d_i, (o_k, reach-dis(d_i, o_k))>
    for o_k ∈ k-distinct-neighbour do
      if k-dis(o_k) < dis(d_i, o_k)
        reach-dis(d_i, o_k) = dis(d_i, o_k)
      else reach-dis(d_i, o_k) = k-dis(o_k)
    end
  SecondReducer
    Input: <key, value> = <d_i, (o_k, reach-dis(d_i, o_k))>
    Output: <key, value> = <d_i, lrd(d_i)>
    for value do
      $\mathrm{lrd}(d_i) = k \Big/ \sum_{o_k \in \text{k-distinct-neighbour}} \text{reach-dis}(d_i, o_k)$
    end
  ThirdMapper
    Input: <key, value> = <d_i, lrd(d_i)>
    Output: <key, value> = <d_i (lof(d_i) > θ), lof(d_i)>
    for o_k ∈ k-distinct-neighbour do
      $\mathrm{lof}(d_i) = \Big( \sum_{o_k \in \text{k-distinct-neighbour}} \mathrm{lrd}(o_k) \big/ k \Big) \Big/ \mathrm{lrd}(d_i)$
    end
    if lof(d_i) > θ
      output
  ThirdReducer
    Input: <key, value> = <d_i (lof(d_i) > θ), lof(d_i)>
    Output: <key, value> = <d_i*, lof(d_i*)>
    for value do
      Sort lof(d_i) for d_i, and record d_i*
    end
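The in-memory Python sketch below traces the three MapReduce rounds of Algorithm 2 on a single block. It is not Hadoop code:

- `shuffle` is a hypothetical stand-in for the shuffle phase.
- Each `*_job` function (also hypothetical names) pairs a mapper generator with its reducer.
- The listing does not say how SecondMapper obtains k-dis(o_k) or how ThirdMapper obtains lrd(o_k). Here both are looked up in the complete output of the previous round, which is an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Neighbours = List[Tuple[int, float]]  # [(index of o_k, dis(d_i, o_k)), ...]


def shuffle(pairs: Iterable[Tuple[object, object]]) -> Dict[object, list]:
    """Group mapper output by key, standing in for the MapReduce shuffle phase."""
    groups: Dict[object, list] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def first_job(block: Sequence[Point], k: int) -> Dict[int, Tuple[Neighbours, float]]:
    def mapper():
        # FirstMapper: sort dis_ij for each d_i and keep the first k with dis_ij != 0.
        for i, di in enumerate(block):
            dists = sorted((math.dist(di, dj), j) for j, dj in enumerate(block))
            nbrs = [(j, d) for d, j in dists if d != 0][:k]
            yield i, (nbrs, nbrs[-1][1])  # k-dis(d_i): distance to the last kept neighbour

    # FirstReducer: identity, one record per key.
    return {i: vals[0] for i, vals in shuffle(mapper()).items()}


def second_job(first: Dict[int, Tuple[Neighbours, float]]) -> Dict[int, float]:
    k_dis = {i: kd for i, (_, kd) in first.items()}

    def mapper():
        # SecondMapper: reach-dis = dis(d_i, o_k) if k-dis(o_k) < dis(d_i, o_k), else k-dis(o_k).
        for i, (nbrs, _) in first.items():
            for o, d in nbrs:
                yield i, (o, d if k_dis[o] < d else k_dis[o])

    # SecondReducer: lrd(d_i) = k / sum of reach-dis; len(vals) == k when d_i has k neighbours.
    return {i: len(vals) / sum(r for _, r in vals) for i, vals in shuffle(mapper()).items()}


def third_job(first: Dict[int, Tuple[Neighbours, float]], lrd: Dict[int, float],
              theta: float) -> List[Tuple[int, float]]:
    def mapper():
        # ThirdMapper: lof(d_i) = (sum of lrd(o_k) / k) / lrd(d_i); emit only if lof(d_i) > theta.
        for i, (nbrs, _) in first.items():
            lof = sum(lrd[o] for o, _ in nbrs) / len(nbrs) / lrd[i]
            if lof > theta:
                yield "candidates", (i, lof)  # single key so one reducer sees every candidate

    # ThirdReducer: sort candidates by LOF (descending order is an assumption).
    candidates = shuffle(mapper()).get("candidates", [])
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


def compute_lof_over_threshold(block: Sequence[Point], k: int, theta: float) -> List[Tuple[int, float]]:
    first = first_job(block, k)
    return third_job(first, second_job(first), theta)
```

For example, `compute_lof_over_threshold([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2, theta=1.5)` returns a single candidate: index 4, with LOF = (√162 + √181)/2 ≈ 13.09. The four unit-square points all get LOF 1.0 and are not emitted.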
This section has presented the basic idea and steps of the MR-DLOF algorithm:

1. The data set is stored on HDFS and the original data set is logically divided into multiple data blocks.
2. Following the MapReduce model, the data in each block are processed in parallel. The k-distinct-distance and the LOF value of each data point are therefore computed within a single block only.
3. In each block, points whose local outlier factor is below the set threshold are discarded. The points above the threshold are merged into a new data set, and their k-distinct-distances and LOF values are updated.

This final update step improves the accuracy and sensitivity of the algorithm.
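The block-level flow summarised above can be sketched end to end in Python: split, filter each block at θ, merge, recompute. The assumptions are:

- Blocks are formed by a round-robin split.
- The per-block loop runs sequentially, where Hadoop would run one map task per block.
- `block_lof` and `mr_dlof_sketch` are hypothetical names.
- Every block contains at least two distinct points.
- If the merged set has k or fewer distinct points, the block-level LOF values are returned unchanged, because neighbourhoods cannot be re-estimated.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def block_lof(block: Sequence[Point], k: int) -> List[float]:
    """LOF of every point in one block, using only neighbours at non-zero distance.

    Assumes every point has at least one other point at non-zero distance in the block.
    """
    nbrs = []
    for p in block:
        ds = sorted((math.dist(p, q), j) for j, q in enumerate(block))
        nbrs.append([(j, d) for d, j in ds if d != 0][:k])  # k-distinct neighbours
    k_dis = [n[-1][1] for n in nbrs]  # k-distinct-distance of each point
    # reach-dis(p, o) = max(k-dis(o), d(p, o)); lrd = (number of neighbours) / sum of reach-dis
    lrd = [len(n) / sum(max(k_dis[j], d) for j, d in n) for n in nbrs]
    return [sum(lrd[j] for j, _ in n) / len(n) / lrd[i] for i, n in enumerate(nbrs)]


def mr_dlof_sketch(X: Sequence[Point], k: int, theta: float, n_blocks: int) -> List[Tuple[Point, float]]:
    # Logically divide X into n_blocks blocks (round-robin split: an assumption).
    blocks = [list(X[b::n_blocks]) for b in range(n_blocks)]
    # Per-block LOF; each iteration is independent, like one map task per block.
    candidates: List[Tuple[Point, float]] = []
    for block in blocks:
        for p, lof in zip(block, block_lof(block, k)):
            if lof > theta:  # discard points whose block-level LOF is not above theta
                candidates.append((p, lof))
    merged = [p for p, _ in candidates]
    if len(set(merged)) <= k:
        # Too few distinct points to re-estimate neighbourhoods: keep block-level LOF values.
        result = candidates
    else:
        # Recompute k-distinct-distance and LOF on the merged set, then filter again at theta.
        result = [(p, lof) for p, lof in zip(merged, block_lof(merged, k)) if lof > theta]
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

Consider a 4 × 4 integer grid plus the point (50, 50), split into two blocks with k = 3 and θ = 1.5. Every grid point has LOF 1.0 in its block, so only (50, 50) survives, with LOF ≈ 33.94.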