minimizing the following objective function:

l = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\|U_{i*} - U_{j*}\|^2
  = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\sum_{d=1}^{D}(U_{id} - U_{jd})^2
  = \frac{1}{2}\sum_{d=1}^{D}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}(U_{id} - U_{jd})^2
  = \sum_{d=1}^{D} U_{*d}^{T} L U_{*d} = \mathrm{tr}(U^{T} L U),   (2)

where A_{ij} = 1 if there is a relation between entities i and j, and otherwise A_{ij} = 0, and L = D - A is known as the Laplacian matrix [Chung, 1997], with D being a diagonal matrix whose diagonal elements are D_{ii} = \sum_{j} A_{ij}, and \mathrm{tr}(\cdot) denotes the trace of a matrix.

Combining (1) and (2), we obtain the following objective function, which we seek to minimize during learning:

f = \frac{1}{2}\|X - UV^{T}\|^2 + \frac{\alpha}{2}\left(\|U\|^2 + \|V\|^2\right) + \frac{\beta}{2}\mathrm{tr}(U^{T} L U)
  = \frac{1}{2}\|X - UV^{T}\|^2 + \frac{1}{2}\mathrm{tr}\left[U^{T}(\alpha I + \beta L)U\right] + \frac{\alpha}{2}\mathrm{tr}(VV^{T}),   (3)

which seamlessly integrates the content information and the relation information into a principled framework. Because we use the relation information to regularize the content MF procedure, we call the model in (3) relation regularized matrix factorization, abbreviated as RRMF.
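To make (2) and (3) concrete, the following NumPy sketch builds the Laplacian L = D - A from a binary link matrix and evaluates the RRMF objective f. The function names and shapes are our own illustrative choices, not code from the paper.

import numpy as np

def graph_laplacian(A):
    # L = D - A, with D the diagonal degree matrix from Equation (2).
    D = np.diag(A.sum(axis=1))
    return D - A

def rrmf_objective(X, U, V, A, alpha, beta):
    # X: n x m content matrix, U: n x D, V: m x D, A: n x n symmetric 0/1 link matrix.
    # f = 1/2 ||X - U V^T||^2 + 1/2 tr[U^T (alpha I + beta L) U] + alpha/2 tr(V V^T)
    n = A.shape[0]
    L = graph_laplacian(A)
    recon = 0.5 * np.linalg.norm(X - U @ V.T, "fro") ** 2
    u_reg = 0.5 * np.trace(U.T @ (alpha * np.eye(n) + beta * L) @ U)
    v_reg = 0.5 * alpha * np.trace(V @ V.T)
    return recon + u_reg + v_reg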
Remark 1 The normalized Laplacian matrix [Chung, 1997], given by \tilde{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}, can be used to substitute L in (2) and (3) for regularization. If \tilde{L} is adopted, the objective function in (2) becomes

\tilde{l} = \frac{1}{2}\sum_{d=1}^{D}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}\left(\frac{U_{id}}{\sqrt{D_{ii}}} - \frac{U_{jd}}{\sqrt{D_{jj}}}\right)^2.

The choice between L and \tilde{L} depends on the application. In this paper, the lemmas and theorems are derived based on L, but they also hold for \tilde{L} with slight changes in the derivations.

From (3), it is easy to see that the model assumption behind RRMF is in line with the semantics of Type II links. Hence, RRMF can be expected to achieve good performance for applications with Type II links, such as research paper classification, which will be verified by the experiments in Section 3.

On the other hand, RRMF can also be used for applications with Type I links. All we need to do is preprocess the link structure in the data with a very simple strategy; the model and learning algorithms need not be changed. According to the semantics of Type I links, two entities linking to or linked by one common entity will likely be from the same class. One simple way to preprocess the link structure is therefore to artificially add a link between two entities whenever they link to or are linked by a common entity. The added links then satisfy the semantics of Type II links. For example, in Figure 1(a), after preprocessing, V2 and V3 will be connected and V6 and V7 will also be connected. Learning RRMF based on these added links is now in line with the underlying semantics. From our experiments on the WebKB data set, we find that RRMF can still achieve performance comparable with LCMF just by adopting the simple link preprocessing strategy described above.
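This preprocessing step can be written down directly. Below is a minimal sketch (our own helper, not the paper's code) that takes a possibly asymmetric 0/1 link matrix A and adds a link between any two entities that link to a common entity or are linked by a common entity; keeping the original links in symmetrized form is our own assumption.

import numpy as np

def preprocess_type1_links(A):
    # Add a link between entities i and k if they link to a common entity
    # ((A A^T)_{ik} > 0) or are linked by a common entity ((A^T A)_{ik} > 0),
    # so that the added links satisfy the semantics of Type II links.
    A = (np.asarray(A) != 0).astype(int)
    co_linking = (A @ A.T) > 0                          # share a common target
    co_linked = (A.T @ A) > 0                           # share a common source
    A_new = ((A + A.T) > 0) | co_linking | co_linked    # keep the original links, symmetrized
    np.fill_diagonal(A_new, False)                      # remove self-links
    return A_new.astype(int)                            # symmetric 0/1 matrix for the Laplacian in (2)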
2.3 Learning

In this subsection, we first prove that the objective function in (3) is convex with respect to (w.r.t.) any one of its parameters, U and V. Then, we propose an alternating projection algorithm to learn the parameters in linear time and show that it is guaranteed to converge to a local optimum.

Convexity of the Objective Function

Lemma 1 The Laplacian matrix L is positive semi-definite (psd), i.e., L \succeq 0.

Proof: For any n \times 1 vector x \neq 0, x^{T} L x = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}(x_i - x_j)^2 \geq 0. \square

Lemma 2 \sum_{j=1}^{m} V_{j*}^{T} V_{j*} \succeq 0.

Proof: For any D \times 1 vector z \neq 0, z^{T}\left(\sum_{j=1}^{m} V_{j*}^{T} V_{j*}\right) z = \sum_{j=1}^{m} z^{T} V_{j*}^{T} V_{j*} z = \sum_{j=1}^{m} (V_{j*} z)^2 \geq 0. \square

Theorem 1 f is convex w.r.t. U.

Proof: We first rewrite f as follows: f = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}(X_{ij} - U_{i*}V_{j*}^{T})^2 + \frac{1}{2}\sum_{d=1}^{D} U_{*d}^{T}(\alpha I + \beta L)U_{*d} + C_1, where C_1 is a constant independent of U.

Let g = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}(X_{ij} - U_{i*}V_{j*}^{T})^2 = C_2 + \frac{1}{2}\sum_{i=1}^{n}\left[U_{i*}\left(\sum_{j=1}^{m} V_{j*}^{T}V_{j*}\right)U_{i*}^{T} - 2\left(\sum_{j=1}^{m} X_{ij}V_{j*}\right)U_{i*}^{T}\right], where C_2 is a constant independent of U. It is easy to see that G_i \triangleq \frac{\partial^2 g}{\partial U_{i*}^{T}\partial U_{i*}} = \sum_{j=1}^{m} V_{j*}^{T}V_{j*}, and \frac{\partial^2 g}{\partial U_{i*}^{T}\partial U_{k*}} = 0 if i \neq k. If we use u = (U_{1*}, U_{2*}, \ldots, U_{n*})^{T} to denote the vector representation of U, the second-order derivative (Hessian) of g w.r.t. u will be a block-diagonal matrix: G_u \triangleq \frac{\partial^2 g}{\partial u \partial u^{T}} = \mathrm{diag}[G_1, G_2, \ldots, G_n]. According to Lemma 2, each G_i is psd, so \det(G_i) \geq 0, where \det(\cdot) denotes the determinant of a matrix. Because G_u is block-diagonal with psd blocks, G_u \succeq 0 (and \det(G_u) = \prod_{i=1}^{n}\det(G_i) \geq 0), and we can conclude that g is convex w.r.t. U.

Let h = \frac{1}{2}\sum_{d=1}^{D} U_{*d}^{T}(\alpha I + \beta L)U_{*d}. If we use a = (U_{*1}^{T}, U_{*2}^{T}, \ldots, U_{*D}^{T})^{T} to denote another vector representation of U, the Hessian of h w.r.t. a will also be a block-diagonal matrix: H_a \triangleq \frac{\partial^2 h}{\partial a \partial a^{T}} = \mathrm{diag}[H_1, H_2, \ldots, H_D], where H_d \triangleq \frac{\partial^2 h}{\partial U_{*d}\partial U_{*d}^{T}} = \alpha I + \beta L. According to Lemma 1, and because \alpha > 0, each H_d is positive definite, so H_a \succ 0 and \det(H_a) > 0, and hence h is convex w.r.t. U.

Hence, f = g + h + C_1 is convex w.r.t. U. \square

Theorem 2 f is convex w.r.t. V.

Proof: We first rewrite f as follows: f = C_3 + \frac{1}{2}\sum_{j=1}^{m}\left[V_{j*}\left(\sum_{i=1}^{n} U_{i*}^{T}U_{i*} + \alpha I\right)V_{j*}^{T} - 2\left(\sum_{i=1}^{n} X_{ij}U_{i*}\right)V_{j*}^{T}\right], where C_3 is a constant independent of V. If we use v = (V_{1*}, V_{2*}, \ldots, V_{m*})^{T} to denote the vector representation of V, the Hessian of f w.r.t. v is again block-diagonal, with every diagonal block equal to \sum_{i=1}^{n} U_{i*}^{T}U_{i*} + \alpha I. This block is positive definite, since \sum_{i=1}^{n} U_{i*}^{T}U_{i*} \succeq 0 by the same argument as in Lemma 2 and \alpha > 0. Hence, f is convex w.r.t. V. \square
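The matrices appearing in these proofs are easy to check numerically. The short script below (our own illustration, not part of the paper; all sizes and hyperparameter values are arbitrary) verifies on random data that L, \sum_j V_{j*}^{T} V_{j*}, \alpha I + \beta L, and \sum_i U_{i*}^{T} U_{i*} + \alpha I have non-negative smallest eigenvalues, matching Lemmas 1 and 2 and the Hessian blocks in Theorems 1 and 2.

import numpy as np

def min_eig(M):
    # Smallest eigenvalue of a symmetric matrix; a value >= 0 means the matrix is psd.
    return np.linalg.eigvalsh(M).min()

rng = np.random.default_rng(0)
n, m, D_lat, alpha, beta = 6, 5, 3, 0.1, 1.0
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T                                      # random symmetric 0/1 link matrix
L = np.diag(A.sum(axis=1)) - A                   # Laplacian as in (2)
U = rng.standard_normal((n, D_lat))
V = rng.standard_normal((m, D_lat))

print(min_eig(L))                                # Lemma 1: L is psd
print(min_eig(V.T @ V))                          # Lemma 2: sum_j V_{j*}^T V_{j*} is psd
print(min_eig(alpha * np.eye(n) + beta * L))     # Theorem 1: block H_d is positive definite
print(min_eig(U.T @ U + alpha * np.eye(D_lat)))  # Theorem 2: V-subproblem block is positive definite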
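The alternating projection algorithm itself is not contained in this excerpt. As a purely illustrative stand-in, the sketch below alternates gradient steps on the two convex subproblems of (3) established by Theorems 1 and 2; the gradients follow directly from (3), while the step size, iteration count, and initialization are our own choices and do not reproduce the authors' linear-time updates.

import numpy as np

def rrmf_alternating_gradient(X, A, D_lat=10, alpha=0.1, beta=1.0,
                              lr=1e-3, n_iters=200, seed=0):
    # Alternate gradient steps on the two convex subproblems of (3):
    #   grad_U f = (U V^T - X) V + (alpha I + beta L) U
    #   grad_V f = (U V^T - X)^T U + alpha V
    # This is only an illustrative block-coordinate scheme, not the paper's
    # linear-time alternating projection algorithm.
    n, m = X.shape
    L = np.diag(A.sum(axis=1)) - A
    rng = np.random.default_rng(seed)
    U = 0.01 * rng.standard_normal((n, D_lat))
    V = 0.01 * rng.standard_normal((m, D_lat))
    for _ in range(n_iters):
        R = U @ V.T - X
        U -= lr * (R @ V + (alpha * np.eye(n) + beta * L) @ U)   # U-step (convex in U)
        R = U @ V.T - X
        V -= lr * (R.T @ U + alpha * V)                          # V-step (convex in V)
    return U, V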