where $Q^{(k)} = 4\bar{Q}^{(k)}$ and $p^{(k)}_i = 2\big[\bar{p}^{(k)}_i - \sum_{l=1}^{|\Omega|}\big(\bar{Q}^{(k)}_{i,l} + \bar{Q}^{(k)}_{l,i}\big)\big]$, with $\bar{Q}^{(k)}$ and $\bar{p}^{(k)}$ denoting the coefficients of problem (4). More details about the above derivation can be found in the supplementary material.

Furthermore, a BQP can be turned into an equivalent form without linear terms (Yang 2013). That is to say, the problem in (5) can be rewritten as follows:

$$\min_{\tilde{b}^k}\ [\tilde{b}^k]^T \widetilde{Q}^{(k)} \tilde{b}^k \qquad \text{s.t.}\ \ \tilde{b}^k \in \{0,1\}^{|\Omega|+1},\ \ \tilde{b}^k_{|\Omega|+1} = 1 \qquad (6)$$

where
$$\tilde{b}^k = \begin{bmatrix} b^k \\ 1 \end{bmatrix}, \qquad \widetilde{Q}^{(k)} = \begin{pmatrix} Q^{(k)} & \frac{1}{2} p^{(k)} \\ \frac{1}{2} [p^{(k)}]^T & 0 \end{pmatrix}.$$

A method proposed by (Yang 2013) can solve the problem in (6) with an additional constraint $\sum_{i=1}^{|\Omega|+1} \tilde{b}^k_i = \lceil (|\Omega|+1)/2 \rceil$. We also add this constraint to our problem to get a balanced result for each bit, a property which has been widely used in hashing (Liu et al. 2014). So we reformulate problem (6) with the constraint as follows:

$$\min_{\tilde{b}^k}\ [\tilde{b}^k]^T \widetilde{Q}^{(k)} \tilde{b}^k \qquad \text{s.t.}\ \ \tilde{b}^k \in \{0,1\}^{M},\ \ \tilde{b}^k_{M} = 1,\ \ \sum_{i=1}^{M} \tilde{b}^k_i = H \qquad (7)$$

where $M = |\Omega| + 1$ and $H = \lceil M/2 \rceil$.

As in (Yang 2013), we transform the problem in (7) into an equivalent clustering problem: given a dataset $\mathcal{U} = \{u_i \in \mathbb{R}^M\}_{i=1}^M$, we want to find a subset $\mathcal{U}'$ of size $H$ such that the sum of squared distances within $\mathcal{U}'$ is minimized. It can be formulated as:

$$\min_{\mathcal{U}'}\ \sum_{u \in \mathcal{U}'} \Big\| u - \frac{1}{H} \sum_{v \in \mathcal{U}'} v \Big\|^2 \qquad \text{s.t.}\ \ \mathcal{U}' \subseteq \mathcal{U},\ \ |\mathcal{U}'| = H,\ \ u_M \in \mathcal{U}' \qquad (8)$$

Then, we have the following theorems.

Theorem 2. Let a matrix $U$ of size $M \times M$ denote the dataset $\mathcal{U}$, with $U_{*i} = u_i$. If $U^T U = \lambda I - \widetilde{Q}^{(k)} \succeq 0$, then (7) and (8) are equivalent.

Proof. We can use a method similar to that in (Yang 2013) to prove it.

According to (Yang 2013), we can always find such $U$ and $\lambda$ to satisfy $\lambda I - \widetilde{Q}^{(k)} \succeq 0$ in Theorem 2. In practice, we can take a sufficiently large number as $\lambda$ and perform Cholesky decomposition on $\lambda I - \widetilde{Q}^{(k)}$ to get $U$.

Theorem 3. Let $\mathcal{U}'_*$ be the global solution of (8) and let $f(\mathcal{U}') = \sum_{u \in \mathcal{U}'} \| u - \frac{1}{H} \sum_{v \in \mathcal{U}'} v \|^2$ be the objective function of (8). There exists an algorithm which can find a solution $\mathcal{U}'_1$ with $f(\mathcal{U}'_1) \leq 2 f(\mathcal{U}'_*)$. That is to say, there exists a 2-approximation algorithm for (8).

Proof. Please refer to (Yang 2013) for the proof and algorithm.
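To make the chain (5) -> (6) -> (7) -> (8) concrete, the following numpy sketch assembles $\widetilde{Q}^{(k)}$ from the coefficients $Q^{(k)}$ and $p^{(k)}$ of problem (5) and factors $\lambda I - \widetilde{Q}^{(k)}$ to obtain the points $u_i$ of problem (8). All names are illustrative, and the diagonal-dominance bound used for $\lambda$ is just one simple way to guarantee positive definiteness, not necessarily the paper's choice.

```python
# Minimal sketch of the reduction from problem (5) to problem (8).
# Assumptions: Q is (m, m), p is (m,), m = |Omega|; names are ours.
import numpy as np

def bqp_to_clustering(Q, p):
    """Return U (M x M) whose columns u_i form the dataset of problem (8)."""
    m = Q.shape[0]
    M = m + 1
    # Problem (6): absorb the linear term into an augmented quadratic form.
    Q_tilde = np.zeros((M, M))
    Q_tilde[:m, :m] = Q
    Q_tilde[:m, m] = 0.5 * p
    Q_tilde[m, :m] = 0.5 * p
    Q_tilde = 0.5 * (Q_tilde + Q_tilde.T)  # symmetrize; quadratic form unchanged
    # Theorem 2: choose lambda so that lambda*I - Q_tilde is positive definite,
    # then factor it as U^T U via Cholesky decomposition.
    lam = np.abs(Q_tilde).sum() + 1.0      # crude bound ensuring diagonal dominance
    A = lam * np.eye(M) - Q_tilde
    L = np.linalg.cholesky(A)              # A = L @ L.T with L lower-triangular
    U = L.T                                # so A = U.T @ U, i.e., column i is u_i
    return U

# A 2-approximation solver for problem (8) then selects H = ceil(M/2) columns
# of U (including the last one, u_M) with minimum within-cluster scatter.
```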
The Whole Algorithm

The whole learning (optimization) algorithm is summarized in Algorithm 1.

Algorithm 1 Discrete optimization in COSDISH
Input: $S \in \{-1,1\}^{n \times n}$, $q$, $T_{sto}$, $T_{alt}$, $|\Omega|$.
Initialize binary code $B$ by randomization.
for $iter = 1 \to T_{sto}$ do
    Sample $|\Omega|$ columns of $S$ to get $\widetilde{S}$.
    Let $\Omega$ be the set of sampled column indices and $\Gamma = N - \Omega$, with $N = \{1, 2, \ldots, n\}$.
    Split $\widetilde{S}$ into $\widetilde{S}^{\Omega}$ and $\widetilde{S}^{\Gamma}$.
    Split $B$ into $B^{\Omega}_{(0)}$ and $B^{\Gamma}_{(0)}$.
    for $t = 1 \to T_{alt}$ do
        for $k = 1 \to q$ do
            Construct problem (4) from $B^{\Gamma}_{(t-1)}$, $\widetilde{S}^{\Omega}$, $\widetilde{S}^{\Gamma}$ and the first $k-1$ columns of $B^{\Omega}_{(t)}$.
            Construct problem (5) from problem (4).
            Construct problem (7) from problems (5) and (6).
            Construct problem (8) from problem (7) by performing Cholesky decomposition.
            Solve problem (8) with the 2-approximation algorithm (Yang 2013) to acquire the $k$th column of $B^{\Omega}_{(t)}$.
        end for
        $B^{\Gamma}_{(t)} = \widetilde{\mathrm{sgn}}(\widetilde{S}^{\Gamma} B^{\Omega}_{(t)}, B^{\Gamma}_{(t-1)})$.
    end for
    Recover $B$ by combining $B^{\Omega}_{(t)}$ and $B^{\Gamma}_{(t)}$.
end for
Output: $B \in \{-1,1\}^{n \times q}$
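The control flow of Algorithm 1 can be summarized in a short Python skeleton. This is a structural sketch only: the per-bit solver for steps (4)-(8) is left as a hypothetical callback `solve_bqp_column`, all identifiers are ours rather than from any released code, and the tie-breaking behavior of $\widetilde{\mathrm{sgn}}$ is spelled out in the sketch after Remark 1 below.

```python
# Structural sketch of Algorithm 1 (COSDISH discrete optimization).
# `solve_bqp_column` stands in for the (4)->(8) reduction plus the
# 2-approximation solver; its signature is our assumption.
import numpy as np

def cosdish_discrete_opt(S, q, solve_bqp_column, T_sto=10, T_alt=3, m=None):
    n = S.shape[0]
    m = q if m is None else m                    # |Omega| = q by default
    B = np.sign(np.random.randn(n, q))           # random initialization
    for _ in range(T_sto):
        omega = np.random.choice(n, size=m, replace=False)   # sampled columns
        gamma = np.setdiff1d(np.arange(n), omega)            # Gamma = N - Omega
        S_omega = S[np.ix_(omega, omega)]        # |Omega| x |Omega| block
        S_gamma = S[np.ix_(gamma, omega)]        # |Gamma| x |Omega| block
        B_omega, B_gamma = B[omega].copy(), B[gamma].copy()
        for _ in range(T_alt):
            for k in range(q):                   # bit-by-bit BQP solves
                B_omega[:, k] = solve_bqp_column(S_omega, S_gamma,
                                                 B_omega, B_gamma, k)
            # Closed-form update of the large sub-problem; zero entries
            # keep their previous bit (see the sketch after Remark 1).
            X = S_gamma @ B_omega
            B_gamma = np.where(X != 0, np.sign(X), B_gamma)
        B[omega], B[gamma] = B_omega, B_gamma    # recover B
    return B
```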
In general, $10 \le T_{sto} \le 20$, $3 \le T_{alt} \le 10$ and $|\Omega| \ge q$ are enough to get satisfactory performance. Unless otherwise stated, we set $T_{sto} = 10$, $T_{alt} = 3$ and $|\Omega| = q$ in our experiments. Furthermore, in our experiments we find that our algorithm is not sensitive to the initialization. Hence, we adopt random initialization in this paper.

Remark 1. Unlike other hashing methods such as TSH (Lin et al. 2013a), which try to solve the whole problem as $q$ BQP problems, our alternating optimization strategy decomposes the whole problem into two sub-problems. Typically, $|\Omega| \ll |\Gamma|$, i.e., the number of variables in $B^{\Omega}$ is far less than that in $B^{\Gamma}$. Furthermore, the cost to get a solution for problem (3) is much lower than that to get a solution for a BQP. Hence, the key idea of our alternating optimization strategy is to adopt a faster solution for the larger sub-problem (sketched at the end of this section), which makes our strategy much faster than TSH. Moreover, the faster solution of our strategy can also guarantee accuracy, which will be verified in the experiments.

TSH adopts LBFGSB to solve the BQP problem in a continuous-relaxation way. We found that if LBFGSB is used to solve the BQP problem in our method (i.e., problem (6)), the accuracy of our method will be dramatically deteriorated. The reason is that the solution of LBFGSB will hurt the quality of the solution of $B^{\Gamma}$.

In addition, the graph-cut method used in FastH cannot be adapted to solve our problem (6), because problem (6) doesn't satisfy the sub-modular property required by the graph-cut method.
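The speed claim in Remark 1 rests on the $B^{\Gamma}$ sub-problem (problem (3)) having a closed-form solution. Below is a minimal numpy sketch of that update, assuming $\widetilde{\mathrm{sgn}}(X, Y)$ takes elementwise signs of $X$ and falls back to the previous bits $Y$ wherever an entry of $X$ is zero; this tie-breaking rule is our reading of $\widetilde{\mathrm{sgn}}$, whose formal definition accompanies problem (3) earlier in the paper.

```python
# Sketch of the closed-form B^Gamma update used in Algorithm 1.
# Assumption: sgn~(X, Y) = sign(X) elementwise, keeping Y where X == 0.
import numpy as np

def update_B_gamma(S_gamma, B_omega, B_gamma_prev):
    """One closed-form pass over the large sub-problem.

    S_gamma      : (|Gamma|, |Omega|) similarity block
    B_omega      : (|Omega|, q) codes of the sampled points
    B_gamma_prev : (|Gamma|, q) previous codes, used only to break ties
    """
    X = S_gamma @ B_omega
    return np.where(X != 0, np.sign(X), B_gamma_prev)

# Cost is one matrix product, O(|Gamma| * |Omega| * q), versus q BQP solves
# for the small sub-problem; this is the asymmetry Remark 1 exploits.
```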