That is, we can verify the correctness of $z^*_i$ via the following rule:

$$
\begin{cases}
z^*_i \text{ is correct}, & \mathrm{CRC}(z^*_i \bmod 2) = 0, \\
z^*_i \text{ is incorrect}, & \text{o.w.}
\end{cases}
\qquad (9)
$$

Note that by verifying the checksum with (9), we can guarantee the correctness of $z^*_i$ with high probability. However, some $z^*_i$ may still be erroneous even though they pass the checksum verification.

D. Recovery of the Summation of Update Vector Chunks

Based on the sourceword-sum $z_i$ that is verified to be correct, here we show how to recover $\sum_{v \in \mathcal{V}_i} (\beta_v \mathbf{a}_v + \gamma_v \mathbf{1})$.
Since each element $a_{v,j}$ in $\mathbf{a}_v$ is encoded into $\{b_{v,j_0}, \cdots, b_{v,j_{c-1}}\}$ in $\mathbf{b}_v$ according to (3), we have

$$
\Big\{ \sum_{v \in \mathcal{V}_i} \hat{\beta}_v b_{v,j_0}, \cdots, \sum_{v \in \mathcal{V}_i} \hat{\beta}_v b_{v,j_{c-1}} \Big\} = \{ z_{i,j_0}, \cdots, z_{i,j_{c-1}} \}.
$$

That is, each element $\sum_{v \in \mathcal{V}_i} (\beta_v a_{v,j} + \gamma_v)$ in $\sum_{v \in \mathcal{V}_i} (\beta_v \mathbf{a}_v + \gamma_v \mathbf{1})$ can be recovered via

$$
\sum_{v \in \mathcal{V}_i} (\beta_v a_{v,j} + \gamma_v) =
\begin{cases}
\dfrac{\alpha}{2^{\zeta}} + \sum_{v \in \mathcal{V}_i} \gamma_v, & \big\lfloor \frac{\alpha}{2^{\epsilon + c - 1}} \big\rfloor = 0, \\
\dfrac{\alpha - 2^{\epsilon + c}}{2^{\zeta}} + \sum_{v \in \mathcal{V}_i} \gamma_v, & \text{o.w.},
\end{cases}
\qquad (10)
$$

where

$$
\alpha = \Big( \sum_{l=0}^{c-1} z_{i,j_l} 2^{c-l-1} + \sum_{l=0}^{\epsilon-1} z_{i,j_0} 2^{c+l} \Big) \bmod 2^{\epsilon + c},
$$

and $\epsilon = \lceil \log_2(\sum_{v \in \mathcal{V}_i} \hat{\beta}_v) \rceil$. The process presented in (10) can be viewed as converting each $c$-bit signed integer in two's-complement representation (e.g., $\{b_{v,j_0}, \cdots, b_{v,j_{c-1}}\}$) into an $(\epsilon + c)$-bit signed integer in two's-complement representation (e.g., $\{b_{v,j_0}, b_{v,j_0}, \cdots, b_{v,j_{c-1}}\}$, obtained by inserting $\epsilon$ extra copies of the most significant bit $b_{v,j_0}$ at its beginning), such that the addition of $|\mathcal{V}_i|$ such signed integers does not overflow.

VIII. PHYARITH WITH SPARSIFICATION

In this section, we show how to enable PhyArith in federated learning where sparsification is used to compress client-side updates and accelerate communication.

A. Client-Side Process

In conventional sparsification-based update compression methods, each client simply sends the sparse vector $\mathbf{A}^v_t = \mathrm{SP}(\mathbf{M}^v_t - \mathbf{M}_t)$ to the server, where $\mathrm{SP}(\cdot)$ denotes the sparsification process. Different from these methods, in order to enable PhyArith, the update vector

$$
\mathbf{A}^v_t = Q_v(\mathbf{C} \cdot \mathrm{SP}(\mathbf{M}^v_t - \mathbf{M}_t))
$$

is sent from each client to the server, where $\mathbf{C}$ is the random underdetermined sensing matrix shared across all clients and the server, used for compressing the sparse vector $\mathrm{SP}(\mathbf{M}^v_t - \mathbf{M}_t)$ based on compressed sensing. Such an $\mathbf{A}^v_t$ can then be sliced into chunks, i.e., $\mathbf{A}^v_t = [\cdots, \mathbf{a}_v, \cdots]$, and encoded according to the routine provided in Section VI.
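To make the client-side pipeline above concrete, here is a minimal sketch in plain Python. It assumes top-$k$ magnitude sparsification for $\mathrm{SP}(\cdot)$ and a uniform scalar quantizer for $Q_v(\cdot)$; the text does not pin down these operators, so the helper names, the quantization step, and the $4 \times 10$ sensing-matrix dimensions are all illustrative assumptions.

```python
import random

def sparsify_topk(x, k):
    # SP(.): keep the k largest-magnitude entries, zero out the rest
    # (top-k is an assumed sparsifier, not necessarily the paper's choice).
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [xi if i in keep else 0.0 for i, xi in enumerate(x)]

def compress(C, x):
    # C . x: project with the shared underdetermined sensing matrix C
    # (fewer rows than columns), as in compressed sensing.
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

def quantize(x, step=0.05):
    # Q_v(.): a stand-in uniform quantizer mapping reals to integers.
    return [round(xi / step) for xi in x]

random.seed(1)
C = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(4)]  # shared 4 x 10
update = [random.uniform(-1.0, 1.0) for _ in range(10)]              # M_t^v - M_t
A_v = quantize(compress(C, sparsify_topk(update, k=3)))
# A_v is the short integer vector that would be sliced into chunks and encoded.
```

The resulting `A_v` plays the role of $\mathbf{A}^v_t$: a short vector of integers ready to be sliced into chunks and encoded per Section VI.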
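The overflow-avoidance argument behind (10) can also be checked numerically. The sketch below is a deliberate simplification: it sets every $\hat{\beta}_v = 1$ and $\gamma_v = 0$ (so $\epsilon = \lceil \log_2 |\mathcal{V}_i| \rceil$) and ignores the fixed-point rescaling by $2^{\zeta}$; it only demonstrates that sign-extending $c$-bit two's-complement words by $\epsilon$ bits lets their exact signed sum be read back from arithmetic carried out modulo $2^{\epsilon+c}$.

```python
import math

def recover_signed_sum(values, c):
    # Recover the exact signed sum of c-bit two's-complement integers
    # from arithmetic done modulo 2**(eps + c), mirroring (10) with all
    # beta_v = 1, gamma_v = 0 (an illustrative simplification).
    eps = math.ceil(math.log2(len(values)))
    width = eps + c
    # alpha: sum of the sign-extended words, reduced modulo 2**width
    # (v % 2**width is v's two's-complement residue, even for v < 0).
    alpha = sum(v % (1 << width) for v in values) % (1 << width)
    # The two cases of (10): top bit clear -> alpha is the sum itself;
    # top bit set -> subtract 2**width to obtain the negative sum.
    return alpha if alpha < (1 << (width - 1)) else alpha - (1 << width)

print(recover_signed_sum([-7, 3, -2, 5], c=4))  # -1, the exact sum
```

With $c = 4$ and four summands, $\epsilon = 2$, so all arithmetic fits in $6$ bits yet the signed sum is recovered exactly, which is the point of the sign-extension step.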
B. Server-Side Process

At the server side, similar to the routine provided in Section VII, all the recovered summations of update vector chunks, i.e., $\sum_{v \in \mathcal{V}_i} (\beta_v \mathbf{a}_v + \gamma_v \mathbf{1})$, are assembled into the averaged compressed update vector $\mathbf{C} \cdot \mathbf{A}_t$, given by

$$
\mathbf{C} \cdot \mathbf{A}_t = \Big[ \cdots, \sum_{\mathcal{V}_i \in \mathcal{V}} \sum_{v \in \mathcal{V}_i} (\beta_v \mathbf{a}_v + \gamma_v \mathbf{1}) / |\mathcal{V}|, \cdots \Big] = \sum_{v \in \mathcal{V}} Q_v^{-1}(\mathbf{A}^v_t) / |\mathcal{V}| = \sum_{v \in \mathcal{V}} \mathbf{C} \cdot \mathrm{SP}(\mathbf{M}^v_t - \mathbf{M}_t) / |\mathcal{V}|.
$$

By recovering the sparse vector $\mathbf{A}_t$ from $\mathbf{C} \cdot \mathbf{A}_t$ with algorithms such as convex programming [24] for compressed sensing, the server can construct an improved global model $\mathbf{M}_{t+1} = \mathbf{M}_t + \eta_t \mathbf{A}_t$.

IX. PERFORMANCE EVALUATION

A. Experiment Setup

We conduct experiments using the federated learning algorithm FedSGD presented in [3] to train LeNet-5 [25] for the MNIST digit recognition task, where the training data is shuffled and equally distributed over all clients. During each round of FedSGD, all clients are selected; each computes its model update (i.e., the gradient) based on all its local data (i.e., the batch size equals its entire local data) and forwards the update to the server. In the experiment, consistent with 802.11ax, we let $|\mathcal{U}| = 8$ (i.e., the server is equipped with 8 antennas) and study different total client sizes $|\mathcal{V}|$ in the following two subsections. Specifically, we study $|\mathcal{V}| = 4, 8, 12, 16$, denoted by configurations $4 \times 8$, $8 \times 8$, $12 \times 8$, and $16 \times 8$, respectively. The channel gain matrix between the clients and the server is generated according to the multiple-antenna channel model provided in [26]. The modulation scheme is BPSK, and we use the standard LDPC codes specified in 802.11, with coding rate $1/2$, $k = 324$, and $n = 648$, for error correction. The cyclic-redundancy checksum adopted in the experiment is CRC24, where the polynomial is 0x800063, the initial value of the 24-bit register is 0x0, and the final XOR value is 0x0. Each quantized update vector chunk is encoded into a sequence of 300 bits; these 300 bits, appended with the 24-bit CRC24, form each 324-bit LDPC sourceword.
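The checksum and framing just described can be sketched directly. The CRC parameters below (polynomial 0x800063, initial register 0x0, final XOR 0x0) are the ones stated above; the function names are illustrative. Because the initial register and final XOR are both zero, re-running the CRC over a chunk with its checksum appended returns 0, which is precisely the verification rule (9).

```python
import random

def crc24(bits, poly=0x800063):
    # Bitwise CRC24 over an MSB-first bit list (init 0x0, final XOR 0x0).
    reg = 0
    for b in bits:
        top = ((reg >> 23) & 1) ^ b   # bit entering the polynomial divider
        reg = (reg << 1) & 0xFFFFFF   # shift the 24-bit register
        if top:
            reg ^= poly               # reduce by the generator polynomial
    return reg

def frame_sourceword(chunk_bits):
    # Append the 24 CRC bits to a 300-bit chunk -> 324-bit LDPC sourceword.
    r = crc24(chunk_bits)
    return chunk_bits + [(r >> (23 - i)) & 1 for i in range(24)]

random.seed(2)
chunk = [random.randint(0, 1) for _ in range(300)]
word = frame_sourceword(chunk)
print(len(word), crc24(word))  # 324 0  -> the framed word passes check (9)
```

Any single-bit corruption of `word` would leave a nonzero remainder, which is how the server-side rule (9) flags an incorrect $z^*_i$.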
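Separately, the server-side identity earlier in this section, that averaging the clients' compressed updates yields $\mathbf{C} \cdot \mathbf{A}_t$, rests on the linearity of the shared sensing matrix. The following minimal numeric check (quantization omitted, dimensions chosen arbitrarily for illustration) confirms that averaging after compression matches compressing the averaged update.

```python
import random

def apply_C(C, x):
    # Multiply the shared sensing matrix C (a list of rows) by vector x.
    return [sum(c * xi for c, xi in zip(row, x)) for row in C]

random.seed(3)
C = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]   # 3 x 8
updates = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]

# Average of the compressed updates ...
avg_compressed = [sum(col) / len(updates)
                  for col in zip(*(apply_C(C, u) for u in updates))]
# ... equals the compression of the averaged update, by linearity of C.
avg_update = [sum(col) / len(updates) for col in zip(*updates)]
same = all(abs(a - b) < 1e-9
           for a, b in zip(avg_compressed, apply_C(C, avg_update)))
print(same)  # True
```

This is why the server can recover the sparse average $\mathbf{A}_t$ from the assembled $\mathbf{C} \cdot \mathbf{A}_t$ with a single compressed-sensing reconstruction rather than one per client.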
B. Outage Ratio of Aggregating Client-Side Updates

We first show the outage ratio of aggregating client-side updates during training LeNet-5 using FedSGD. Specifically, we show the outage ratio of obtaining the exact summation of all update vector chunks transmitted simultaneously, based on PhyArith and the state-of-the-art polynomial-time-complexity MUD approaches GM-IC and LMMSE-IC [20]–[22]. From the experiment results shown in Fig. 6, we can see that 1) PhyArith always outperforms both MUD approaches; and 2) PhyArith is especially better when $|\mathcal{V}| \ge |\mathcal{U}|$. This is because when $|\mathcal{V}| \ge |\mathcal{U}|$, the decoding process of MUD approaches like LMMSE-IC and GM-IC does not necessarily converge [21]. We also take the appended checksums to verify the correctness of all these erroneous summations directly recovered