Wireless Internet, Handout 06: DL for MEC
Wang Sheng (Ph.D., Professor, Doctoral Advisor)
Fall 2020, Wireless Internet
Motivation#1
- So far this course has dealt with traditional/classical resource-allocation problems.
  - "Communication" resources: channels, power, transmission decisions, and so on.
- But a new trend has emerged in wireless edge networks.
  - Processing logic that used to live in the application layer is "sinking" down into the edge network.
  - Mobile Edge Caching/Computation: MEC.
- Good for the user experience: content fetched nearby; access to processing power; lower latency.
- Good for the backbone: less (repeated-request) traffic in the core.
- For the edge network, however, it is a challenge: how to manage the "new resources", cache/server.
- Goal of this unit: get a feel for these new resource-allocation problems through two concrete topics.
Motivation#2
- Deep learning (DL) rose at roughly the same time as MEC.
- Every major branch of traditional machine learning was deeply affected:
  - in supervised and unsupervised learning, the difficulty of feature engineering was overcome;
  - reinforcement learning, given better function approximators, came close to practical use.
- In effect, complex and difficult engineering problems suddenly seemed to share one general-purpose line of attack: DL.
- The various complex resource-management problems in MEC arrived at just the right time.
  - Plenty of surveys on IEEE Surveys & Tutorials and arXiv.
- Goal of this unit: present a few cases showing how DL is applied.
Case Studies
Through a few concrete cases (papers):
- introduce the new resource-management problems in wireless edge networks;
- show how DL is brought in to solve, or help solve, them.
Characteristics of a case study:
- Necessarily incomplete: it cannot cover every topic in MEC and DL.
  - Some relevant background is assumed.
- A narrow glimpse: the cases are chosen to be representative.
  - But the selection reflects only my personal view, i.e. my own ignorance.
CASE#1
[Rathore2019] Shailendra Rathore, Jung Hyun Ryu, Pradip Kumar Sharma, and Jong Hyuk Park, "DeepCachNet: A Proactive Caching Framework Based on Deep Learning in Cellular Networks", IEEE Network, May/June 2019, pp. 130-138.
Proactive caching:
- Predict the content (web pages, videos, ...) likely to be requested;
- cache it on a server next to the base station.
- When a local user requests the content, the cache responds directly: lower latency.
- If users request the same content repeatedly (it is popular enough), the cache reduces backbone traffic.
- The core question: how to decide which content to cache?
Content Popularity
- The obvious answer: cache the most popular content.
- The less obvious question: what does "popular" mean?
- Consider this scheme: find the globally most requested content and push it to every base station's cache.
  - It does not necessarily reflect the tastes of the *local* users.
- The usual approach: build a popularity matrix for the local users.
  - Record/predict how much each user prefers each content item.
  - When content must be deployed to the base-station cache, read the "most popular" items off this matrix.
- The natural question: where do the preference values in the popularity matrix come from?
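The deployment step above can be sketched in a few lines: given a popularity matrix for the local users, aggregate per-content demand and cache the top-k items. The matrix values and the cache size k are made up for illustration.

```python
import numpy as np

# Rows = local users, columns = contents;
# entries = predicted preference (illustrative values only).
popularity = np.array([
    [0.4, 0.0, 0.9],
    [0.1, 0.8, 0.7],
    [0.0, 0.9, 0.2],
])

# Total predicted demand per content over this base station's users.
content_scores = popularity.sum(axis=0)

# Cache the k most popular contents (k = cache capacity in items).
k = 2
cache = np.argsort(content_scores)[::-1][:k]
print(cache)
```

Here contents 2 and 1 win; how the preference entries themselves are obtained is the subject of the following slides.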
Collaborative Filtering
- The most straightforward scheme: counting.
  - When a user requests a content item, increment that user's preference for it by 1.
- First problem: cold start.
  - A user new to this base station has no records at all.
- Second problem: the popularity matrix is far too sparse.
  - For content never requested, there are no records at all.
- The solution: collaborative filtering [from recommender systems].
  - Predict via "similarity between users" and "similarity between contents":
  - use the preferences of "similar users" to predict a new user's preferences;
  - use the preferences for "similar contents" to predict preferences for a new content item.
- How do we measure "similarity" quantitatively?
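A minimal user-based collaborative-filtering sketch of the idea above (the ratings are toy values, not from the paper): an unknown entry is predicted from the ratings of other users who did rate that item, weighted by cosine similarity to the target user.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two rating vectors (0 if either is all-zero)."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0

# Toy popularity matrix: rows = users, columns = contents, 0 = unknown.
R = np.array([
    [5.0, 3.0, 0.0],   # user 0 has never requested content 2
    [4.0, 2.0, 4.0],
    [1.0, 1.0, 5.0],
])

def predict(R, user, item):
    """Predict R[user, item] from similar users who did rate `item`."""
    sims, ratings = [], []
    for other in range(R.shape[0]):
        if other != user and R[other, item] > 0:
            sims.append(cosine(R[user], R[other]))
            ratings.append(R[other, item])
    sims, ratings = np.array(sims), np.array(ratings)
    if sims.sum() == 0:
        return 0.0                       # no similar raters: still cold
    return float(sims @ ratings / sims.sum())

print(predict(R, 0, 2))
```

User 0 resembles user 1 far more than user 2, so the prediction lands near user 1's rating of 4. Everything hinges on the similarity measure, which is what the next slides address.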
Feature Learning
[Figure 2 of [Rathore2019]: (a) the collected user features (Age, Gender, Personality, Weather condition, Type of device, Battery status, Physical activity, Ringer mode, Current location, Time of day, Wi-Fi network, ...) and content features (Content_Type, Genre, Writer, Director, Vote, Publication year, Number of dislikes, Number of views, Company, Language, Country, Cast, Keyword, ...), each numeric or nominal; (b) the resulting feature-based content popularity matrix, whose entries pair a popularity value with the user's and the content's feature vectors.]
- Cosine similarity between feature vectors is the standard approach. But indiscriminately collected features work poorly.
- This is exactly where DL comes in: feature learning / dimensionality reduction / feature extraction.
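Before any similarity can be computed, the raw features must become numeric vectors. The paper normalizes numeric values to [0, 1] and converts nominal values to binary indicators; the sketch below follows that recipe, but the concrete features (age, device) and values are invented for illustration.

```python
import numpy as np

# Hypothetical users: (age, device). Age is numeric, device is nominal.
users = [(25, "phone"), (67, "laptop"), (30, "phone")]
devices = ["phone", "laptop", "tablet"]

def to_vector(age, device, age_max=100.0):
    """Numeric features scaled to [0, 1]; nominal features one-hot encoded."""
    onehot = [1.0 if device == d else 0.0 for d in devices]
    return np.array([age / age_max] + onehot)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vecs = [to_vector(a, d) for a, d in users]
# Users 0 and 2 share a device and have similar ages -> high similarity;
# user 1 differs in both -> low similarity.
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

With dozens of such features collected indiscriminately, these raw vectors become long and noisy, which is why the paper compresses them with autoencoders before comparing them.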
AutoEncoder
AE structure:
- Three layers: input; hidden; output. Weights; activation function (Sigmoid).
Training objective:
- Find weights such that the output â is as close to the input a as possible.
Training method:
- Feed the feature vectors of the dataset into the AE one by one.
- Compute the output under the current weights [forward propagation];
- evaluate the "gap" between output and input;
- adjust the weights according to the gap [backpropagation; gradient descent].
After training converges, we obtain two sets of weights, W1 and W2:
- σ(W1 a) is the encoding function.
- As long as the hidden layer has fewer units than the input layer, this realizes a dimensionality-reducing encoding.
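The training loop above can be sketched as a one-hidden-layer autoencoder in plain NumPy. This is a generic sketch, not the paper's implementation: sigmoid activations and squared-error loss as on the slide, but the data, dimensions, and learning rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: 50 vectors in [0, 1]^8 that actually lie on a 3-dimensional
# subspace, so 3 hidden units are enough to encode them.
Z = rng.random((50, 3))
M = rng.random((3, 8))
X = Z @ (M / M.sum(axis=0))        # convex combinations -> entries stay in [0, 1]

r, l = 8, 3                        # input dimension; hidden units (l < r)
W1, b1 = rng.normal(0, 0.1, (l, r)), np.zeros(l)   # encoder weights
W2, b2 = rng.normal(0, 0.1, (r, l)), np.zeros(r)   # decoder weights

mse_init = np.mean((sigmoid(sigmoid(X @ W1.T + b1) @ W2.T + b2) - X) ** 2)

lr = 1.0
for _ in range(5000):
    H = sigmoid(X @ W1.T + b1)      # forward pass: encode
    Y = sigmoid(H @ W2.T + b2)      #               decode
    dY = (Y - X) * Y * (1 - Y)      # backward pass: output-layer delta
    dH = (dY @ W2) * H * (1 - H)    #                hidden-layer delta
    W2 -= lr * dY.T @ H / len(X); b2 -= lr * dY.mean(axis=0)
    W1 -= lr * dH.T @ X / len(X); b1 -= lr * dH.mean(axis=0)

mse = np.mean((Y - X) ** 2)         # reconstruction error after training

def encode(a):
    """sigmoid(W1 a): the learned dimensionality-reducing encoder."""
    return sigmoid(a @ W1.T + b1)
```

After convergence only `encode` is kept: it maps each r-dimensional feature vector to an l-dimensional code, which is what the similarity computation then operates on.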
SDAE
Stacked Denoising AutoEncoder:
- Same principle as the AE;
- but the hidden part is extended to multiple layers;
- and noise is injected at the input.
- In effect, it extracts abstract features more effectively.
This paper uses an AE to reduce the dimensionality of the user feature vectors, and an SDAE for the content features.
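The two twists can be sketched on top of the plain AE: each training pass corrupts the input (masking noise here is an assumed choice) while the reconstruction target stays the clean input, and stacking trains one such denoising autoencoder per layer, feeding each layer's hidden representation into the next. A minimal sketch, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, hidden, noise=0.3, lr=1.0, steps=3000):
    """One denoising autoencoder: corrupt the input, reconstruct the clean X."""
    n, r = X.shape
    W1, b1 = rng.normal(0, 0.1, (hidden, r)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (r, hidden)), np.zeros(r)
    for _ in range(steps):
        X_noisy = X * (rng.random(X.shape) > noise)   # masking corruption
        H = sigmoid(X_noisy @ W1.T + b1)
        Y = sigmoid(H @ W2.T + b2)
        dY = (Y - X) * Y * (1 - Y)                    # target is the CLEAN input
        dH = (dY @ W2) * H * (1 - H)
        W2 -= lr * dY.T @ H / n; b2 -= lr * dY.mean(axis=0)
        W1 -= lr * dH.T @ X_noisy / n; b1 -= lr * dH.mean(axis=0)
    return lambda A: sigmoid(A @ W1.T + b1)           # the trained encoder

# Stacking: the hidden representation of one DAE feeds the next one.
X = rng.random((40, 10))        # toy content feature vectors
enc1 = train_dae(X, hidden=6)
enc2 = train_dae(enc1(X), hidden=3)
codes = enc2(enc1(X))           # 3-dimensional content codes
print(codes.shape)
```

Denoising forces the hidden units to capture dependencies between features rather than copy them, which is why the stacked version extracts more abstract features than a plain AE.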