Algorithm 1: Extracting Semantic Feature Set
Input:  APK a
        Generalized-sensitive API Set G = A ∪ Ã
        UI-related Trigger Point Set U
Output: Entry Point Set T
        Semantic Feature Set S{T→G}

CG cg = (N, E) ← FlowDroid parse a;
foreach n_src → n_dest ∈ E, n_src, n_dest ∈ N do
    if n_src.srcNode = null or n_src ∈ DummyMain then
        T.addNode(n_src);
        n_src.DestNode.addNode(n_dest);
        while n_dest ∉ LeafNode do
            foreach n_dest → n_dest' ∈ E, n_dest, n_dest' ∈ N do
                n_src.DestNode.addNode(n_dest');
            end
            n_dest = n_dest';
        end
    end
end
foreach n_src ∈ T do
    foreach g ∈ n_src.DestNode && g ∈ G do
        if n_src ∈ U then
            S{T→G}.addFeature(UIevent → g);
        else
            S{T→G}.addFeature(NonUIevent → g);
        end
    end
end
return S{T→G};

2) Syntax Features: We define the syntax feature S{P} represented by Really-essential Permission, including the following two types:

(1) Permissions of Sensitive APIs A. We extract the really-essential permissions used in programs to form the syntax feature because the permissions requested in AndroidManifest.xml are not all used by applications, which may confuse the detecting process.
First, we aggregate the PScout results for five versions of Android, which map each permission (including normal permissions) to APIs, yielding 24147 permission-API mappings in total. Then we collect the really-essential permissions of all the reachable sensitive APIs in an application. Taking SmsManager.sendTextMessage() in Figure 1 as an example, its permission is SEND_SMS.

(2) Permissions of Approximately-sensitive APIs Ã. Since we have defined two types of approximately-sensitive APIs but there are no corresponding permissions in the Android system, we need to define an Approximate Permission for them to keep the analysis comprehensive. We did not use the categories specified in SUSI due to its incomplete results. Instead, we count and define the approximate permissions manually based on the functions and class names of approximately-sensitive APIs. Taking System.loadLibrary() in Figure 1 as an example, LoadLibrary is its approximate permission.

C. Training Learning-based Classifier Module
To classify Android applications as malicious or benign, we formulate malware detection as a classification problem. With the feature set generated by the feature extraction module, we use the tool Weka [14] and try a number of classical machine learning techniques to train the optimal classification model, including C4.5 Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbor and Naive Bayes.

IV. EXPERIMENTAL EVALUATIONS
We conduct our experiments on a machine with an Intel Core i5-4460 3.20GHz CPU, 16.0GB RAM and the Windows 10 operating system. Based on FlowDroid, PScout and SUSI, we use Java with JDK 8.0 and Eclipse to extract features. We use 1 to indicate that a feature exists and 0 to indicate that it does not. After obtaining the feature matrix, we feed it into Weka to train the optimal classification model.
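As an illustration of the 0/1 encoding, the following is a minimal Python sketch, not the authors' Java implementation. The function name build_feature_matrix, the APK names and the feature names are hypothetical and chosen only for illustration; the real feature set has 871 columns.

```python
from typing import Dict, List, Set


def build_feature_matrix(
    app_features: Dict[str, Set[str]], vocabulary: List[str]
) -> List[List[int]]:
    """Return one 0/1 row per app (apps sorted by name), columns in vocabulary order."""
    return [
        [1 if feature in app_features[app] else 0 for feature in vocabulary]
        for app in sorted(app_features)
    ]


# Hypothetical vocabulary mixing semantic (event -> API) and syntax (permission) features.
vocabulary = [
    "UIevent->sendTextMessage",
    "NonUIevent->sendTextMessage",
    "SEND_SMS",
    "LoadLibrary",
]

# Hypothetical per-app feature sets, as a feature extractor might produce them.
apps = {
    "benign.apk": {"UIevent->sendTextMessage", "SEND_SMS"},
    "malicious.apk": {"NonUIevent->sendTextMessage", "SEND_SMS", "LoadLibrary"},
}

matrix = build_feature_matrix(apps, vocabulary)
for name, row in zip(sorted(apps), matrix):
    print(name, row)
```

Each row could then be written out, together with a class label, in a format that Weka can load, such as ARFF or CSV.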
We use 10-fold cross validation to train RepassDroid and to evaluate the effectiveness of both the tool and the feature set.

We have conducted three groups of experiments to address the following research questions:

RQ1: How effective is RepassDroid in classifying Android applications and detecting malware? Which machine learning technique is the best for our feature set?

RQ2: How do the generalized-sensitive API and feature set contribute to the effectiveness of malware detection? What about the representation of the feature set?

RQ3: How does RepassDroid compare with the Android malware detection tool Drebin? How does it compare with the detection tools on the website VirusTotal?

A. Dataset and Evaluative Criteria
Our dataset contains 24288 applications (samples whose parsing time exceeds 10 minutes have been removed), 20000 for training and 4288 for testing. Benign apps and malicious apps each make up half of the training set. We analyzed 12086 Google Play applications from the website AndroZoo [19] to constitute the benign sample, and 12202 malicious applications from Android Malgenome Project [20], VirusShare [21] and Drebin [11] to construct the malicious sample. The average time for analyzing an application is about 60 seconds, and the specific analysis time depends on the size of the application.

In summary, the feature set S collected from sample APKs has 871 features in total, including 811 semantic features and 60 syntax features (46 sensitive permissions and 14 approximate permissions).

In the experiments, the evaluative criteria we employed are as follows: