Malware and Artificial Immune Systems
Chris Musselle
Bristol Centre for Complexity Sciences (BCCS), University of Bristol
Supervised by Dave Cliff and Ayalvadi Ganesh
Nottingham University 2010 Presentation, 04/10/2010
Malware Evolution
• Pre-1990: Experimental/intellectual pranks, e.g. the Morris Worm.
• 1990-1999: More sophisticated viruses and worms, e.g. macro viruses, encryption, polymorphic viruses.
• 2000-2003: Explosion of worms: CodeRed, Nimda, Slammer, etc.
• 2003-present: Increase in malware sophistication: blended threats, countermeasures, updating, e.g. Conficker.
• A shift in motive towards financial gain has driven the increased sophistication and prevalence of malware.
• The Web today provides cyber-criminals with the targets, exploitable weaknesses, and anonymity required for large-scale fraud.
Modern ‘Malware’ Economy
• Cyber-criminals have embraced Web 2.0 technologies and specialise in various roles.
• Tools of the trade are readily available for purchase, with some malware authors even offering technical support and updates for their products.
• The basic strategy is to host new malicious sites or compromise legitimate ones, and then lure victims to them.
• Shift towards stealthier and more sophisticated malware, e.g. drive-by downloads, and a large surge in data-theft Trojans.
PhD Focus
• Anomaly detection techniques to better distinguish between normal and potentially malicious behaviour within a computer system.
• Avenues of investigation:
  • Artificial Immune Systems
  • Machine Learning
  • Statistical Techniques
The Dendritic Cell Algorithm (DCA)
• An abstract model of dendritic cell behaviour based on the paradigm of Danger Theory.
• Aims to perform anomaly detection by correlating a series of informative signals with a sequence of abstract events (termed 'antigens').
• Signals: multiple time series set to give approximations of normal or anomalous aggregate behaviour (termed 'safe' and 'danger' respectively).
• Antigens: symbolic IDs of the individual events.
• The goal is to determine which event is most likely responsible for an observed rise in danger signals.
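The correlation of signals with antigens described above can be sketched in a few lines. This is a minimal illustration rather than the presenter's implementation. It assumes the commonly published deterministic formulation, in which each cell accumulates csm = danger + safe and k = danger - 2 * safe and presents its collected antigens once csm reaches its migration threshold. The function name `run_ddca`, the round-robin antigen allocation, and the toy inputs are all hypothetical.

```python
def run_ddca(signals, antigens, thresholds):
    """Toy dDCA-style loop (assumed formulation, not taken from the slides).

    signals:    list of (danger, safe) pairs, one per timestep
    antigens:   list of lists of antigen IDs logged in each timestep
    thresholds: one migration threshold per dendritic cell
    Returns a list of (antigen_id, k) presentations.
    """
    cells = [{"csm": 0.0, "k": 0.0, "antigens": []} for _ in thresholds]
    presented = []
    next_cell = 0
    for (danger, safe), events in zip(signals, antigens):
        # Hand each logged event to one cell, round-robin (an assumption).
        for antigen in events:
            cells[next_cell]["antigens"].append(antigen)
            next_cell = (next_cell + 1) % len(cells)
        # Assumed weights: csm = D + S, k = D - 2S.
        csm = danger + safe
        k = danger - 2.0 * safe
        for cell, threshold in zip(cells, thresholds):
            cell["csm"] += csm
            cell["k"] += k
            if cell["csm"] >= threshold:
                # The cell migrates: it presents every antigen it holds
                # together with its accumulated context value k, then resets.
                presented.extend((a, cell["k"]) for a in cell["antigens"])
                cell.update(csm=0.0, k=0.0, antigens=[])
    # Antigens still held by cells that never migrated are not presented.
    return presented


demo_signals = [(10, 0), (10, 0), (0, 10), (0, 10)]
demo_antigens = [["A"], ["A"], ["B"], ["B"]]
print(run_ddca(demo_signals, demo_antigens, thresholds=[15]))
```

In this toy run, antigen "A" is presented with a positive k (it coincided with danger) and "B" with a negative k (it coincided with safe signals), which is the sense in which the algorithm attributes a rise in danger to particular events.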
Inputs to the DCA
[Figure: a system yields two inputs. Its observable global behaviour is recorded as multiple time series data (signals); its individual events/processes are recorded as logged events per time interval (antigens). Both are aligned on a common time axis for temporal correlation.]
Some Limitations
[Figure: raw domain data is mapped to signals and antigen; the DCA, configured by its parameters, produces an output score.]
• Reliance on expert knowledge to map raw domain data into the antigen and signal space.
• Input definitions can therefore be quite arbitrary, making applications difficult to compare.
• Trial and error is needed to find appropriate parameters.
My Approach
[Figure: a parameterised model generates synthetic data, which supplies the signals and antigen; the DCA, with its own parameters, produces an output score.]
• Generate controllable synthetic data using a model.
• Investigate the relationship between inputs, DCA parameters, and algorithm performance.
• Focus on the deterministic DCA (dDCA).
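The slides do not describe the generating model itself, so the following is only a hypothetical sketch of what "controllable synthetic data" could look like. It assumes a system that alternates between normal and anomalous phases of fixed length, with noisy danger and safe signals and one antigen label per timestep. The function name `generate_synthetic`, the signal means (20 and 80), and the antigen labels are invented for illustration.

```python
import random


def generate_synthetic(n_steps, phase_len, noise=5.0, seed=0):
    """Hypothetical generator: alternating normal/anomalous phases.

    Returns (signals, antigens, labels) where
      signals[t]  = (danger, safe) for timestep t,
      antigens[t] = list of antigen IDs logged at timestep t,
      labels[t]   = True if timestep t lies in an anomalous phase.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    signals, antigens, labels = [], [], []
    for t in range(n_steps):
        # Phases alternate: normal, anomalous, normal, ...
        anomalous = (t // phase_len) % 2 == 1
        danger_mean, safe_mean = (80.0, 20.0) if anomalous else (20.0, 80.0)
        # Gaussian noise around the phase mean, clipped at zero.
        danger = max(0.0, rng.gauss(danger_mean, noise))
        safe = max(0.0, rng.gauss(safe_mean, noise))
        signals.append((danger, safe))
        antigens.append(["bad" if anomalous else "good"])
        labels.append(anomalous)
    return signals, antigens, labels


signals, antigens, labels = generate_synthetic(n_steps=100, phase_len=25)
print(sum(labels), "anomalous timesteps out of", len(labels))
```

Because the phase length and noise level are explicit arguments, such a generator would let the relationship between input characteristics and algorithm performance be varied one factor at a time.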
[Figure: Signal Time Series, plotted against Timestep.]
Errors in classification occurred at boundaries
[Figure: the DCA pipeline in three phases. Phase 1: Formation of Inputs to DCA. Phase 2: Input Processing by DC Population. Phase 3: Final Classification.]
Components and options shown in the figure:
• Raw Domain Data
• Signal Mapping and Antigen Mapping. Techniques used: simple stats, PCA, n-gram analysis, Information Theory, Expert Knowledge.
• Signal Processing: weights for calculating CSM and K.
• Antigen Sampling: randomly drawn from a pool of antigen, or allocated in Round Robin fashion.
• Antigen Multiplier
• DC Population: No. of DCs, Migration Threshold Distribution.
• Threshold reassignment: reassign same migration threshold, or assign new threshold randomly.
• DC outputs
• Time Windows: segmentation by ABS or TBS.
• Metric Calculation: MCAV, MAC, K Alpha.
• Analysis: threshold based calculation, or Fuzzy set Theory calculation.
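Two of the metrics listed above, MCAV and K Alpha, can be illustrated with a short sketch. It assumes the usual definitions: for each antigen type, MCAV is the fraction of its presentations that occurred in a mature (k > 0) context, and K Alpha is the mean k over its presentations. The function name `summarise_presentations` and the sample data are hypothetical, and MAC is not covered here.

```python
from collections import defaultdict


def summarise_presentations(presented):
    """Compute per-antigen MCAV and K Alpha (assumed definitions).

    presented: iterable of (antigen_id, k) pairs, one per antigen
               instance presented by a migrated cell.
    """
    counts = defaultdict(int)
    mature = defaultdict(int)
    k_sum = defaultdict(float)
    for antigen, k in presented:
        counts[antigen] += 1
        k_sum[antigen] += k
        if k > 0:  # positive k is treated as a 'mature' (anomalous) context
            mature[antigen] += 1
    return {
        antigen: {
            "mcav": mature[antigen] / counts[antigen],
            "k_alpha": k_sum[antigen] / counts[antigen],
        }
        for antigen in counts
    }


sample = [("A", 20.0), ("A", -5.0), ("B", -40.0)]
for antigen, scores in summarise_presentations(sample).items():
    print(antigen, scores)
```

A threshold based calculation would then flag an antigen type as anomalous when its score exceeds a chosen cut-off; where to put that cut-off is one of the parameters the figure draws attention to.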