Machine Learning enabled attacks

by Saman Abbad Oct. 11, 2017 submitted by samanabbad

Introduction

In today's Internet age, cybersecurity is a major concern, and it has grown into a domain with many distinct challenges. Around 2004 the global cybersecurity market was worth $3.5 billion; by the end of 2017 it is expected to be worth $120 billion. Data security has always been important to organizations and will remain so, which keeps the field under constant evolution. It is a game of cat and mouse: attackers are always looking for new ways to break into systems, while defenders keep upgrading those systems to withstand new attack strategies and tools.

Big Data

With the advent of artificial intelligence, many jobs are slowly being taken over by computers or robots. In security, this does not mean your average desktop antivirus. Imagine instead that you own a large organization whose employee count runs into the thousands. All of its computers generate logs of their daily activity, possibly petabytes of them, which are stored in a database and analyzed for threats both individually and as patterns. The result is what is referred to as big data. In 2012 Gartner Inc. defined Big Data as follows: "Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation." [1] Gartner's definition, built on the 3Vs, is still widely used, and it agrees with a consensual definition stating that "Big Data represents the Information assets characterized by such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its transformation into Value." [2]

Machine learning

From the concept of big data described above, it should be clear that the next step in putting the data to use is analysis. Depending on the type of analysis you want to conduct, the computer has to be trained for that specific purpose. This training process is referred to as "machine learning". Evolved from the study of pattern recognition and computational learning theory in artificial intelligence [3], machine learning explores the study and construction of algorithms that can learn from and make predictions on data [4]. It is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; common applications include email filtering, detection of network intruders or malicious insiders working towards a data breach [5], optical character recognition (OCR) [6], learning to rank, and computer vision.

In the scenario above, the sheer size of the organization makes its cybersecurity a huge task, which is why most large organizations have their own cybersecurity departments with dedicated staff. Traditionally, the bigger the organization, the more issues there were to deal with, but computers are now finally coming on board to help handle security issues. To get an idea of the scale and type of threats, let's look at a few statistics and descriptions:

Malware

Software that is specifically designed to disrupt, damage, or gain unauthorized access to a computer system is categorized as malware. In Q3 2016 alone, 18 million new malware samples were captured. That is an average of about 200,000 per day, and it counts only the samples detected by one company. Malware continues to grow and evolve to bypass antivirus and other layers of protection, which makes it hard for your IT team, your vendors, and your company to keep up.

Ransomware

As the name suggests, ransomware is a type of malware that prevents or limits users from accessing their system, either by locking the system's screen or by locking the users' files, until a ransom is paid. More than 4,000 ransomware attacks have occurred every day since the beginning of 2016. That is a 300% increase over 2015, when 1,000 ransomware attacks were seen per day.

Figure: Computer Virus Statistics
Figure: Threat by Type
Figure: Type of Leak
Figure: Leak Channel

Security information and event management

Now that we have an idea of the scale of the challenge, consider that the cybersecurity department of a large organization may have to deal with an enormous number of attack attempts every day. Security therefore has to be configured so that it not only raises an alarm on encountering a threat but also identifies and classifies the threat, keeping the user fully aware of the situation. The system does not stop there: it goes a step further by correlating notable events into a trail, like a detective, to give the user a clear picture of what is happening within the monitored ecosystem. Such software is classified as SIEM (Security Information and Event Management), SEM (Security Event Management) or SIM (Security Information Management); the terms are often used interchangeably. These systems are highly customizable and trainable. Let's now see where machine learning comes in and how it contributes to the fight against cybercrime. This level of software intelligence is made possible by technologies such as deep learning, which we will look at in the following section.
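
To make the idea of correlating events into a trail more concrete, here is a minimal Python sketch. It is not how any particular SIEM product works; the event names, host names, the 10-minute window and the three-event minimum are all assumptions chosen for illustration. It groups events per host and keeps only the chains whose consecutive events happen close together in time.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, simplified log events: (timestamp, host, event type).
EVENTS = [
    ("2017-10-11 09:00:05", "ws-17", "phishing_link_clicked"),
    ("2017-10-11 09:02:40", "ws-17", "new_process_spawned"),
    ("2017-10-11 09:07:12", "ws-17", "large_outbound_transfer"),
    ("2017-10-11 09:03:00", "ws-42", "new_process_spawned"),
    ("2017-10-11 15:30:00", "ws-17", "new_process_spawned"),
]

def correlate(events, window=timedelta(minutes=10), min_events=3):
    """Group events per host into trails whose consecutive events are
    at most `window` apart; keep trails with at least `min_events`."""
    by_host = defaultdict(list)
    for ts, host, kind in events:
        by_host[host].append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), kind))

    trails = []
    for host, items in by_host.items():
        items.sort()  # chronological order within each host
        current = [items[0]]
        for prev, item in zip(items, items[1:]):
            if item[0] - prev[0] <= window:
                current.append(item)
            else:
                # Gap too large: close the current trail and start a new one.
                if len(current) >= min_events:
                    trails.append((host, [k for _, k in current]))
                current = [item]
        if len(current) >= min_events:
            trails.append((host, [k for _, k in current]))
    return trails

if __name__ == "__main__":
    for host, trail in correlate(EVENTS):
        print(host, "->", " > ".join(trail))
```

With the sample data only the three morning events on the hypothetical host ws-17 form a trail; the isolated events are ignored. A real system would correlate across hosts, users and many more event types.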

Figure: Threat Intelligence System

Deep Learning & Artificial Neural Networks (ANN)

A single piece of malware is very easy to create and very difficult to detect. Once it has been recognized, the system can be taught to respond to it, but a slight modification to the original is enough for the system to miss it again. In this way hundreds or thousands of new malware variants are derived from a single original. Such a situation calls for a different strategy to create an effective safe zone, and this is where artificial neural networks come in. These are systems that learn (progressively improve performance) to do tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the analytic results to identify cats in other images. They have found most use in applications that are difficult to express in a traditional computer algorithm using rule-based programming. In cybersecurity, a sample can be classified as malware or not depending on its level of similarity to the general malware types the system already recognizes. Of course, this doesn't happen overnight: the network has to be trained, which is a time-consuming process.

An ANN is inspired by biological neural networks and is based on a collection of connected units called artificial neurons (analogous to neurons in a biological brain). Each connection (analogous to a synapse) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal the downstream neurons connected to it. Neurons may have state, generally represented by a real number, usually between 0 and 1. Neurons and connections may also have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal sent downstream. Further, they may have a threshold such that the downstream signal is sent only if the aggregate signal is below (or above) that level.

Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.
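
The layered signal flow described here can be sketched in a few lines of Python. This is only an illustration: the weights are hand-picked rather than learned, the sigmoid activation is one common choice among several, and the names `layer_forward` and `network_forward` are made up for this example.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron takes a weighted sum of the inputs,
    adds its bias, and applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass a signal from the input layer through each layer in turn."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Illustrative, hand-picked weights (not trained): 2 inputs -> 2 hidden -> 1 output.
LAYERS = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer: 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # output layer: 1 neuron
]

if __name__ == "__main__":
    print(network_forward([1.0, 0.0], LAYERS))
```

Each neuron's output stays between 0 and 1, matching the description of neuron state above. Training, which is the time-consuming part, would consist of adjusting the weights and biases so that the output matches labeled examples.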

Figure: A single-layer feed forward artificial neural network

Figure: A two-layer feed forward artificial neural network

Therefore, a SIEM has to be trained with all available information about malware, for instance by teaching it to recognize every type of malware currently in existence. It also has to be given the intelligence to categorize anything that matches known malware up to a certain degree as malware too, with a learning process advanced enough that it can continue semi-supervised or even unsupervised.

Figure: An artificial neural network
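
The idea of treating anything that matches known malware "up to a certain degree" as malware can be illustrated without a neural network at all. The Python sketch below swaps in a much simpler technique, Jaccard similarity over byte n-grams, purely to show why similarity matching survives small modifications that defeat exact signatures. The sample byte strings, the family name `dropper_a` and the 0.6 threshold are all invented for the example.

```python
import hashlib

def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte substrings of `data`."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def classify(sample: bytes, known_malware: dict, threshold: float = 0.6):
    """Return (family, similarity) of the best match at or above `threshold`,
    or (None, best_similarity) if nothing is similar enough."""
    grams = ngrams(sample)
    best_family, best_score = None, 0.0
    for family, reference in known_malware.items():
        score = jaccard(grams, ngrams(reference))
        if score > best_score:
            best_family, best_score = family, score
    if best_score >= threshold:
        return best_family, best_score
    return None, best_score

# Hypothetical stand-ins for binary payloads.
KNOWN = {"dropper_a": b"connect c2.example; download payload; encrypt files; demand ransom"}
VARIANT = b"connect c2.example; download payload; encrypt filez; demand ransom"
BENIGN = b"open spreadsheet; compute totals; save report; close"

if __name__ == "__main__":
    same_hash = hashlib.sha256(VARIANT).digest() == hashlib.sha256(KNOWN["dropper_a"]).digest()
    print("exact hash match:", same_hash)
    print("variant:", classify(VARIANT, KNOWN))
    print("benign:", classify(BENIGN, KNOWN))
```

The one-byte change in the variant breaks the exact hash comparison, yet most of its n-grams still overlap with the known sample, so it is attributed to the same family. A trained network would learn a far richer notion of similarity than this, but the thresholding step is the same in spirit.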

User Behavior Analytics

We have seen above how a SIEM detects and protects against threats from outside, but there is another aspect of security to consider: the insider threat. Take the example of an employee who, while using a VPN for some task, fell victim to a phishing attack and had his username and password stolen, so that the system is now suffering a data exfiltration attack. The SIM can identify this only if it sees something abnormal, either in a single activity or in a series of activities that, chained together, point to something unusual or dangerous. Common examples are exfiltration or abnormal lateral movement of data.

The SIM must understand the system well enough to analyze user behavior as well as environment behavior. The ability to correlate different activities lets it produce what is referred to as the "kill chain": the whole chain of events that led to the current incident, i.e. the who, what, where, when, why and how. This gives the user a clear picture of what has happened, what is confirmed, and what needs to be done, because without the ability to respond all of this information is worthless. The system must present information the user can interpret and act on, such as timelines and the details and identification of what was impacted. The end result is a system that takes in millions of events, classifies them, highlights only thousands as "anomalies", raises only 5 as actual "threats", and provides the kill chain for each raised incident.

Figure: ANN dependency graph
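
One simple way to picture how a system decides that user behavior is "abnormal" is to compare today's activity with that user's own history. The Python sketch below uses a z-score against a per-user baseline; this is a basic statistical stand-in, not the method of any specific product, and the upload figures and the two cut-offs (3 and 6 standard deviations) are assumptions made for illustration. It only flags unusually high values, which suits the exfiltration example.

```python
from statistics import mean, stdev

def anomaly_score(history, observed):
    """Z-score: how many standard deviations `observed` lies from the
    mean of the user's own `history`."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return (observed - mu) / sigma

def triage(history, observed, anomaly_at=3.0, threat_at=6.0):
    """Label an observation as 'normal', 'anomaly' or 'threat' using two
    hypothetical z-score cut-offs."""
    z = anomaly_score(history, observed)
    if z >= threat_at:
        return "threat"
    if z >= anomaly_at:
        return "anomaly"
    return "normal"

# Hypothetical megabytes uploaded per day by one user over two working weeks.
BASELINE_MB = [40, 55, 38, 60, 47, 52, 45, 58, 41, 50]

if __name__ == "__main__":
    for today in (57, 75, 900):
        print(today, "MB ->", triage(BASELINE_MB, today))
```

The two thresholds mirror the funnel described above: most observations are normal, some are highlighted as anomalies, and only the most extreme are raised as threats. A production system would combine many such signals and correlate them before raising an incident.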

Conclusion

These advances in AI technology can also be used against the cybersecurity infrastructure currently in place. In the near future, as artificial intelligence (AI) systems become more capable, we will begin to see more automated and increasingly sophisticated social engineering attacks. The rise of AI-enabled cyber attacks is expected to cause an explosion of network penetrations, personal data thefts, and an epidemic-level spread of intelligent computer viruses. Ironically, our best hope of defending against AI-enabled hacking is to use AI. But this is very likely to lead to an AI arms race, the consequences of which may be very troubling in the long term, especially as big government actors join the cyber wars.

Published with the express permission of the author.

2flash 1 week, 1 day ago

These attacks are getting more and more often lately... such articles are great because they do raise awareness on the matter!
