An Initial Investigation into Attribution in SCADA Systems

Supervisory Control and Data Acquisition (SCADA) systems play a core role in a nation's critical infrastructure, overseeing the monitoring and control of systems in electricity, gas supply, logistics services, banks and hospitals. SCADA systems were once separated from other networks and used proprietary communications protocols, hardware and software. Modern SCADA systems are increasingly connected to the Internet, directly or indirectly, and use standardised protocols and commercial-off-the-shelf hardware and software. Attacks on these systems have the potential for devastating consequences, and attributing such attacks presents new challenges. This paper investigates the use of five known technical attribution techniques to attribute cyber attacks against SCADA systems.


INTRODUCTION
SCADA systems are responsible for the monitoring and control of a wide range of a nation's critical infrastructure, including electricity, gas supply, logistics services, banks and hospitals. As such, successful attacks against these systems could have a devastating effect. In the past these systems were separated from other networks, proprietary communication protocols, hardware and software were used, and attacks were of a physical nature. Nowadays there is significant evidence showing that these systems are directly or indirectly connected to the Internet (Radvanovsky and Brodsky 2013). This evidence also shows that these systems are protected with weak or no authentication. ICS-CERT, the U.S. computer emergency response team for industrial control systems, reported that attacks against critical infrastructure are increasing each year (ICS-CERT 2011). In 2011 there were 198 reported incidents, as shown in Figure 1.
Commercial-off-the-shelf SCADA hardware and software are used to cut costs and reduce time to market. This lowers the skill level that is required of an adversary to target such systems and means that anybody with the money and motivation can purchase the hardware and software that is used throughout the critical infrastructure. Attacks against these systems are becoming so common that they are now included in point-and-click penetration testing tools, such as Metasploit, significantly reducing the level of expertise required.
Protecting the systems that monitor and control critical infrastructure has become a primary concern for lawmakers and politicians. The EU Cyber Security Strategy states that critical infrastructure incidents must be reported to a competent authority, while in the U.S., President Obama has signed an order which promotes information sharing on incidents (Euractiv 2013).
Identifying the perpetrator of a cyber attack, known as attribution, is a difficult and ongoing problem in the traditional IT domain. Adversaries are able to route attacks through a maze of routers, switches, anonymising services, proxies and compromised hosts, such as those in botnets. Adversaries can create malware that morphs or deletes evidence of its presence on a target system. Primary motivators for attribution are (Hunker et al. 2008):
• The prospect of an attacker being identified can serve as a deterrent to future attacks.
• Knowing the identity of an attacker, and information gained in the process of attribution, can be used to improve defensive techniques.
• Attribution, even partial attribution, can provide the basis for interrupting attacks in progress.
Attribution is of significant importance for cyber attacks against SCADA systems and critical infrastructure. Threat actors that are likely to target SCADA systems include nation states, activists, organised criminals, insiders and terrorist groups (Nicholson et al. 2012). Attacks against these systems have the potential to cause physical damage to expensive equipment, wide scale panic and loss of human life. Identifying the perpetrators of attacks or attempted attacks is critical.
This paper investigates the use of techniques to attribute cyber attacks against SCADA systems. Five known attribution techniques are investigated: traceback, deception, digital forensics, network forensics and malware analysis. We limit discussion to a subset of fieldbus protocols and SCADA equipment, selected because of their pervasive use across the critical infrastructure sectors. Despite these restrictions, the findings should be applicable at a wider scale.
... where it can be interpreted by an operator using an HMI. The RTUs and PLCs can control parts of the infrastructure directly, such as regulating valves or activating switches through the IEDs, based on either data from the field sensors or operator input from the control centre. The resulting control action then flows from the operator or RTU to the IED to make the change to the system. These systems are physically connected by Ethernet, fibre-optic cabling, telephone lines, microwave, satellite or radio. SCADA systems today use standard network topologies such as bus, hub, ring and star and are therefore prone to attacks that rely on these specific topologies being present.
Figure 2 shows a typical SCADA environment (Pacific Northwest National Laboratory 2006) and presents the segments, machines and devices that have been discussed.

Protocols
In the corporate network segment the protocols used are essentially the same as in any other corporate network segment, e.g. SMTP and IMAP for e-mail, SSH and FTP for file transfer, HTTP for web services, and DNS, TCP, UDP and ARP for network operations, primarily over IPv4, although in some cases over IPv6.
In the SCADA network and field device segments the protocols are mostly found only within industrial networks, e.g. Modbus, DNP3 and ICCP. However, there is some overlap; in some cases PLCs may be accessed by HTTP/HTTPS, SSH, FTP etc. In the past SCADA systems were composed of proprietary protocols. This added a layer of security by obscurity. However, it meant buyers were locked in to one particular vendor. If that vendor were to go bankrupt then support for the technology would be gone. Nowadays open and standardised protocols and architectures are prevalent, meaning that buyers can mix and match equipment and do not suffer from vendor lock-in. It also means that protocols can be scrutinised by the SCADA community and are able to mature. However, researchers highlight that using common, general IT network protocols and software (e.g. Windows NT) increases the chance of an attack (Igure et al. 2006). This stands to reason since a much wider pool of adversaries understand the weaknesses inherent in corporate network protocols and systems, as opposed to those only used in SCADA. With a general understanding of the SCADA history, architecture and protocols, we move on to discuss attribution.

Overview
Attribution is used in traditional IT domains to identify the who, what, where, why and how of cyber attacks. It is defined as 'determining the identity or location of an attacker or an attacker's intermediary' (Wheeler and Larsen 2003). Attribution might result in a conclusive answer such as who did it: an individual, a group, a nation state, etc. Alternatively, attribution may only provide an inconclusive hint, and other intelligence or evidence may be required. Successful attribution may identify perpetrators so that they may face prosecution, i.e. cyber crime, or be used to plan retaliation, i.e. cyber warfare. Attribution results may be brought to public light to embarrass, shame or demonstrate superiority over an adversary.
Attribution takes place after an intrusion or compromise has been identified. In traditional IT environments intrusions are detected using a variety of technical tools such as signature or anomaly-based intrusion detection systems, anti-virus, baselining tools and security information and event management tools. Researchers have modified these tools to detect intrusions in the SCADA environment; however, they must be engineered to parse the unique and sometimes proprietary protocols that are prevalent in the SCADA and fieldbus segments (Verba and Milvich 2008; Fovino et al. 2010).
A number of traditional security paradigms do not easily transfer to SCADA environments. For example, penetration testing, which involves finding and exploiting vulnerabilities to evaluate security, is an often noisy and disruptive task. SCADA systems are fragile; availability and uptime are of critical importance. SCADA-specific versions of traditional computer security defences, such as firewalls and intrusion detection systems, are available. However, there is currently little research that considers the feasibility of deploying computer security attribution techniques in these environments.

Selection of Attribution Techniques for Investigation
It is said that 'there are many types of attribution, and different types of attribution are useful in different contexts' (Clark and Landau 2011). We consider a subset of available technical attribution techniques; a taxonomy by Wheeler and Larsen (2003), shown in Figure 3, presents a useful starting point for this selection. From the possible techniques, five were selected, based on their pervasiveness, diversity and academic interest. They are: traceback, deception, digital forensics, network forensics and malware analysis. In the following subsections the discussion involves the following themes:
• Overview of technology in its traditional domain
• Relevance in the SCADA environment
• Suitability in the SCADA environment based on above points

Traceback
Traceback is a network-based approach to attribution that has received significant interest from the academic community. In almost all proposals this technique involves attributing a source IP address (or closest router) and intermediate router IP addresses to mark the tracks of an attack over the Internet. The two main traceback proposals are packet logging and packet marking. Packet marking involves marking packets with identifiable information when passing through networking devices, such as routers or switches (Belenky and Ansari 2003). Packets are marked with information that describes the path that the packet has taken. A victim may collect and inspect received packets in order to deduce the attack path. In packet logging, routers store information about the packets that have passed through them (Gong and Sarac 2005). During or after an attack, network traffic data stored at each router may be queried to see if the router was part of the attack path. Traceback in traditional IT domains was primarily proposed to stop ongoing Denial of Service attacks. This type of attack is generally seen as an inconvenience in the traditional domain; against SCADA systems, however, it could have devastating effects. Therefore it is vital to investigate how this technique could be applied to SCADA networks and protocols.
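The difference between the two proposals can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any published scheme: the `Packet` and `Router` classes, the router names and the use of a SHA-256 payload digest are all hypothetical choices made for illustration. With marking, the path travels inside the packet; with logging, the victim has to query every router afterwards.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    # Router identifiers appended in transit (packet marking).
    marks: list = field(default_factory=list)


class Router:
    def __init__(self, router_id):
        self.router_id = router_id
        # Digests of forwarded packets (packet logging).
        self.seen = set()

    def forward(self, packet):
        packet.marks.append(self.router_id)
        self.seen.add(hashlib.sha256(packet.payload).hexdigest())
        return packet

    def has_seen(self, payload):
        return hashlib.sha256(payload).hexdigest() in self.seen


def send(payload, path):
    """Push one packet through an ordered list of routers."""
    packet = Packet(payload)
    for router in path:
        router.forward(packet)
    return packet


# Hypothetical four-router network; the attack uses three of the routers.
routers = {name: Router(name) for name in ("R1", "R2", "R3", "R4")}
attack_path = [routers["R1"], routers["R3"], routers["R4"]]
received = send(b"malicious request", attack_path)

# Packet marking: the victim reads the ordered path out of the packet itself.
marked_path = received.marks

# Packet logging: the victim asks each router whether it saw the packet.
logged_path = [name for name, router in routers.items()
               if router.has_seen(b"malicious request")]
```

Note that in this sketch the logging query only reveals which routers saw the packet; recovering the order of the path would need topology knowledge that the sketch does not model.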
Packet marking proposals manipulate the content of protocol headers against the spirit of the standards and requests for comments (RFCs). Infrequently used packet header fields, such as the IPv4 Identification field, are loaded with traceback data, which in this example removes packet fragmentation functionality. Logistically this is possible with an IPv4 header, but some of the popular fieldbus protocols have limited available space in the packet header. IPv4 and DNP3 packet headers are shown in Figure 4.
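To make the space constraint concrete, the following sketch overwrites the 16-bit Identification field of a 20-byte IPv4 header (no options) with a traceback mark and recomputes the header checksum. The function names, the sample addresses and the mark value are assumptions for illustration; real marking schemes encode path information in more elaborate ways.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement sum over the header, as used by IPv4."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_ipv4_header(header: bytes, mark: int) -> bytes:
    """Replace the Identification field (bytes 4-5) with a traceback mark."""
    if not 0 <= mark <= 0xFFFF:
        raise ValueError("mark must fit in the 16-bit Identification field")
    marked = bytearray(header)
    struct.pack_into("!H", marked, 4, mark)   # Identification field
    struct.pack_into("!H", marked, 10, 0)     # zero the checksum first
    struct.pack_into("!H", marked, 10, ipv4_checksum(bytes(marked)))
    return bytes(marked)


def read_mark(header: bytes) -> int:
    return struct.unpack_from("!H", header, 4)[0]


# Sample header: version/IHL, TOS, total length, ID, flags/fragment offset,
# TTL, protocol (TCP), checksum placeholder, source and destination address.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0, 64, 6, 0,
                     bytes([192, 0, 2, 1]), bytes([198, 51, 100, 7]))
marked = mark_ipv4_header(sample, 0xBEEF)
```

Only 16 bits are available per packet in this field, and once it carries a mark the field can no longer identify fragments of the same datagram, which is the loss of fragmentation functionality noted above.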
Specifications have been defined for a number of SCADA protocols, such as DNP3 and Modbus, to be encapsulated in TCP/IP, meaning that packet header space may not be such an issue, provided that this flavour of the protocol is used. Another type of traceback, ICMP traceback, creates new network messages which contain the route information (Bellovin et al. 2003). In IP networks this results in an increase in the amount of traffic flowing through the network. SCADA networks are generally far quieter than IP networks and an increase in traffic may have adverse effects; in the worst case it could prevent critical messages from reaching their destination.
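As an illustration of the encapsulated flavour, the sketch below parses a Modbus/TCP frame. The field layout (a 7-byte MBAP header of transaction identifier, protocol identifier, length and unit identifier, followed by the function code and data) follows the Modbus messaging on TCP/IP specification rather than this paper, and the type and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class ModbusTcpFrame(NamedTuple):
    transaction_id: int
    protocol_id: int
    length: int
    unit_id: int
    function_code: int
    data: bytes


def parse_modbus_tcp(frame: bytes) -> ModbusTcpFrame:
    """Split a Modbus/TCP frame into MBAP header fields and PDU."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    transaction_id, protocol_id, length, unit_id = struct.unpack_from(
        "!HHHB", frame, 0)
    if protocol_id != 0:
        raise ValueError("protocol identifier must be 0 for Modbus")
    # The length field counts the unit identifier and everything after it.
    if length != len(frame) - 6:
        raise ValueError("length field does not match frame size")
    return ModbusTcpFrame(transaction_id, protocol_id, length, unit_id,
                          frame[7], frame[8:])


# Read Holding Registers (function code 3) addressed to unit 0x11.
request = bytes.fromhex("0001000000061103006b0003")
parsed = parse_modbus_tcp(request)
```

Because the whole frame travels as TCP payload, any IP-level marking scheme operates on the enclosing IP header and leaves the fieldbus fields untouched.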
Attribution by traceback suffers in the traditional IT domain as it can only provide a source address or addresses involved in the attack path. The crux of this issue is known as the stepping stone problem.
When the endpoint of the connection is reached, the machine may simply be a compromised machine that an adversary has under their control. Indeed there could be 20, 30 or 100 stepping stones that add degrees of separation between the adversary and the victim. To make matters more complex, the attack path could be routed through multiple countries and legal jurisdictions. This problem remains when traceback is deployed in SCADA systems. Additionally, if a traceback goes beyond the domain of control of the organisation, then there must continue to be traceback capability in other domains in order to attribute any further. Using the Internet as an example, every ISP would need to support traceback. Finally, traceback is a network-based attribution technique. Should malware propagate through removable media, as Stuxnet did, then it would not be identified by this technique.

Deception
Honeypots are specially crafted systems that lure adversaries by imitating vulnerable systems, services and software. They are defined by Lance Spitzner, creator of The Honeynet Project, as 'an information resource whose value lies in unauthorized or illicit use of the resource' (Spitzner 2003). Honeypots monitor the adversary's interaction with the system so that the collected data can be analysed by an investigator or automated process. Honeypots are capable of misleading adversaries into revealing information about themselves, e.g. by inadvertently revealing their preferred tools and techniques, coding mistakes and hours of operation. It is in this information that adversary attribution clues may be identified.
Honeypots are often classified by their fidelity, from low to high. Low interaction honeypots, such as Dionaea (Dionaea 2013), generally simulate a single service and are effective when facing automated scripts such as worms. They are easy to deploy and manage and offer less risk to the owner as there is less chance of low interaction honeypots being compromised. However, they are quickly identified as honeypots by human adversaries, since they offer only limited interactivity. High interaction honeypots are fully fledged operating systems hosted on physical equipment or in a virtual environment. They require high levels of human monitoring and there is an increased risk that they may be compromised and controlled by an adversary. However, they offer much higher levels of fidelity, such that there is less chance of them being identified as a honeypot.
SCADA honeypots have already been created and deployed. Pothamsetty and Franz (2004) ...
Research on SCADA honeypots so far has shown initial promise; however, the majority of proposals have taken known honeypots and tried to fit them into a SCADA environment. Security experts who wish to develop SCADA honeypots should work closely with SCADA engineers in order to create realistic honeypots that can be used effectively.
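The low interaction end of the spectrum can be sketched as follows. This is a hypothetical, deliberately minimal Modbus/TCP responder and not a description of any existing SCADA honeypot: the class name and log fields are assumptions, and networking is left out so that the logic can be read and tested in isolation. Every request is logged with its time and source, and the only reply is a Modbus 'illegal function' exception.

```python
import struct
from datetime import datetime, timezone


class LowInteractionModbusHoneypot:
    """Logs every request and answers with a Modbus exception response."""

    def __init__(self):
        self.log = []

    def handle(self, source_ip, request, when=None):
        when = when or datetime.now(timezone.utc)
        entry = {"time": when.isoformat(), "source": source_ip,
                 "raw": request.hex(), "function_code": None}
        self.log.append(entry)
        if len(request) < 8:
            return b""  # not a complete Modbus/TCP frame: stay silent
        transaction_id, _protocol_id, _length, unit_id = struct.unpack_from(
            "!HHHB", request, 0)
        function_code = request[7]
        entry["function_code"] = function_code
        # Exception response: function code with the high bit set,
        # followed by exception code 0x01 (illegal function).
        return struct.pack("!HHHBBB", transaction_id, 0, 3, unit_id,
                           function_code | 0x80, 0x01)


honeypot = LowInteractionModbusHoneypot()
reply = honeypot.handle("203.0.113.5",
                        bytes.fromhex("0001000000061103006b0003"),
                        datetime(2013, 1, 1, tzinfo=timezone.utc))
```

A responder this simple would be recognised quickly by a human adversary, which is the limited interactivity trade-off described above; the logged times, sources and function codes are nonetheless the kind of data in which attribution clues may be found.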

Digital Forensics
Digital forensics is a broad subject which involves the recovery, acquisition and investigation of digital evidence. In traditional IT domains commercial tools such as EnCase and FTK and open source tools such as Sleuthkit and Autopsy are used to acquire, analyse and report on digital evidence. These tools tend to be specific to x86 and x64 processor architectures and targeted towards file systems, such as FAT and NTFS, and popular operating systems, such as Windows and Linux.
Forensics in a SCADA environment could yield attribution data to identify perpetrators. In the SCADA network and field device segments there is a broad range of devices which may store a wealth of digital evidence. However, SCADA systems come with a unique set of challenges for forensic analysis. For example, the standard forensic procedure for taking a bit-for-bit disk acquisition involves switching off a system, connecting the hard disk to a write blocker and acquisition system and then waiting for the acquisition to complete. Switching off a SCADA system which monitors and controls critical infrastructure is unlikely to be an option. One way to mitigate this issue is to have fail over systems. However, this is costly and if the fail over system is a duplicate of the original system, it might be infected in exactly the same way.
The diversity of devices that a forensic investigator can encounter in the SCADA environment is far wider than that of the traditional IT domain.
Traditional IT systems have a lifespan of a couple of years, perhaps 10 at most, while some SCADA systems from the 1960s are still operating. However, as PLCs and other SCADA devices continue to move towards commercial-off-the-shelf hardware and software, the forensic analysis of SCADA systems becomes standardised and therefore simpler.
Among the diverse devices found in SCADA environments is the Historian, which is essentially a database management system (DBMS). It collects a wealth of data to enable auditing, trend analysis and anomaly detection. As it is a DBMS, traditional database forensics techniques should be suitable for this device. However, unlike the Historian, many of the devices encountered are unlikely to have persistent memory. It is true that 'most process control systems were not built to track their processes, but merely to control them' (Nance et al. 2009). For example, the Siemens S7-300, shown in Figure 5, is a popular PLC. For storage, it uses a micro memory card (MMC) which ranges from 64KB to 8MB, while integrated CPU memory for this device ranges from 32KB to 2MB.
In live forensics, data acquisition takes place while the system is operational. In traditional IT systems, tools are used to capture running processes, RAM, browsing history and more, in the order of volatility. Performing live forensics on an operational machine in a SCADA environment poses significant challenges; accidentally causing the machine to crash could be catastrophic. Ahmed et al. (2012) discuss this issue and suggest using fail over systems to allow live forensic analysis to take place. Another challenge is that post-incident the investigator is competing with recovery efforts which will most likely destroy evidence. There is also clearly a logistics concern when performing SCADA forensics. Field devices could be located many miles away, perhaps on different continents, or in difficult to reach places, such as on the ocean floor. Physically reaching these devices may not be possible.
Forensics is primarily a practitioner-led field with research taking place as and when it is required.
In a recent effort to outline a research agenda for this field, SCADA forensics was identified as a predominant theme (Nance et al. 2009). The following points were identified as near future research for forensics in SCADA systems:
• Collection of evidence in the absence of persistent memory
• Hardware-based capture devices for control systems network audit trails
• Honeypots for control systems as part of the investigatory process
• Radio frequency forensics
• Intrusion detection systems for control systems

Network Forensics
Another field of forensics used in traditional IT systems is network forensics. This field primarily involves two stages: collecting network messages and analysing network messages. Existing infrastructure such as switches and routers can be configured to collect messages, or extra equipment can be deployed, such as a network tap device. By logging messages to files, analysis can take place during or after an attack. During analysis of network traffic, attribution data can be found, such as connection source, time of connection, commands that were sent and payload data.
Collection of data is relatively straightforward. An organisation must identify points in the network where it wishes to collect network data. Of course, similar to traceback, network forensics will only be able to identify attacks that use network communications as a vehicle for attack. Those that use removable media will not be visible.
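The analysis stage can be illustrated with a short sketch that groups logged messages by connection source and records the time span and the commands seen from each. The record format (ISO 8601 timestamp, source address, Modbus function code), the sample addresses and the function name are assumptions made for illustration; the set of write function codes follows the Modbus application protocol specification.

```python
from collections import defaultdict

# Modbus function codes that change device state: write single coil (5),
# write single register (6), write multiple coils (15) and
# write multiple registers (16).
WRITE_FUNCTION_CODES = {5, 6, 15, 16}


def summarise_by_source(records):
    """Group (timestamp, source, function_code) records by source address."""
    summary = defaultdict(lambda: {"first_seen": None, "last_seen": None,
                                   "function_codes": set(),
                                   "write_commands": 0})
    # Timestamps in one ISO 8601 format sort chronologically as strings.
    for timestamp, source, function_code in sorted(records):
        entry = summary[source]
        if entry["first_seen"] is None:
            entry["first_seen"] = timestamp
        entry["last_seen"] = timestamp
        entry["function_codes"].add(function_code)
        if function_code in WRITE_FUNCTION_CODES:
            entry["write_commands"] += 1
    return dict(summary)


# Hypothetical capture: one internal poller and one unfamiliar source.
captured = [
    ("2013-03-01T02:14:07Z", "10.0.5.20", 3),
    ("2013-03-01T02:14:09Z", "192.0.2.77", 16),
    ("2013-03-01T02:13:55Z", "192.0.2.77", 3),
    ("2013-03-01T02:15:30Z", "192.0.2.77", 5),
]
report = summarise_by_source(captured)
```

In this made-up capture the unfamiliar source issues two write commands shortly after a read, which is the kind of pattern an investigator might follow up; the source address itself remains subject to the stepping stone problem discussed under traceback.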

Malware Analysis
Malware, in its various forms (virus, worm, trojan, adware, spyware, back doors and rootkits), may be analysed to identify characteristics which could be used as an attribution data source. Malware analysis in the traditional IT domain can be split into two areas: behavioural analysis and code analysis.
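As one small example on the code analysis side, the sketch below computes a hash of a sample and extracts its printable strings, which can surface hard-coded addresses, protocol names or language artefacts. The choice of this particular step, the function names and the sample bytes are assumptions for illustration and are not drawn from the paper.

```python
import hashlib
import re


def sample_digest(data: bytes) -> str:
    """SHA-256 of the sample, useful for matching against known malware."""
    return hashlib.sha256(data).hexdigest()


def extract_strings(data: bytes, min_length: int = 4):
    """Return runs of printable ASCII characters of at least min_length."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Made-up bytes standing in for a binary under analysis.
sample = b"\x00\x01connect 192.0.2.10\x00ab\x00Modbus\xff"
strings = extract_strings(sample)
```

Strings recovered this way are only hints: an adversary can plant misleading values, so they would need to be weighed alongside other attribution data.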

CONCLUSION
In this paper we have investigated how a subset of attribution techniques from the traditional IT domain can be used in the SCADA domain. There are many more attribution techniques that could form the basis of further investigation, e.g. those in Wheeler and Larsen (2003). Technical controls for performing attribution are just one aspect of a wider solution.
Non-technical measures, such as cui bono analysis and following the money, are also used to attribute attacks. Asking questions such as 'Who has the resources?' and 'Who has the motive?' serves to logically identify perpetrators.
Attribution should not be seen as a single process or problem, but rather a jigsaw puzzle in which there are many attribution pieces. Different techniques, both technical and non-technical, are able to yield different pieces of the puzzle. While it is unlikely that all of the pieces will be collected following an intrusion, more pieces help to provide a better overall picture. A theory of deduction, then, is helpful to logically understand this attribution puzzle, and further research into attribution techniques that do and do not work well together would be a useful exercise.
As the number of SCADA incidents continues to increase, as shown in ICS-CERT (2011), the ability of nations to attribute attacks against their critical infrastructure becomes ever more important. By reviewing the selected subset of attribution techniques it can be seen that a few techniques have been modified for use on SCADA systems but that tools and techniques specifically designed for them are lacking. It is equally clear that there is an urgent need for an effective set of SCADA attribution tools. This paper has highlighted the challenges associated with transferring techniques designed for traditional IT systems and the Internet to SCADA systems. An outcome of this work is a clear requirement for more techniques to be developed for, or modified for use on, SCADA systems. Similarly there is a juxtaposition of these two domains, where traditional IT domains and SCADA domains connect. Attribution solutions need to account for malware like Stuxnet, which is able to transcend both of these domains, exploiting vulnerabilities in the corporate network segment and the SCADA and fieldbus segments. Effective solutions will require security experts and SCADA engineers to collaborate closely. For example, a honeypot design that has significant input from both parties is likely to appear far more realistic to an adversary. A workshop to identify a forensics research agenda took place (Nance et al. 2009). A similar workshop for attribution and SCADA systems would be a productive exercise, provided that the right balance of individuals were present, for example, technical security specialists, SCADA experts and legal experts.

Figure 5: Siemens S7-300 PLC
These devices are controlled by RTUs and PLCs, primarily with fieldbus protocols. RTUs monitor IEDs and transmit data to PLCs or the SCADA network using protocols such as Modbus and DNP3. PLCs are computers which are able to automate functions primarily using simple ladder logic statements. Sensor data flows from the sensors in the field to the RTUs and PLCs and/or to a data collection point in the SCADA network, ...
... available to buy off the shelf in an open market. While this saves on purchasing and operating costs, it also means that the systems are free to be inspected and probed by any motivated and financed entity. SCADA systems have been directly or indirectly connected to the Internet, driven by the requirement for engineers to perform remote maintenance, rather than travel to remote sites. Researchers have identified that these systems are protected with weak or no authentication (Radvanovsky and Brodsky 2013). Security solutions for insecure legacy SCADA systems have been non-intrusively retrofitted, to prevent disruption to operations.