For a long time, information was analysed using rudimentary techniques, such as collecting data by hand on paper; the analysis was then used to achieve an objective: optimising tasks or modifying them according to the results.
In the world of IT, the first companies to embrace the concept of Big Data were those focused on web services, which sought to improve those services based on the needs of the market (Google, Facebook, Yahoo!, etc.).
Big Data is understood as the storage, analysis and exploitation of obtained data, and decision making based on these previous phases (especially analysis techniques).
Generally, data comes from different sources and in different formats, which means that purification and standardisation acquire an important role when information is first processed. There also tends to be a large volume of data, which makes it harder to achieve the analysis speed needed to reach a decision.
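The purification and standardisation step can be illustrated with a minimal sketch. Here, two hypothetical sources report the same measurement in different units and timestamp formats, and both are normalised into a common schema; the field names and conversion are illustrative assumptions, not part of any specific product.

```python
from datetime import datetime, timezone

# Hypothetical raw readings from two different sources: one reports
# pressure in bar with ISO timestamps, the other in psi with epoch seconds.
source_a = [{"ts": "2024-05-01T10:00:00+00:00", "pressure_bar": 4.2}]
source_b = [{"time": 1714557600, "pressure_psi": 61.0}]

PSI_TO_BAR = 0.0689476  # standard psi-to-bar conversion factor

def normalise_a(rec):
    # Source A already uses bar; only the timestamp needs parsing.
    return {
        "timestamp": datetime.fromisoformat(rec["ts"]),
        "pressure_bar": rec["pressure_bar"],
    }

def normalise_b(rec):
    # Source B uses psi and epoch seconds; convert both to the common schema.
    return {
        "timestamp": datetime.fromtimestamp(rec["time"], tz=timezone.utc),
        "pressure_bar": round(rec["pressure_psi"] * PSI_TO_BAR, 3),
    }

records = [normalise_a(r) for r in source_a] + [normalise_b(r) for r in source_b]
```

Once every source has been mapped to the same units and timestamp representation, the later analysis phases can treat the data uniformly.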
It is essential to keep these characteristics in mind in order to make the most of these tasks and to obtain results quickly and reliably.
People generally speak of the 4 Vs of Big Data:
- Volume: There is a high quantity of data to analyse.
- Variety: Information sources have no connection among themselves and present data in an unstructured way.
- Velocity: The time between recovering information, processing it and making a decision must be kept to a minimum.
- Veracity: Data must be reliable and must be handled faithfully after it has been obtained, so that it does not distort decision making.
These characteristics are also applicable to industrial environments, in which velocity and veracity are highlighted as key.
In this type of environment there are commercial solutions for monitoring processes within industrial plants which, thanks to data collection, processing and exploitation, provide decision-making functions based on information obtained from different locations and sub-processes.
For example, the results obtained can be used to trigger alarms when pressure increases in pipes, to centralise HMI data, to analyse financial losses based on the speed of industrial plant processes, and for other applications that depend as much on software optimisation as on field devices.
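A rule of the kind described above (an alarm on rising pipe pressure) can be sketched very simply. The threshold value and pipe names here are illustrative assumptions, not taken from any real deployment.

```python
# Pressure above this value (in bar) triggers an alarm; the figure is an
# assumption chosen for illustration only.
PRESSURE_THRESHOLD_BAR = 6.0

def check_pressure(readings):
    """Return the (pipe, pressure) pairs that should raise an alarm."""
    return [(pipe, p) for pipe, p in readings.items() if p > PRESSURE_THRESHOLD_BAR]

# Hypothetical readings collected from three pipes.
alarms = check_pressure({"pipe_1": 4.8, "pipe_2": 6.7, "pipe_3": 5.9})
```

In a real plant this check would of course run against live sensor data and feed the alarm subsystem, but the decision logic is the same threshold comparison.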
- Big Data Phases -
Big Data OT vs. IT
Industrial deployments have a large number of sensors and actuators in order to control processes and check their quality. All these data are collected for supervision by a SCADA system and stored in databases. Continuous, real-time analysis of these data is of vital importance, and therefore the number of resources dedicated to this task is high.
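The kind of continuous, real-time check mentioned above can be sketched as a rolling-window monitor that flags readings deviating strongly from the recent average. The window size and tolerance are assumptions for illustration; a production system would tune these per signal.

```python
from collections import deque

class RollingMonitor:
    """Flag readings that deviate strongly from the recent rolling mean."""

    def __init__(self, window=5, tolerance=0.5):
        self.values = deque(maxlen=window)  # most recent readings
        self.tolerance = tolerance          # allowed relative deviation

    def update(self, value):
        """Feed one reading; return True if it looks anomalous."""
        if len(self.values) == self.values.maxlen:
            mean = sum(self.values) / len(self.values)
            anomalous = abs(value - mean) > self.tolerance * mean
        else:
            anomalous = False  # not enough history yet to judge
        self.values.append(value)
        return anomalous
```

A monitor like this would run continuously against the SCADA data stream, which is precisely why OT environments dedicate so many resources to real-time processing.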
In an IT environment the objectives are different. Real-time processing is not necessary; instead, data tends to be analysed retrospectively in order to improve services and to manage resources, both internally and externally.
- OT vs. IT managed data -
Benefits of Big Data in Industrial Control Systems
Beyond improving efficiency and speed in decision making, the following improvements are realised when Big Data is incorporated into industrial environments:
- The ability to anticipate asset failures or problems involved in the processes, making it possible to act before they take place; the proactive structuring of maintenance works and the improvement in the quality of the service.
- Improvement of the generation/demand balance of products made in the processes. Thanks to the purchase and sales data, organisations can improve their control of the manufacturing processes.
- Correction of anomalies, by allowing regular reports on infrastructure performance to be generated so that anomalies can be corrected quickly.
- Improved energy efficiency, reducing supply costs and maximising energy use, since organisations are able to adopt new ways of using energy and can even facilitate the incorporation of renewable energy sources.
The alignment of these benefits with the concerns that, according to Gartner, are recognised across the industrial sectors paints a complete picture of the industry's interest in Big Data.
- Business Strategy with Big Data by sector (source: Gartner) -
We're going to get on well... Cybersecurity, ICS and Big Data
The centralisation of data from different sources in one single system makes it a target for cybercriminals trying to steal information. An attacker no longer needs to analyse the whole network or compromise many different sources; it is enough to find a breach in the system hosting the Big Data software.
There are real cases in which attackers have gained control of a system by exploiting a vulnerability in the software used for Big Data.
Due to this, the application of best practices is advised, both in network segmentation and in patching, reducing the number of exploitable vulnerabilities within the network and paying special attention to the software used for Big Data.
Incidents such as the one at the water treatment plant in Maroochy, Australia, in which a former employee of the company repeatedly and illegally accessed the sewage control system to release large volumes of waste water into rivers and parks, could have been avoided by using Big Data. By collecting all the access logs along with information from other sources and searching for a pattern, the repeated access from outside the business network could have been detected. Furthermore, data collection and its analysis can aid the discovery of new security gaps and help correct them.
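The log-pattern search described above can be sketched as a simple count of repeated accesses from addresses outside the business network. The network prefix, threshold and log records here are illustrative assumptions, not real data from the Maroochy incident.

```python
from collections import Counter

# Assumed internal address prefix and repeat threshold, for illustration only.
INTERNAL_PREFIX = "10.0."
REPEAT_THRESHOLD = 3

def suspicious_sources(access_log):
    """Return external source addresses seen at least REPEAT_THRESHOLD times."""
    external = [e["src"] for e in access_log
                if not e["src"].startswith(INTERNAL_PREFIX)]
    counts = Counter(external)
    return [src for src, n in counts.items() if n >= REPEAT_THRESHOLD]

# Hypothetical access log mixing internal and external sources.
log = [
    {"src": "10.0.1.5"}, {"src": "203.0.113.7"}, {"src": "10.0.1.9"},
    {"src": "203.0.113.7"}, {"src": "203.0.113.7"}, {"src": "198.51.100.2"},
]
```

Correlating logs from many systems in this way is exactly the kind of cross-source analysis a Big Data deployment makes practical.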
It must not be forgotten that introducing a new system for collecting and analysing information can initially have negative consequences for cybersecurity. It should therefore only be carried out after a prior study and, if possible, first be tested in a preproduction environment.
In some industrial environments, the collection of information is of vital importance, so incorporating tools that aid data processing, such as Big Data analysis, is always a good option, provided the systems that host this software are adequately protected against intrusions seeking sensitive information on industrial processes.