At the most recent Navaja Negra conference, held in Albacete on 2, 3, and 4 October, I had the opportunity to present a tool that INCIBE has been working on for months. For those who could not attend the event but want to know more, the Merovingio project is presented here. Merovingio is an application analyser that determines whether a sample is legitimate or malicious.
Why was Merovingio developed?
After a market analysis and trials of various candidate solutions, we reached the conclusion that most of them were practically identical. They were closely tied to the company that developed them, which in most instances made any future modification impossible. That is, they were closed environments in which extensions or changes in functionality were highly unlikely.
Most of these solutions work in a very similar way: they depend on a virtual machine (from one specific manufacturer) and on assessment and analysis criteria that are much the same. Their default configuration involves analysis times of at least two to three minutes per sample. The analysis begins when a possible virus is submitted to the environment. Then, even if execution ends within a few seconds (because a DLL is missing, because the application is quickly identified as malicious, because the sample detects that the environment is virtualized, or for any other reason), the program continues monitoring until the two to three minutes have elapsed. Once this period ends, the machine has to be restarted to return to its initial state, with all the extra time that requires. As a result, the analysis process is very slow.
The requirements that had to be met by Merovingio were:
- Analysing samples in the shortest possible time.
- Avoiding dependence on anti-virus companies.
- Speeding up the process by modifying time-management applications programming interfaces (APIs).
- Avoiding ties to specific virtualization environments.
- Being able to edit the source code.
- Controlling how the sample executes.
- Being able to analyse samples simultaneously on the same computer.
With these requirements in mind, work started. The first step was to go back to the “PebHook” project, details of which were published in 2008 by “Dreg and [Shearer]”, that is to say by David Reguera and Juan Carlos Montes, and which hooked the Process Environment Block or PEB. The second of these two people was a joint author during all the phases of development of the Merovingio tool. In brief, the original project generated a library (DLL) for Windows systems that made it possible to manipulate the behaviour of the system APIs in real time. This gave control over the flow of an application, permitting its behaviour to be adjusted at all times.
For further information, the original paper can be downloaded from: http://phrack.org/issues/65/10.html.
Use of the library created by the original “PebHook” project allowed access to program flows. Thus, control of APIs lets us tweak their original functions. If the malware copied itself into a new location, it was possible to record all the information being copied byte by byte. In this way a controlled replica could be obtained, or the action might even be stopped, but still obtaining the information.
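The idea of recording an action, and optionally stopping it while still keeping the data, can be sketched in a few lines. This is only a Python analogy, not the “PebHook” library itself, which works on native Windows APIs; the `hook` helper and the `copy_bytes` stand-in are hypothetical names introduced for illustration.

```python
import functools


def hook(original, recorder, block=False):
    """Wrap `original` so every call is recorded and, optionally, suppressed."""
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Capture the call before anything else happens.
        recorder.append((original.__name__, args, kwargs))
        if block:
            return None  # the action is stopped, but its arguments were kept
        return original(*args, **kwargs)
    return wrapper


def copy_bytes(data, destination, filesystem):
    """Hypothetical stand-in for a file-copy API; `filesystem` is a dict."""
    filesystem[destination] = data
    return len(data)
```

With `block=True` the copy never reaches the simulated file system, yet the bytes the sample tried to write remain available in the recorder.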
Another example of the use of “PebHook” would be timing control. Some applications may stay inactive for up to an hour before starting a malicious action, thanks to adjustments to their wait times. Through modification of APIs, we could speed up or even eliminate this pause, and so in a few seconds it was possible to see what they would do, without having to wait.
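As a rough illustration of timing control, the sketch below replaces Python's `time.sleep` with a version that advances a virtual clock instead of blocking. It is an analogy under stated assumptions: the real library hooks Windows time-management APIs from a DLL, and the `AcceleratedClock` class and `delayed_payload` function are hypothetical.

```python
import time


class AcceleratedClock:
    """Virtual clock: sleeps are counted, then shortened or skipped."""

    def __init__(self, speedup=None):
        # speedup=None skips waits entirely; a number divides the real wait.
        self.speedup = speedup
        self.virtual_elapsed = 0.0
        self._real_sleep = time.sleep

    def sleep(self, seconds):
        # Record how long the sample believes it has waited.
        self.virtual_elapsed += seconds
        if self.speedup is not None:
            self._real_sleep(seconds / self.speedup)

    def install(self):
        time.sleep = self.sleep

    def uninstall(self):
        time.sleep = self._real_sleep


def delayed_payload():
    """Stand-in for a sample that waits an hour before acting."""
    time.sleep(3600)
    return "payload executed"
```

Installing the clock before calling `delayed_payload()` lets the call return at once, while `virtual_elapsed` still reports the hour the sample asked for.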
The “PebHook” project was brought up to date to meet our needs and to run on current machines. We added logging so as to keep a log file of information about the application into which the library was injected. With this, the malware analysis tool had progressed considerably, but a new problem arose: the library had to be injected into every process we wanted to analyse and into any new processes these might create dynamically. We had further requirements:
- It should be simple to use.
- It should return results.
- It should allow multiple analyses.
- It should check if a sample had already been analysed.
Merovingio, “the Agent”
The Agent is programmed in Python 2.7 for Windows XP. To manage the library and get it injected into every process, we found a solution in the use of Sandboxie. In this way, besides injecting the library into all processes being executed, we also got the advantage that the infection could not damage the host machine. This was an automatic process, and so easy to perform.
The function of the Agent is to receive the samples, run them under Sandboxie and monitor every Sandboxie instance (the number permitted is unlimited). We are currently working with forty simultaneous instances. When an analysis is completed, the Agent immediately collects the log generated by the library and sends it to a web platform, then completely erases any data in the Sandboxie instance and continues processing samples. All of this takes approximately two minutes per instance.
- The Merovingio Agent -
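The Agent's cycle (analyse, collect the log, upload it, wipe the instance, take the next sample) could be sketched roughly as follows. This is not the Agent's actual code: `run_agent` and the `analyse`, `upload` and `cleanup` callables are hypothetical placeholders, and no real Sandboxie command is invoked.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_agent(samples, analyse, upload, cleanup, instances=40):
    """Process samples in parallel, one sandbox instance per worker."""
    # Pool of free sandbox identifiers (hypothetical numbering).
    free_boxes = queue.Queue()
    for box_id in range(instances):
        free_boxes.put(box_id)

    def worker(sample):
        box_id = free_boxes.get()
        try:
            log = analyse(sample, box_id)  # run the sample, return its raw log
            upload(sample, log)            # send the log to the web platform
        finally:
            cleanup(box_id)                # erase the instance in every case
            free_boxes.put(box_id)
        return sample

    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(worker, samples))
```

Cleaning up in a `finally` block reflects the requirement that an instance is always returned to a clean state, even when an analysis fails.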
The file generated by the library is a log with a step-by-step record of the flow of the application, but this log is in raw format and is not comprehensible at first sight. To process these results and extract the useful information, we created Dorian-AI. This is a file analysis system with a layer of Artificial Intelligence (AI) based on a combination of neural networks and rules. It is able to process logs and achieve early detection of new and so far unknown malware.
Dorian is given a text file with everything recorded for a given sample. To understand why this stage is necessary, consider that an application with just nine or ten lines of code, barely reaching 18 KB, will generate a log of about 8 MB of text recording its entire behaviour from start to finish, noting every API call it makes and the outcome of each one.
Once this has been processed, it extracts the most relevant information about the process. This will be just a few lines describing the behaviour, and application of the AI engine will yield a result of “malware”, “suspicious”, or “clean”.
- Part of an Unprocessed Log File -
- A Processed Log on the Web Platform -
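To give a feel for this reduction step, here is a minimal sketch that condenses a raw log into call counts and applies a few rules to reach one of the three verdicts. Everything in it is an assumption made for illustration: the log line format (`ApiName(args) -> result`), the rule weights and the thresholds are hypothetical, and the neural-network part of Dorian is not represented at all.

```python
from collections import Counter

# Hypothetical rule weights; the real Dorian rules are not published here.
SUSPICIOUS_APIS = {
    "CreateRemoteThread": 3,
    "WriteProcessMemory": 3,
    "RegSetValueExA": 1,
    "CopyFileA": 1,
}


def summarise(log_text):
    """Count calls per API, assuming lines of the form 'ApiName(args) -> result'."""
    calls = Counter()
    for line in log_text.splitlines():
        name = line.split("(", 1)[0].strip()
        if name:
            calls[name] += 1
    return calls


def classify(calls, suspicious_at=1, malware_at=4):
    """Sum the weights of the flagged APIs seen and map the score to a verdict."""
    score = sum(w for api, w in SUSPICIOUS_APIS.items() if api in calls)
    if score >= malware_at:
        return "malware"
    if score >= suspicious_at:
        return "suspicious"
    return "clean"
```

A log of several megabytes collapses into a small table of call counts, which is the kind of compact description a classifier can work with.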
Merovingio is responsible for sending newly received samples to the various parts of the process and for starting each stage required. Finally, it takes the information from Dorian for display on the screen, and then marks the process as ended.
Merovingio has a user-friendly web interface for uploading files for analysis. It also exposes an API that permits integration into new projects. Through the API we can control the whole environment as if we were using the web interface. In this way, new projects can send one or more samples and get results automatically.
When a sample is uploaded, the platform checks whether it has been analysed before. If it has, it simply returns the existing report. If not, the sample is passed on to the Agent for analysis to begin. Once this is finished, the Agent transmits the log and the platform sends it to Dorian. After a few minutes, Dorian returns the processed data and reports the results.
- Result History, Classifying Items as “Malware”, “Clean” or “Suspicious” -
- Route through the Platform -
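The "already analysed?" check in this route can be sketched as a lookup keyed by a hash of the sample. The article does not say how samples are identified, so using SHA-256 as the key is an assumption, and the `Platform` class and its `analyse` callable (standing in for the Agent plus Dorian) are hypothetical.

```python
import hashlib


class Platform:
    """Return a stored report when the same sample is submitted again."""

    def __init__(self, analyse):
        self._analyse = analyse  # hypothetical: runs Agent + Dorian, returns a report
        self._reports = {}       # SHA-256 hex digest -> report

    def submit(self, sample_bytes):
        digest = hashlib.sha256(sample_bytes).hexdigest()
        if digest not in self._reports:
            # First time this sample is seen: run the full analysis once.
            self._reports[digest] = self._analyse(sample_bytes)
        return self._reports[digest]
```

Submitting the same bytes twice triggers only one analysis; the second call returns the stored report.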
What are the figures achieved?
Currently, using a single computer with modest capabilities, that is, not a dedicated server but an ordinary machine with a 4 GHz processor and 4 GB of RAM, we can analyse at the very least:
- 720 samples per day per Sandboxie instance.
- 14,400 samples per day using just twenty simultaneous instances.
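These figures are consistent with the roughly two minutes per analysis mentioned earlier, taken over a 24-hour day. A short check of the arithmetic, assuming instances run independently and scale linearly (the function name is ours):

```python
def samples_per_day(minutes_per_sample, instances=1):
    """Samples analysed in 24 hours, assuming linear scaling across instances."""
    return (24 * 60 // minutes_per_sample) * instances


print(samples_per_day(2))      # one instance
print(samples_per_day(2, 20))  # twenty instances
```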
To sum up, in Merovingio we have created a dynamic sample analysis tool with a high daily throughput. We achieved the objectives set: it is automated, easy to use, and can be integrated into other projects. It is not tied to specific platforms or virtualization environments.
The presentation of Merovingio given at the 4th edition of the Navaja Negra conference is available here.