
Basic steganography and steganalysis

Posted on 04/21/2015, by Jesús Díaz (INCIBE)

Although steganography took longer than cryptography to come into the public eye (despite also having a long history behind it), it is now quite popular in the area of computer security. Like cryptography, steganography has several variants. Indeed, the word “steganography” is often used to cover what is in reality a more general field: the art of hiding information. This field also covers related methods such as watermarking, anonymity and covert channels.

- Various branches of information hiding, according to “Information Hiding – A Survey” -

Nonetheless, every classification is subject to different interpretations. To establish a shared starting-point, in what follows steganography is assumed to be a method which has the characteristics given below:

  • Transmits information imperceptibly.
  • Transmits reasonably large amounts of information.
  • Is not necessarily robust against modifications (intentional or otherwise).
  • Uses a medium intended to transmit some other type of information.

For example, strictly speaking, watermarks would not be steganography, because:

  • They do not have to be imperceptible.
  • They usually transmit small amounts of information.
  • They must be robust against any type of modification.

The examples given below use images as the means of transmission. Audio or text can also be used.

Least Significant Bit steganography

In multimedia files, whether they contain images or audio (although not in text files), the least significant bits (LSBs) usually hold information that is hard for the human eye or ear to perceive. This makes them rather attractive for hiding information, which gives rise to what is known as LSB steganography.

An intuitive way of showing this principle is to “paint” the various bit levels of an image (where, for instance, Level 7 is the most significant bit of each byte, and Level 0 is the least significant). This can be seen in the illustration below, which shows the different bit levels of the red (R) component of the celebrated picture of Lena, starting at Level 0 and running from left to right and from top to bottom. From Level 2 onwards, certain patterns can be seen at a glance, but at Levels 0 and 1 everything seems merely random, which indicates that the human visual system is not really sensitive to these levels.

- Bit Levels for the R component of Lena -
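As a minimal sketch of how such bit levels can be separated, the snippet below extracts one level from a list of byte values. The function name `bit_plane` and the sample values are hypothetical; loading the red channel of a real image into such a list is assumed to happen elsewhere.

```python
def bit_plane(values, level):
    """Return the bit at position `level` (0 = least significant) of each byte."""
    if not 0 <= level <= 7:
        raise ValueError("level must be between 0 and 7")
    return [(v >> level) & 1 for v in values]


# Hypothetical red-channel samples; a real image supplies one byte per pixel.
red = [200, 201, 57, 56, 128, 255]

lsb_plane = bit_plane(red, 0)  # Level 0: least significant bits
msb_plane = bit_plane(red, 7)  # Level 7: most significant bits

# To "paint" a level, each bit can be scaled to black (0) or white (255).
painted = [bit * 255 for bit in lsb_plane]
```

Scaling each bit to 0 or 255 is just one way of making a level visible; any two clearly distinct intensities would do.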

Hence, one quite well-known technique in steganography involves replacing the least significant bits of an image or an audio clip with the bits to be transmitted clandestinely. After all, it would seem that these changes would go unnoticed by the human eye or ear.

However, this technique is rather basic and, if applied in an uncontrolled way, is statistically easy to detect. This is because indiscriminate modification of the least significant bits alters their statistical properties. People tend to assume that the least significant bits are random (which is how they appear in the illustration above), but this is not the case.
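The replacement itself can be sketched in a few lines. This is an illustrative sketch only: it works on a plain list of byte values, writes the message into the leading samples in order, and all function names are hypothetical. Real tools typically also encrypt the message and store its length.

```python
def bytes_to_bits(data):
    """Split each byte into 8 bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_bytes(bits):
    """Rebuild bytes from groups of 8 bits (trailing incomplete groups are dropped)."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed_lsb(cover, bits):
    """Return a copy of `cover` whose first len(bits) LSBs carry `bits`."""
    if len(bits) > len(cover):
        raise ValueError("message is longer than the cover capacity")
    stego = list(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | (bit & 1)  # clear the LSB, then set it
    return stego


def extract_lsb(stego, count):
    """Read back the first `count` LSBs."""
    return [v & 1 for v in stego[:count]]


cover = list(range(100, 132))            # hypothetical cover samples
message_bits = bytes_to_bits(b"hi")
stego = embed_lsb(cover, message_bits)
recovered = bits_to_bytes(extract_lsb(stego, len(message_bits)))
```

No sample changes by more than one unit, which is why the modification is not visible.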

Pairs-of-Values steganalysis

The main technique for detecting LSB steganography is known as analysis by pairs of values (or PoV analysis). It was first put forward in the year 2000 in a paper by Westfeld and Pfitzmann entitled “Attacks on Steganographic Systems”. Its basis is a very simple, but striking observation.

Specifically, a byte can represent 256 values (from 0 to 255). If these 256 values are grouped in contiguous pairs, there are 128 possibilities: [0, 1], [2, 3], [4, 5], …, [252, 253], [254, 255]. The key point is that when the final bit of any byte is modified, the resulting value stays within the same pair, regardless of the new value of this final bit.

For example, assume that in an original image (with no hidden information) there are 100 pixels with a value of 4 and 50 with a value of 5, making a total of 150 pixels in the pair [4, 5]. After the use of LSB steganography to hide encrypted information (which will thus have approximately the same number of bits at 0 as at 1), there would be around 75 pixels with the value 4 and around 75 pixels with the value 5, yielding the same total of 150.
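The example above can be reproduced with a short sketch. Random bits stand in for encrypted data here, and the helper names are hypothetical. The individual counts of 4 and 5 vary with the random bits, but the total for the pair does not.

```python
import random
from collections import Counter


def randomize_all_lsbs(values, rng):
    """Overwrite every LSB with a random bit (a stand-in for encrypted data)."""
    return [(v & 0xFE) | rng.getrandbits(1) for v in values]


def pair_totals(values):
    """Count the samples falling in each pair [2k, 2k + 1], keyed by k."""
    return Counter(v >> 1 for v in values)


rng = random.Random(2015)            # fixed seed, only for repeatability
cover = [4] * 100 + [5] * 50         # 100 pixels with value 4, 50 with value 5
stego = randomize_all_lsbs(cover, rng)

cover_counts = Counter(cover)
stego_counts = Counter(stego)        # counts of 4 and 5 move towards 75 each

# The pair [4, 5] has key 2 and keeps its total of 150 samples.
same_totals = pair_totals(cover) == pair_totals(stego)
```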

Since the total count of each pair of values stays constant, the expected frequency of each value after embedding can be derived from the image under analysis itself: if the LSBs have been overwritten with (pseudo)random bits, each value in a pair is expected to account for half of that pair's total. This allows a Chi-squared hypothesis test to compare the frequencies observed in the image with these expected frequencies. Depending on the result, the test makes it feasible to rule out the possibility that the image contains information hidden by means of LSB steganography.

To get a feel for this method of steganographic analysis, it is possible to use the following Python scripts from our GitHub profile:
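A minimal sketch of such a test is given below, using only the standard library. It is not the code of any particular tool. The names `chi2_survival` and `pov_p_value` are hypothetical, and skipping pairs whose expected count is below 5 is a simplifying assumption of this sketch. The null hypothesis is that the LSBs carry random data, so a p-value close to 1 is consistent with hidden information and a p-value close to 0 rejects it.

```python
import math
from collections import Counter


def chi2_survival(x, df):
    """P(X >= x) for a Chi-squared variable with integer `df` >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Start from Q(1, h) or Q(1/2, h), then use
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pov_p_value(values, min_expected=5.0):
    """p-value of the pairs-of-values test over a sequence of byte values."""
    counts = Counter(values)
    statistic, categories = 0.0, 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2.0     # half of the pair's total
        if expected < min_expected:       # assumption: skip sparse pairs
            continue
        statistic += (even - expected) ** 2 / expected
        categories += 1
    if categories < 2:
        raise ValueError("not enough populated pairs for the test")
    return chi2_survival(statistic, categories - 1)


# Perfectly balanced pairs: consistent with fully randomized LSBs.
balanced = [4] * 75 + [5] * 75 + [10] * 40 + [11] * 40
# Strongly unbalanced pairs: the hypothesis of randomized LSBs is rejected.
unbalanced = [4] * 100 + [5] * 50 + [10] * 90 + [11] * 10

p_balanced = pov_p_value(balanced)
p_unbalanced = pov_p_value(unbalanced)
```

Only the even value of each pair enters the statistic, because the odd value's deviation from the expected count is the same with the opposite sign.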

  • imgrand.py <input> <percentage> <output>:
    • Randomizes <percentage>% of the LSBs in the image given as <input> and stores the result in <output>.
  • imgchi2.py <input>:
    • Applies PoV steganalysis to the image in <input>. As a result, it provides the p-value associated with the test for each channel (the script uses an RGB format). Intuitively, this can be read as the probability that the null hypothesis (in this case “There is hidden information”) is true, although strictly speaking it is the probability of observing a deviation at least this large if the null hypothesis holds.

For example, execution of ./imgrand.py img/lena.bmp 100 img/lena_rand_100.bmp gives the image img/lena_rand_100.bmp, generated by randomizing 100% of the LSBs in the image img/lena.bmp. This inclusion of random bits simulates the inclusion of encrypted bits. As can be seen from the illustration below, the difference is not observable to the naked eye:

- lena.bmp (left) and lena_rand_100.bmp (right) -
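The effect of randomizing a given percentage of the LSBs can be sketched as follows. This is not the code of imgrand.py. It works on a plain list of byte values, and modifying the leading samples in order is an assumption of this sketch, as the way the script selects samples is not described here.

```python
import random


def randomize_lsbs(values, percentage, rng=None):
    """Overwrite the LSB of the first `percentage`% of samples with random bits."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = rng or random.Random()
    cutoff = int(len(values) * percentage / 100)  # number of samples to touch
    return [
        (v & 0xFE) | rng.getrandbits(1) if i < cutoff else v
        for i, v in enumerate(values)
    ]


samples = list(range(200))                        # hypothetical channel data
quarter = randomize_lsbs(samples, 25, random.Random(1))
untouched = randomize_lsbs(samples, 0)
```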

Nonetheless, if the script ./imgchi2.py is used, the following result is obtained:

- Result of Applying PoV Steganographic Analysis to lena_rand_100.bmp -

This outcome points to a high probability that information has been hidden in all three components of the image (Red, Green and Blue).

Limitations of the steganalysis

This method is effective in detecting LSB steganography when the amount of hidden information is large. Nevertheless, as the percentage of the capacity of the subliminal channel used drops (in this instance, using all the least significant bits means use of 100% of channel capacity), the effectiveness of the test also declines rapidly. For example, the following image shows the results for versions of lena.bmp in which 50% and 25% of the capacity of the subliminal channel were used:

- Results of Applying PoV Steganographic Analysis to Versions of lena.bmp with 50% and 25% of the Subliminal Capacity in Use -

This means that, with 50% of the subliminal capacity in use, PoV steganalysis suggests a probability of about 30% that hidden information is present. With 25% in use, it returns a probability of 0.2%. This is despite the fact that 50%, and even 25%, of the capacity amounts to a good deal of information. For instance, the image used as an example is 512 × 512 pixels with three channels (R, G, B), which gives 786,432 LSBs in total, so that 50% of the LSBs would equate to 48 KB.

An immediate improvement can be achieved by applying this analysis to “windows”, or in other words, to portions of the image instead of the full image. The graph below, taken from “Attacks on Steganographic Systems”, indicates that the image analysed probably contains hidden information in one half of the picture, but not in the other.
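The capacity figures can be checked with a few lines of arithmetic. The function name `lsb_capacity_bytes` is hypothetical, and one usable LSB per channel sample is assumed.

```python
def lsb_capacity_bytes(width, height, channels=3, fraction=1.0):
    """Bytes that fit in the given fraction of the LSBs (one LSB per sample)."""
    total_bits = width * height * channels
    return total_bits * fraction / 8


full = lsb_capacity_bytes(512, 512)                    # 98304 bytes = 96 KB
half = lsb_capacity_bytes(512, 512, fraction=0.5)      # 49152 bytes = 48 KB
quarter = lsb_capacity_bytes(512, 512, fraction=0.25)  # 24576 bytes = 24 KB
```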

- Example of the Results Obtained by Applying PoV by Windows, Taken from “Attacks on Steganographic Systems” -

On this point, it should be kept in mind that to ensure sufficient statistical evidence, the windows used should not be too small. Moreover (and this applies to any statistical analysis), it should always be remembered that what is obtained is an estimate. This means that a positive result is not an irrefutable proof of the presence of hidden information.
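The windowed approach can be sketched on synthetic data as follows. All names are hypothetical, pairs with an expected count below 5 are skipped as a simplifying assumption, and the “clean” samples are deliberately exaggerated (their LSB is 1 only about 20% of the time) so that the contrast between windows is obvious. Real images are far less clear-cut.

```python
import math
import random
from collections import Counter


def chi2_survival(x, df):
    """P(X >= x) for a Chi-squared variable with integer `df` >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def pov_p_value(values, min_expected=5.0):
    """PoV p-value, or None when too few pairs are populated."""
    counts = Counter(values)
    statistic, categories = 0.0, 0
    for k in range(128):
        even, odd = counts[2 * k], counts[2 * k + 1]
        expected = (even + odd) / 2.0
        if expected < min_expected:       # assumption: skip sparse pairs
            continue
        statistic += (even - expected) ** 2 / expected
        categories += 1
    if categories < 2:
        return None
    return chi2_survival(statistic, categories - 1)


def windowed_pov(values, window_size):
    """Apply the test to consecutive, non-overlapping windows."""
    return [
        (start, pov_p_value(values[start:start + window_size]))
        for start in range(0, len(values) - window_size + 1, window_size)
    ]


rng = random.Random(7)
# Synthetic "clean" samples with a strongly biased LSB (exaggerated on purpose).
clean = [2 * rng.randrange(20, 30) + int(rng.random() < 0.2) for _ in range(8000)]
# Randomize the LSBs of the first half only, leaving the second half untouched.
mixed = [(v & 0xFE) | rng.getrandbits(1) for v in clean[:4000]] + clean[4000:]

results = windowed_pov(mixed, 2000)   # four windows: two modified, two clean
```

In this sketch, the two untouched windows yield p-values that are practically zero, while the modified windows yield much larger ones. With windows that are too small, few pairs reach the minimum expected count and the result becomes unreliable or unavailable.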

In any case, LSB steganography is probably the most basic technique of multimedia (audio and image) steganography. For example, more advanced techniques permit information to be hidden while maintaining the first-order statistics (like histograms, on which the PoV method of steganalysis is based).