MI5 Coding Challenge

MI5 (Military Intelligence, Section 5) is the United Kingdom’s domestic counter-intelligence and security agency and is part of its intelligence machinery alongside the Secret Intelligence Service (MI6), Government Communications Headquarters (GCHQ) and Defence Intelligence (DI). The service is directed to protect British parliamentary democracy and economic interests, and counter terrorism and espionage within the UK.

You would think that, with this description, their recruitment process would be very strict, but I recently found this coding challenge and was disappointed by how simple the solution turned out to be. I hope my approach was incorrect and that the real solution to the steganographic challenge is more complicated than what I found. Nonetheless, it is a good “Hello World” exercise if you are into data analysis, cryptography, and/or lack-of-data-driven investigation.


The challenge is a chunk of random alphanumeric characters, with some forward slashes and plus signs. From the character set we can assume this is BaseN-encoded data. RFC 4648 defines the Base16, Base32, and Base64 data encodings. For the string foobar, the output of these three encoders is:

BASE64("foobar") = "Zm9vYmFy"
BASE32("foobar") = "MZXW6YTBOI======"
BASE16("foobar") = "666F6F626172"
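If you want to reproduce these three values yourself, here is a minimal sketch using Python's standard base64 module. The variable names are mine and only serve the illustration.

```python
import base64

data = b"foobar"

# The three RFC 4648 encodings of the same six bytes.
encoded = {
    "base64": base64.b64encode(data).decode("ascii"),
    "base32": base64.b32encode(data).decode("ascii"),
    "base16": base64.b16encode(data).decode("ascii"),
}

for name, text in encoded.items():
    print(name, text)
```

Note how each encoding uses a smaller alphabet than the previous one, so the same input grows longer as N shrinks.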

Trying each decoder on the challenge text, we find that it is encoded in Base64: the other two return either invalid data or an invalid UTF-8 string. Note that the base32 command is not available in a standard Unix installation, but a package for it exists; for the Base16 decoding I wrote a script in Perl.

$ echo "iVBOR...uQmCC" | base16 -d 1> /dev/null ; echo $? # --> 1
$ echo "iVBOR...uQmCC" | base32 -d 1> /dev/null ; echo $? # --> 1
$ echo "iVBOR...uQmCC" | base64 -d 1> /dev/null ; echo $? # --> 0
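The same elimination can be sketched in Python without any extra packages. This is not the Perl script mentioned above, only a rough equivalent; the function name candidate_encodings is hypothetical.

```python
import base64

# Strict decoders: each raises ValueError (binascii.Error) on invalid input.
DECODERS = {
    "base16": base64.b16decode,
    "base32": base64.b32decode,
    "base64": lambda s: base64.b64decode(s, validate=True),
}

def candidate_encodings(text):
    """Return the names of the decoders that accept `text` without error."""
    accepted = []
    for name, decode in DECODERS.items():
        try:
            decode(text)
        except ValueError:
            continue
        accepted.append(name)
    return accepted

print(candidate_encodings("Zm9vYmFy"))
```

Keep in mind that a string can pass more than one decoder (hexadecimal digits are also valid Base64 characters), so it is still worth looking at the decoded bytes before drawing a conclusion.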

After saving the output of the base64 command, we can use the Unix command file to detect the file type, which gives us a hint about the data it contains. We discover that the data is a PNG image of 92x163 pixels.

$ echo "iVBOR...uQmCC" | base64 -d 1> output.ext
$ file output.ext
output.ext: PNG image data, 92 x 163, 8-bit/color RGBA, non-interlaced
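To see roughly what file is looking at, here is a small sketch that checks the PNG signature and reads the dimensions from the IHDR chunk. It assumes a well-formed file whose first chunk is IHDR, and the function name png_dimensions is my own.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data):
    """Return (width, height) from the IHDR chunk, or None if not a PNG."""
    if not data.startswith(PNG_SIGNATURE) or len(data) < 24:
        return None
    # After the 8-byte signature: 4-byte length, 4-byte type, then the payload.
    if data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height

# Encoding the signature alone shows where the "iVBOR" prefix comes from.
print(base64.b64encode(PNG_SIGNATURE))
```

In other words, any Base64 text starting with iVBOR is very likely a PNG file, which we could have guessed before decoding anything.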

At this point we know this is a steganographic challenge, which generally means there is hidden data in the least significant bits of the image. Often the Unix command strings gives us the solution right away if the hidden data is embedded in a comment section; other times the data is another image appended to the end of the original one, or the header bytes were modified to make the file look like an image. Let's rename the file to output.png and try strings first.

$ strings output.png
As I read, numbers I see. 'Twould be a shame not to count this art among the great texts of our time7
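If you prefer to look for that text in a structured way, the sketch below walks the PNG chunks and collects the tEXt ones. That the hint is stored in a tEXt chunk is an assumption on my part, the CRC of each chunk is not verified, and both function names are hypothetical.

```python
import struct

def iter_png_chunks(data):
    """Yield (type, payload) for each chunk after the 8-byte PNG signature."""
    offset = 8
    while offset + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        payload = data[offset + 8:offset + 8 + length]
        yield chunk_type, payload
        offset += 8 + length + 4  # chunk header + payload + CRC

def text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in the file."""
    found = {}
    for chunk_type, payload in iter_png_chunks(data):
        if chunk_type == b"tEXt":
            # A tEXt payload is "keyword", a NUL byte, then Latin-1 text.
            keyword, _, text = payload.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
    return found
```

Calling text_chunks(open("output.png", "rb").read()) would list any textual comments, without the noise that strings picks up from the compressed pixel data.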

Is this the solution? What does “As I read, numbers I see” mean? Is this gibberish to distract us? If we open the file with an image viewer we can see a zebra-like pattern of pink and blue stripes that looks like image noise.

Remember that an image is composed of pixels, so maybe if we read it pixel by pixel we can find something. Working with colored images is usually not a good idea, though. PBM is a simpler format we can use to find data hidden in the pixels: we can use GIMP to export the PNG image to PBM in ASCII format, which represents black pixels with the integer one and white pixels with the integer zero. But this is a coding challenge, so let's use code to get the binary output.
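For reference, an ASCII PBM file is simple enough to parse by hand. The following is a minimal sketch for the plain P1 variant only; the function name parse_ascii_pbm is hypothetical and the parser is not meant to cover every corner of the format.

```python
def parse_ascii_pbm(text):
    """Parse a plain (P1) PBM file into (width, height, bits)."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        tokens.extend(line.split())
    if len(tokens) < 3 or tokens[0] != "P1":
        raise ValueError("not an ASCII PBM file")
    width, height = int(tokens[1]), int(tokens[2])
    # Pixels may be written with or without whitespace between them.
    bits = [int(ch) for ch in "".join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError("pixel count does not match header")
    return width, height, bits

print(parse_ascii_pbm("P1\n# tiny example\n3 2\n1 0 1\n0 0 1\n"))
```

The result is a flat list of ones (black) and zeros (white) in row order, which is exactly the shape of data we want for the rest of the analysis.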

First, let's convert the image to black and white using ImageMagick:

$ convert output.png -threshold 50% threshold.png

Now let's use Python to read pixel by pixel, but instead of treating X and Y separately we will assume that the data is one long single-line image, which means we will read each row from left to right, moving from the top row to the bottom one.

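#!/usr/bin/env python
from PIL import Image

image = Image.open("threshold.png")
picture = image.load()  # pixel access object, indexed as picture[x, y]

# Walk the image row by row, left to right, as if it were one long line.
for y in range(image.size[1]):
    for x in range(image.size[0]):
        print(picture[x, y])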

Let's go back to the first hint, “As I read, numbers I see”: there must be some meaning in this. We already have a pixel-by-pixel reader and we are seeing numbers, but there is no solution yet, so let's think about it… We have an image with seemingly random black pixels on a white canvas (or the other way around if you prefer), but are they really random? Could the positions of the pixels mean something? The hint also tells us to “count”, so let's modify the script to count how many consecutive pixels share the same color, print each count as a Unicode character, and see what happens:

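#!/usr/bin/env python
from PIL import Image

image = Image.open("threshold.png")
picture = image.load()

solution = ""
color = picture[0, 0]  # color of the run we are currently counting
number = 0             # length of the current run

for y in range(image.size[1]):
    for x in range(image.size[0]):
        pixel = picture[x, y]
        if pixel == color:
            number += 1
        else:
            # The run ended: its length is the code of one character.
            solution += chr(number)
            number = 1
            color = pixel

print(solution)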
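To check the run-counting idea without the image at hand, here is a small self-contained sketch that works on a plain list of pixel values. Both function names are hypothetical, and encode_runs exists only to build test data. Unlike a loop that emits a character only when the color changes, groupby also returns the final run.

```python
from itertools import groupby

def decode_runs(pixels):
    """Map each run of identical pixels to the character whose code is its length."""
    return "".join(chr(len(list(run))) for _, run in groupby(pixels))

def encode_runs(text):
    """Inverse helper for testing: alternate 0/1 runs, one run per character."""
    pixels = []
    for index, char in enumerate(text):
        pixels.extend([index % 2] * ord(char))
    return pixels

sample = encode_runs("4869")  # the hexadecimal digits for "Hi"
print(decode_runs(sample))
```

A run of 52 pixels becomes the character 4, a run of 56 becomes 8, and so on, so the stripes in the picture are really just a long list of character codes.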


Voilà! We have a hexadecimal string, let's decode it:

$ python solution.py | tr -d '-' | xxd -r -p
Songratulations, you solved the puzzle! Why don?t
you apply to join our team? mi5.gov.uk/careers
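The last step of that pipeline can also be done in Python. This is a minimal sketch, with a hypothetical function name, that strips dashes and whitespace and then decodes the hexadecimal digits; the sample input is made up for the illustration and is not the challenge data.

```python
def decode_hex(hex_text):
    """Turn a hexadecimal string into text, ignoring dashes and whitespace."""
    cleaned = "".join(hex_text.split()).replace("-", "")
    raw = bytes.fromhex(cleaned)
    # Bytes that are not valid UTF-8 become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")

print(decode_hex("436f6e67-72617473"))
```

Decoding with errors="replace" means a stray non-UTF-8 byte shows up as a replacement character instead of stopping the script, which is handy when the hidden text was written in another encoding.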
