Your task, should you accept it: Identify which one contains the decoded string “FINDMEPLZ” without decoding the stream.
A common task in cybersecurity is decoding a base64 blob and checking if its contents contain anything “spicy”. For one-off tasks, this is simple enough. However, when building automated systems to identify base64 encoded content, this can quickly incur unnecessary computational overhead when following these steps:
1. Find a base64 encoded block of data in a larger stream
2. Decode the content
3. Search the content for a matching string
The following document aims to describe a shortcut that allows for directly searching for a base64 encoded value without needing to parse or decode its contents.
Base64 Primer
Base64 encoding takes a stream of bytes (8-bits) and encodes them using 64 different values. Typically encodings use [A-Za-z0-9] for the first 62 characters with the remaining 2 characters varying on implementation.
If a single byte (8-bits) is base64 encoded, each base64 character represents 6-bits of the original input. Using the Least Common Multiple (LCM), we can determine the number of bits at which these two sliding windows of bits align, so that all of the data is represented as a multiple of 8-bits.
Above we can see that the length of a base64 encoded value should be a multiple of 4. This behavior can be observed in the padding characters (commonly =), which are used to round out input streams whose length does not match that alignment.
This also demonstrates that for every 4 base64 characters, a maximum of 3-bytes of input can be wholly represented.
Patterns Emerge
Now that we’ve observed this behavior on a small scale, let’s scale this up a bit. We’ll start by sliding a 6-byte block across a 24-byte input, which avoids having to think about padding or alignment too much.
Starting in the top-left and following the diagonal to the bottom-right, an initial conclusion may be that Wlpa is a viable value to check for in order to determine whether the decoded text would contain ZZZZZZ.
While that may hold true in the specifically crafted example above, let’s see what happens when we shift 8 Z’s across our samples.
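The effect of shifting a needle across the three possible byte alignments can be computed directly. The following is an added example, not code from the original post; it generalizes the ZZZZZZ case to any needle, and the function names and sample streams are constructed for illustration.

```python
import base64

def b64_search_terms(needle: bytes) -> list[str]:
    """Base64 substrings produced when `needle` sits at byte offset
    0, 1 or 2 (mod 3) inside a larger encoded stream."""
    if len(needle) < 2:
        raise ValueError("needle too short to yield a stable term")
    terms = []
    for offset in range(3):
        # Filler bytes stand in for the unknown neighbouring data.
        lead = b"\x00" * offset
        trail = b"\x00" * (-(offset + len(needle)) % 3)
        encoded = base64.b64encode(lead + needle + trail).decode("ascii")
        # Keep only characters whose 6 bits come entirely from the needle;
        # the characters at either edge also depend on the neighbours.
        first = -(-offset * 8 // 6)             # ceil(first needle bit / 6)
        last = (offset + len(needle)) * 8 // 6  # floor(end needle bit / 6)
        terms.append(encoded[first:last])
    return terms

def encoded_stream_contains(b64_text: str, needle: bytes) -> bool:
    """True if any alignment of `needle` appears in the encoded text."""
    return any(term in b64_text for term in b64_search_terms(needle))

def constructed_streams(needle: bytes) -> dict[str, str]:
    """Constructed sample streams: the needle at each alignment, plus a decoy."""
    plain = {f"shift-{n}": b"x" * n + needle + b" tail" for n in range(3)}
    plain["decoy"] = b"nothing spicy in this stream"
    return {name: base64.b64encode(data).decode("ascii")
            for name, data in plain.items()}

if __name__ == "__main__":
    print(b64_search_terms(b"ZZZZZZ"))
    for name, stream in constructed_streams(b"FINDMEPLZ").items():
        print(name, stream, encoded_stream_contains(stream, b"FINDMEPLZ"))
```

A hit is a candidate rather than proof, since a plain substring match does not check where the term falls on the 4-character grid. Line-wrapped or URL-safe base64 has to be normalized before searching.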