Suricata evasion, starring URL decoding

How does Suricata’s URL decoding work? It’s more complex than you think!
Tags: suricata, detection, encoding, bypass

Author: Ron Bowes

Published: June 5, 2025

These days, one of my favourite hobbies is complaining about Suricata. In this blog, I’m going to talk about some of the weirdness in Suricata when processing URL-encoded data!

I’m gonna go into deep detail about one technical aspect of Suricata rule creation. If you enjoy this, let me know somehow and I’ll write more! I have a huge list of ways to bypass Suricata rules.

Background

Over a year ago, the GreyNoise research team embarked on a project to convert all of our tags to Suricata from… well, let’s just say from the way we used to do it. It was one of the biggest tech-debt loan payoffs I’ve ever been involved in, and it has improved our capabilities more than I ever could have imagined.

But, it wasn’t all sunshine and rainbows (happy Pride Month from a GayNoiser!). Not to air too much dirty laundry, but the process of converting well over 1,000 tags to Suricata had some growing pains. Buy a GreyNoise researcher the drink of their choice next time you see them, and they’ll tell you some horror stories from the period we dubbed “Suricatageddon”.

Initially, we filled in a basic template in a plain textbox in some webapp that I forget, and tried to write valid rules with unique sid values. That kinda worked, in that we could write rules and they would go into the detection pipeline; whether they were valid Suricata syntax or matched anything was anybody’s guess, though! We amassed a lot of brand new tech debt real quick.

Now, if there’s something I hate, it’s flying blind - I tend to work very quickly, but only because I know I have a safety net that will tell me when I screw up. But this was the opposite of a safety net - it was like walking a tightrope over the lion cages. One wrong move, and I was lunch.

We very quickly built out procedures and infrastructure to help keep Suricata happy and the lions hungry; in no time, we improved our process greatly:

  • We started developing via GitHub PRs, which enabled peer-reviews (and let us use syntax-highlighting editors instead of a textbox)
  • We built a testing pipeline that validated the rule’s syntax (both locally and again when we PR it)
  • We started amassing an enormous collection of exploit traffic, which let us write unit tests to validate that the rule matches what it’s supposed to match (we have something like 17,000 PCAPs now, covering most of our tags, and the number increases every day)
  • We built a linter that has over 100 different rules for every foreseeable problem, which also runs both locally and via PRs (everything I talk about below will be caught by our linter!)

If you’re wondering how we ended up with over 100 lint rules, it’s because I got mildly obsessed: what’s every possible way that an attacker can escape detection? What’s every way we could mess up rules? Some of them are simple–are there duplicate SIDs, do the UUIDs match, are we including links to our bug tracker, etc.–but an awful lot of them are based on common bypasses that we’ve seen and/or developed ourselves.

I got so excited, in fact, that I wrote what ended up being a full-day workshop that I gave in two hours at NorthSec 2025, and hope to give again in all its glory! Stay tuned for that!

URL encoding

We’ll start with a quick primer on how URL encoding works, including the different (slightly conflicting) RFCs.

There’s an RFC on how URIs work called RFC 3986: Uniform Resource Identifier (URI): Generic Syntax. What we care about is Section 2: Characters.

It exists because URIs–that is, the thing you type into your browser to visit webpages–sometimes need to carry both data and metadata (like parameter names and parameter values). A GET request can, in theory, contain any character (even unprintable characters and characters that would make the request ambiguous), but it also has to be displayable in your browser’s address bar. To fix that, the standard includes percent-encoding (colloquially, “hex encoding” or “URL encoding”).

For example, let’s say you have a form with fields called “name” and “job”. You might see a request like this:

  • https://example.org/page.php?name=ron&job=researcher

And that would be just fine. Each argument is name=value, and they’re separated by &. But what if I wanted to use the name ron&chris? It’d look like:

  • https://example.org/page.php?name=ron&chris&job=researcher

See how that’d be ambiguous? Now it seems like chris is his own field, which I don’t think he’d appreciate. And don’t get me started on, like, newlines - arguments can absolutely contain newlines, but your browser search bar only has one line! What do you do?

That’s where URI encoding comes into play. If you convert the ASCII representation of & to hex, you get 0x26. In a URI, you’d use %26. Likewise, a newline is %0a (or %0d%0a for a full CRLF), a NUL byte would be %00, and so on. Using encoding, the request would be:

  • https://example.org/page.php?name=ron%26chris&job=researcher

Most web servers will decode the arguments before handing them back to the application, so the application developers never really have to understand what’s going on.

Every language has libraries to support this, but I’m a Ruby nerd, and in Ruby, you can mess around with encoding/decoding using the CGI library:

ron@ronnoise ~ $ irb -f
irb(main):001:0> require 'cgi'
=> true

irb(main):002:0> CGI.escape('ron&chris')
=> "ron%26chris"

irb(main):003:0> CGI.unescape('ron%26chris')
=> "ron&chris"

One special character: space

Because every standard needs an exception, URL encoding has spaces. Logically, because the ASCII code for a space is 0x20, it should encode to %20, right? Let’s check:

irb(main):004:0> CGI.escape('ron and chris')
=> "ron+and+chris"

…wat?

Turns out that Section 8.2.1 of RFC 1866–an earlier standard, covering HTML forms–specifies that spaces should encode to +. What’s especially fun is that the more recent RFC–RFC 3986–does NOT include that. Why? Dunno! I guess they wanted spaces to be easier to type, then decided later that was confusing? Anyway, now we have both. Ambiguity!
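Ruby actually ships both conventions: CGI.escape follows the RFC 1866 form style, while ERB::Util.url_encode (from the standard library’s erb) does strict RFC 3986 percent-encoding:

require 'cgi'
require 'erb'

CGI.escape('ron and chris')            # => "ron+and+chris"     (RFC 1866 form style)
ERB::Util.url_encode('ron and chris')  # => "ron%20and%20chris" (RFC 3986 style)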

Optional URL decoding

According to both of those RFCs, unprintable characters must be converted to the URI encoded representation; but what about the other way around?

You can actually encode every character, giving you a string such as:

irb(main):005:0> CGI.unescape('%72%6f%6e%20%61%6e%64%20%63%68%72%69%73')
=> "ron and chris"

A lot of tools–like the Metasploit Framework–will automatically and randomly encode characters to evade detection; something like:

irb(main):006:0> CGI.unescape('r%6f%6e%20%61nd+%63%68%72%69%73')
=> "ron and chris"

Most applications don’t know or care about the difference. But as detection engineers, WE do!

POST bodies

Going back to RFC 1866, it specifies that forms should be submitted with the application/x-www-form-urlencoded encoding, and that includes POST forms.

That means that POST bodies are (typically) encoded the same way as GET URIs. I say typically because there are, of course, a bunch of exceptions:

  • application/json is encoded as a JSON blob (likewise other structured formats like XML and YAML)
  • text/plain isn’t encoded at all
  • multipart/form-data uses MIME encoding, which is altogether different
  • …and likely more
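Setting the exceptions aside, the typical case is the exact same encoding we’ve been looking at. The form from earlier, submitted as a POST, would look something like this:

POST /page.php HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 31

name=ron%26chris&job=researcher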

Now with that in mind, let’s look at Suricata!

Suricata

Why did I give such a long primer on URI encoding?

It’s because Suricata does some surprising stuff! And it’s in those surprising situations that bypasses happen.

A lot of what I say here is based on observation, because Suricata’s documentation is… not good. I’ll quote it when possible, but much of this is simply not documented.

http.uri normalization

Suricata has two different sticky buffers that include just the URI–http.uri and http.uri.raw. Why two? The difference is that http.uri is normalized, and http.uri.raw isn’t. That’s actually really important, because sometimes you don’t want decoding or normalization (I could do a whole blog on http.uri.raw bypasses though).

Anyway, let’s look at http.uri. The current documentation says:

8.13.14. http.uri

Matching on the HTTP URI buffer has two options in Suricata, the http.uri and the http.uri.raw sticky buffers.

[…]

The http.uri keyword normalizes the URI buffer. For example, if a URI has two leading //, Suricata will normalize the URI to a single leading /.

Normalization Example:

GET //index.html HTTP/1.1
User-Agent: Mozilla/5.0
Host: suricata.io

In this case //index.html would be normalized to /index.html.

Normalized HTTP Request Example:

GET /index.html HTTP/1.1
User-Agent: Mozilla/5.0
Host: suricata.io
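In rule terms, that normalization means these two rules (hypothetical sids, minimal on purpose) see the same request differently:

alert http any any -> any any (msg:"Matches - the normalized buffer holds /index.html"; http.uri; content:"/index.html"; sid:1000001; rev:1;)
alert http any any -> any any (msg:"Matches - but only because the raw buffer still holds //index.html"; http.uri.raw; content:"//index.html"; sid:1000002; rev:1;)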

Interestingly, the current documentation doesn’t mention URL encoding at all! If I go back to the most recent 7.x version, it actually does acknowledge URL encoding:

The uri has two appearances in Suricata: the uri.raw and the normalized uri. The space for example can be indicated with the heximal notation %20. To convert this notation in a space, means normalizing it. It is possible though to match specific on the characters %20 in a uri. This means matching on the uri.raw. The uri.raw and the normalized uri are separate buffers. So, the uri.raw inspects the uri.raw buffer and can not inspect the normalized buffer.

Whether this means they’ve changed the behaviour in Suricata 8, I’m not actually sure - I can’t find any discussion. I asked Perplexity AI about this, and it’s also, well, perplexed:

There is no explicit statement in the Suricata 8.0.0-rc1 documentation that http.uri is URL decoded or that percent-encoded characters are always converted to their ASCII equivalents. The documentation focuses on structural normalization and is ambiguous about percent-decoding. The only clear mention of decoding relates to the + character and requires a configuration change. If you need to be certain about percent-decoding behavior, you would need to consult Suricata’s source code, changelogs, or conduct empirical testing with crafted HTTP requests.

ANYWAY, the point of that whole tangent is that the documentation doesn’t really say what’s going on, and somehow says less over time.

Empirical testing

When in doubt, test! What I determined is that Suricata’s decoder performs RFC 3986 decoding (that is, %XX), but it does NOT perform RFC 1866 decoding (application/x-www-form-urlencoded), which means it does NOT decode + to <space>. The only documentation on this that I could find was on the differences from Snort page, which actually says this is optional behaviour that can be enabled:

In Snort, the http_uri buffer normalizes ‘+’ characters (0x2B) to spaces (0x20). Suricata can do this as well but you have to explicitly set query-plusspace-decode: yes in the libhtp section of Suricata’s yaml file
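For reference, that setting lives in the libhtp section of suricata.yaml; a sketch of what that might look like (your config layout may differ):

libhtp:
  default-config:
    query-plusspace-decode: yes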

That means by default, if you want to match a space, you’re in a weird situation.

I know I’ve run into this a few times, often with SQL injection, but let’s use this E-cology FileDownloadForOutDoc SQL injection vulnerability as an example (I’m translating the POST request to a GET for illustration; this likely won’t actually work):

GET /weaver/weaver.file.FileDownloadForOutDoc?isFromOutImg=1&fileid=%25d%20WAITFOR%20DELAY%20'0:0:7' HTTP/1.1
Host: {{Hostname}}

It’s injecting into a SQL query using spaces. That means you can write a Suricata rule such as:

alert http any any -> any any (msg:"GOTCHA"; flow:to_server,established; http.uri; content:"/weaver/weaver.file.FileDownloadForOutDoc"; content:"fileid="; pcre:"/fileid=[^&]* /"; sid:1080054; rev:1;)

And that should catch the attack. However, if the attacker changes the payload to:

GET /weaver/weaver.file.FileDownloadForOutDoc?isFromOutImg=1&fileid=%25d+WAITFOR+DELAY+'0:0:7' HTTP/1.1
Host: {{Hostname}}

Now the + is no longer decoded to <space>, so the rule won’t fire, but the application still sees a space and will still be exploited.

You can use the url_decode modifier to resolve this issue:

alert http any any -> any any (msg:"GOTCHA"; flow:to_server,established; http.uri; url_decode; content:"/weaver/weaver.file.FileDownloadForOutDoc"; content:"fileid="; pcre:"/fileid=[^&]* /"; sid:1080054; rev:2;)

The url_decode documentation is terse, but at least it’s specific about what it does:

Decodes url-encoded data, ie replacing ‘+’ with space and ‘%HH’ with its value. This does not decode unicode ‘%uZZZZ’ encoding

The problem now is that all of the %-encoded stuff gets double-decoded: %2527 becomes %27 during the first normalization, and %27 becomes ' during the second, and suddenly the injection is broken by the extra quote (not that that matters much for detection). You can use http.uri.raw instead, which is only decoded once, but then you’re dealing with unnormalized paths (like a/../b). You can combine both, but now your rule is getting unwieldy. It’s a mess!
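You can watch the double decode happen with the same Ruby library from earlier:

require 'cgi'

once = CGI.unescape('%2527')  # => "%27" (first decode: %25 -> %)
CGI.unescape(once)            # => "'"   (second decode: %27 -> ')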

TL;DR: Suricata’s http.uri normalization only decodes hex-encoded %XX values, not + - to decode everything, you must use url_decode!

Oh also…

Not for nothing, but you never actually need a space in a SQL query, even in the exploit examples above. You can replace spaces with tabs, newlines, or comments (/*a*/):

  • /weaver/weaver.file.FileDownloadForOutDoc?isFromOutImg=1&fileid=%25d%09WAITFOR%09DELAY%09'0:0:7'
  • /weaver/weaver.file.FileDownloadForOutDoc?isFromOutImg=1&fileid=%25d%0d%0aWAITFOR%0d%0aDELAY%0d%0a'0:0:7'
  • /weaver/weaver.file.FileDownloadForOutDoc?isFromOutImg=1&fileid=%25d/**/WAITFOR/**/DELAY/**/'0:0:7'

So realistically, if you actually want to uniquely detect a SQL injection exploit that doesn’t require quotes, you’re already sorta outta luck. Sorry!

POST bodies

The next - and simpler! - problem is POST bodies.

As I mentioned above, POST requests with the content-type application/x-www-form-urlencoded are decoded exactly the same as GET requests.

Logically, on a detection platform, POST bodies should also be decoded exactly like URIs, right? Right?

They’re not!

The http.request_body buffer isn’t decoded at all! As somebody who doesn’t read documentation, that was incredibly surprising! I had to go back and fix a lot of rules.

TL;DR: Suricata’s http.request_body isn’t normalized at all - if you’re ever matching an application/x-www-form-urlencoded payload - which is the most common by far - you need to add url_decode every single time!
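As a sketch of what that looks like in practice (hypothetical msg/content/sid), the transform goes right after the sticky buffer:

alert http any any -> any any (msg:"Example form-body match"; flow:to_server,established; http.request_body; url_decode; content:"fileid="; sid:1000003; rev:1;)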

Oh also…

POST bodies can have a mostly-unlimited length, and Suricata only parses part of them. I think the default is like 100kb? So if you ever want to use a POST body that evades detection, just prepend 100kb to your query and you’re good to go!
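The padding itself is trivial; a purely illustrative Ruby sketch (the parameter names are made up):

# Illustrative only: push the interesting parameter past a ~100kb
# inspection window by prepending junk to the form body.
padding = "junk=#{'A' * 100_000}"
body    = "#{padding}&fileid=payload"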

…but please don’t, that sounds really annoying for me!

Summary

We looked at some of the conflicting standards for URL encoding, as well as the conflicting ways that Suricata decodes them.

The important takeaways are:

  • If you’re trying to match a space character in an http.uri buffer, you must add url_decode
  • If you’re trying to match basically anything in an http.request_body buffer (in a standard form), you must add url_decode

And that’s my rant. See you next time!