NoiseGPT – GreyNoise Labs

With the ever-increasing sophistication of cyber threats, the need for efficient and accurate data analysis tools has never been more important. GreyNoise’s GNQL dataset offers valuable insights into the cyber threat landscape but requires users to have a deep understanding of the query language. Recognizing this barrier, we set out to create a solution that would simplify the query process and make the power of GNQL more accessible to users with varying levels of expertise. (This paragraph was written by GPT-4)

LLMs are the hot thing right now, and with the recent advances we realized we can make one part of using GreyNoise way easier: the query language. No one likes learning a new query language, and we have a ton of fields available to the user, so it can be overwhelming.

Do I use metadata.country_code or country?
What does last_seen and first_seen actually mean?
How do I query for a specific rDNS?
Where is the port field?

Using OpenAI’s api, we can give it a long set of definitions about what our fields are, how they are used, and how a query can be crafted. Then we simply forward on a plain text question/statement like “Show me results starting in Brazil and targeting the US and are on port 22” and it can format the query for us.

This service is hosted in our labs environment and can be accessed via the GraphQL api at https://api.labs.greynoise.io/1/query/playground.

To try it out see the below example:

query NoiseGPT {
        generateGNQL(input_text: "show me results starting in Brazil and targeting the US and are on port 22") {
          input_text
          queries
      }
    }

Note: Since this is an experimental service, it will be heavily rate limited.

Once your query is complete, which can take from around 2-10 seconds (and might fail) depending on OpenAI’s demand, you’ll be presented with a list of possible queries to try. As with any LLM, some might be giberish and not work, but we’ve had pretty good success so far.

Give it a shot and let us know what you think!