Skip to main content
Version: Next

yara

Executes YARA rules on byte streams.

Synopsis

yara [-B|--blockwise] [-C|--compiled-rules] [-f|--fast-scan] <rule> [<rule>..]

Description

The yara operator applies YARA rules to an input of bytes, emitting rule context upon a match.

We modeled the operator after the official yara command-line utility to enable a familiar experience for the command users. Similar to the official yara command, the operator compiles the rules by default, unless you provide the option -C,--compiled-rules. To quote from the above link:

This is a security measure to prevent users from inadvertently using compiled rules coming from a third-party. Using compiled rules from untrusted sources can lead to the execution of malicious code in your computer.

The operator uses a YARA scanner under the hood that buffers blocks of bytes incrementally. Even though the input arrives in non-contiguous blocks of memories, the YARA scanner engine support matching across block boundaries. For continuously running pipelines, use the --blockwise option that considers each block as a separate unit. Otherwise the scanner engine would simply accumulate blocks but never trigger a scan.

-B|--blockwise

Match on every byte chunk instead of triggering a scan when the input exhausted.

This option makes sense for never-ending dataflows where each chunk of bytes constitutes a self-contained unit, such as a single file.

-C|--compiled-rules

Interpret the rules as compiled.

When providing this flag, you must exactly provide one rule path as positional argument.

-f|--fast-scan

Enable fast matching mode.

<rule>

The path to the YARA rule(s).

If the path is a directory, the operator attempts to recursively add all contained files as YARA rules.

Examples

The examples below show how you can scan a single file and how you can create a simple rule scanning service.

Perform one-shot scanning of files

Scan a file with a set of YARA rules:

load file --mmap evil.exe | yara rule.yara
Memory Mapping Optimization

The --mmap flag is merely an optimization that constructs a single chunk of bytes instead of a contiguous stream. Without --mmap, the file loader generates a stream of byte chunks and feeds them incrementally to the yara operator. This also works, but performance is better due to memory locality when using --mmap.

Let's unpack a concrete example:

rule test {
meta:
string = "string meta data"
integer = 42
boolean = true

strings:
$foo = "foo"
$bar = "bar"
$baz = "baz"

condition:
($foo and $bar) or $baz
}

You can produce test matches by feeding bytes into the yara operator:

echo 'foo bar' | tenzir 'load stdin | yara /tmp/test.yara'

You will get one yara.match per matching rule:

{
"rule": {
"identifier": "test",
"namespace": "default",
"tags": [],
"meta": {
"string": "string meta data",
"integer": 42,
"boolean": true
},
"strings": {
"$foo": "foo",
"$bar": "bar",
"$baz": "baz"
}
},
"matches": {
"$foo": [
{
"data": "Zm9v",
"base": 0,
"offset": 0,
"match_length": 3
}
],
"$bar": [
{
"data": "YmFy",
"base": 0,
"offset": 4,
"match_length": 3
}
]
}
}

Each match has a rule field describing the rule and a matches record indexed by string identifier to report a list of matches per rule string.

Build a YARA scanning service

Let's say you want to build a service that scans malware sample that you receive over a Kafka topic malware.

Launch the processing pipeline as follows:

load kafka --topic malware | yara --blockwise /path/to/rules

If you run this pipeline on the command line via tenzir <pipeline>, you see the matches arriving as JSON. You could also send the matches via the fluent-bit sink to Slack, Splunk, or any other Fluent Bit output. For example, via Slack:

load kafka --topic malware
| yara --blockwise /path/to/rules
| fluent-bit slack webhook=<url>

This pipeline requires that every Kafka message is a self-contained malware sample. Because the pipeline runs continuously, we supply the --blockwise option so that the yara triggers a scan for every Kafka message, as opposed to accumulating all messages indefinitely and only initiating a scan when the input exhausts.

You can now submit a malware sample by sending it to the malware Kafka topic:

load file --mmap evil.exe | save kafka --topic malware

This pipeline loads the file evil.exe as single blob and sends it to Kafka, at topic malware.