Export
Querying data from VAST (aka exporting) involves spinning up a VAST client
that executes a query expression. In the following, we assume that you set up a
server listening at localhost:5158
.
To run a query, you need to:
- know what to look for
- express what you want
- decide in what format to show the result
Let's go through each of these steps.
Decide what to query
To figure out what you can query, VAST offers
introspection via the show
command.
Use show schemas
to display the schema of all types:
vast show schemas --yaml
In case you ingested Suricata data, this may print:
- suricata.flow:
record:
- timestamp:
timestamp: time
- flow_id:
type: uint64
attributes:
index: hash
- pcap_cnt: uint64
- vlan:
list: uint64
- in_iface: string
- src_ip: ip
- src_port:
port: uint64
- dest_ip: ip
- dest_port:
port: uint64
- proto: string
- event_type: string
- community_id:
type: string
attributes:
index: hash
- flow:
suricata.component.flow:
record:
- pkts_toserver: uint64
- pkts_toclient: uint64
- bytes_toserver: uint64
- bytes_toclient: uint64
- start: time
- end: time
- age: uint64
- state: string
- reason: string
- alerted: bool
- app_proto: string
- suricata.http:
record:
- timestamp:
timestamp: time
- flow_id:
type: uint64
attributes:
index: hash
- pcap_cnt: uint64
- vlan:
list: uint64
- in_iface: string
- src_ip: ip
- src_port:
port: uint64
- dest_ip: ip
- dest_port:
port: uint64
- proto: string
- event_type: string
- community_id:
type: string
attributes:
index: hash
- http:
record:
- hostname: string
- url: string
- http_port:
port: uint64
- http_user_agent: string
- http_content_type: string
- http_method: string
- http_refer: string
- protocol: string
- status: uint64
- redirect: string
- length: uint64
- tx_id:
type: uint64
attributes:
index: hash
The next section discusses how you can refer to various elements of this type schema.
Begin with an expression
We designed the VAST language to make it easy to reference the data schema and put constraints on it. Specifically, VAST's expression language has the concept of extractors that refer to various parts of the event structure. For example, you can query the above schemas with a meta extractor to select a specific set of event types:
#type == /suricata.(http|flow)/
This predicate restricts a query to the event types suricata.flow
and
suricata.http
. You can think of the meta extractor as operating on the table
header, whereas field extractors operate on the table body instead:
hostname == "evil.com" || dest_ip in 10.0.0.0/8
This expression has two predicates with field extractors. The first field
extractor hostname
is in fact a suffix of the fully-qualified field
suricata.http.hostname
. Because it's often inconvenient to write down the
complete field name, you can write just hostname
instead. Of there exist
multiple fields that qualify, VAST builds the logical OR (a disjunction) of
all fields. This may unfold as follows:
suricata.http.hostname == "evil.com" || myevent.hostname == "evil.com" || ...
So at the end it's up to you: if you want to be fast and can live with potentially cross-firing other matches, then you can go with the "short and sweet" style of writing your query. If you need exact answers, you can always write out the entire field.
Looking at the other side of the field name, we have its type. This is where type extractors come into play. In you don't know the field name you are looking for, we still want that you can write queries effectively. Taking the above query as an example, you can also write:
:string == "evil.com" || :ip in 10.0.0.0/8
In fact, both predicates in this expression are what we call value predicates, making it possible to shorten this expression to:
"evil.com" || 10.0.0.0/8
Using type extractors (and thereby value predicates) hinges on having
a powerful type system. If you only have strings and numbers, this is not
helping much. VAST's type system
supports aliases, e.g., you can define an alias called port
that points to a
uint64
. Then you'd write a query only over ports:
:port != 443
As above, this predicate would apply to all fields of type port
—independent of
their name.
To summarize, we have now seen three ways to query data, all based on information that is intrinsic to the data. There's another way to write queries using extrinsic information: event taxonomies, which define concepts and models. Concepts are basically field mappings that VAST resolves prior to query execution, whereas models define a tuple over concepts, e.g., to represent common structures like a network connection 4-tuple. A concept query looks syntactically identical to field extractor query. For example:
net.src.ip !in 192.168.0.0/16
VAST resolves the concept net.src.ip
to all fieldnames that this concept has
been defined with. We defer to the taxonomy documentation for a
more detailed discussion.
Apply a pipeline
After providing a filter expression, you can optionally continue with a pipeline.
src_ip == 192.168.1.104
| select timestamp, flow_id, src_ip, dest_ip, src_port
| drop timestamp
Choose an export format
After your have written your query expression, the final step is selecting how
you'd like the result to be served. The export
command spins up a VAST client
that connects to a server where the query runs, and receives the results back to
then render them on standard output:
vast export [options] <format> [options] [expr]
The format defines how VAST renders the query results. Text formats include JSON, CSV, or tool-specific data encodings like Zeek. PCAP is an example for a binary format.
For example, to run query that exports the results as JSON, run:
vast export json net.src.ip in 10.0.0.0/8