This is unreleased documentation for VAST Next version.
For up-to-date documentation, see the latest version (VAST v3.1).
Import
Ingesting data into VAST (also known as importing) involves spinning up a VAST client
that parses and ships the data to a VAST server. In the following, we assume
that you have set up a server listening at localhost:5158.
Use the import command to ingest data from standard input or a file:
    vast import [options] <format> [options] [expr]
The format defines the encoding of data. Text formats include JSON, CSV, or
tool-specific data encodings like Zeek. Examples for binary formats are PCAP
and NetFlow.
For example, to import a file in JSON, use the json format:
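A minimal sketch of such an invocation, assuming a hypothetical input file named data.json and the server from above listening at localhost:5158. The wrapper function import_json is illustrative only; it feeds the given file to the client on standard input:

```shell
# Illustrative wrapper, not part of VAST.
# $1: path to a file with JSON events (data.json below is a hypothetical name).
import_json() {
  # The json format tells the client how to parse the bytes on standard input.
  vast import json < "$1"
}

# Usage: import_json data.json
```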
If VAST does not ship with a module for your data out of the box,
or the inference is not good enough for your use case regarding type semantics
or performance, you can easily write one yourself.
A schema is a record type with a name so that VAST can
represent it as a table internally. You would write a schema manually or extend
an existing schema if your goal is tuning type semantics and performance. For
example, if you have a field of type string that only holds IP addresses, you
can upgrade it to type ip and enjoy the benefits of richer query
expressions, e.g., top-k prefix search. Or if you onboard a new data source, you
can ship a schema along with concept mappings for a deeper
integration.
You write a schema (and potentially accompanying types, concepts, and models) in
a module.
Let's write one from scratch, for a tiny dummy data source called foo that
produces CSV events of this shape:
    message:
      record:
        - date: time
        - target: ip
        - message: msg
You can embed this type definition in a dedicated foo module:
    module: foo
    types:
      message:
        record:
          - date: time
          - target: ip
          - message: msg
Now that you have a new module, you can choose to deploy it at the client or
the server. When a VAST server starts, it will send a copy of its local schemas
to the client. If the client has a schema for the same type, it will override
the server version. We recommend deploying the module at the server when all
clients should see the contained schemas, and at the client when the scope is
local. The diagram below illustrates the initial handshake:
Regardless of where you deploy the module, the procedure is the same at client
and server: place the module in an existing module directory, such as
/etc/vast/modules, or tell VAST in your vast.yaml configuration file where
to look for additional modules via the module-dirs key:
    vast:
      module-dirs:
        - path/to/modules
At the server, restart VAST and you're ready to go. Or just spin up a new client
and ingest the CSV with richer typing:
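A minimal sketch of that client invocation, assuming the events of the foo data source live in a hypothetical file named foo.csv and that the foo module is already deployed. The wrapper function import_foo_csv is illustrative only:

```shell
# Illustrative wrapper, not part of VAST.
# $1: path to a CSV file from the foo data source (foo.csv is a hypothetical name).
import_foo_csv() {
  # No extra options: the client relies on the deployed foo module to
  # pick up the richer types for the CSV columns.
  vast import csv < "$1"
}

# Usage: import_foo_csv foo.csv
```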
For some input formats, such as JSON and CSV, VAST requires an existing schema
to find the corresponding type definition and use higher-level types.
There are two ways to tell VAST how to map events to schemas:
Field Matching: by default, VAST checks for every new record whether there
exists a schema whose fields match those of the record. If so, VAST
automatically assigns the matching schema.
The --type=PREFIX option restricts the set of candidate schemas to type names
with a given prefix, which helps when multiple schemas have identical field
names. "Prefix" here means a full type name or the part of it up to a dot
delimiter, e.g., suricata and suricata.dns are valid prefixes, but suricat
and suricata.d are not.
Performance Boost
In case the prefix specified by --type yields exactly one possible
candidate schema, VAST can operate substantially faster. The reason is that
VAST disambiguates multiple schemas by comparing their normalized
representation, which works by computing a hash of the list of sorted field
names and comparing it to the hash of the candidate types.
Selector Specification: some events have a dedicated field to indicate
the type name of a particular event. For example, Suricata EVE JSON records
have an event_type field that contains flow, dns, smb, etc., to
signal what object structure to expect.
To designate a selector field, use the --selector=FIELD:PREFIX option to
specify a colon-separated field-name-to-schema-prefix mapping. For example,
vast import json --selector=event_type:suricata reads the value from the
field event_type and prefixes it with "suricata." (including the dot) to look
up the corresponding schema, so that a value of dns maps to suricata.dns.
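The two mapping mechanisms can be sketched with a few plain shell functions. These are illustrations of the rules described above, not VAST's implementation: the function names are hypothetical, and cksum is only a stand-in for whatever hash function VAST uses internally.

```shell
# Illustrative only: plain shell functions, not VAST's implementation.

# Succeeds if $1 is a valid --type prefix of the type name $2, i.e., the
# full name or the part of it up to a dot delimiter.
is_type_prefix() {
  case "$2" in
    "$1" | "$1".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints a fingerprint of a list of field names that does not depend on
# their order: sort the names, then hash. cksum is a stand-in hash here.
fields_fingerprint() {
  printf '%s\n' "$@" | LC_ALL=C sort | cksum
}

# Prints the schema name to look up for a --selector spec ($1, written as
# FIELD:PREFIX) and the value found in the selector field ($2).
schema_for_selector() {
  printf '%s.%s\n' "${1#*:}" "$2"
}

is_type_prefix suricata suricata.dns && echo "suricata is a valid prefix"
is_type_prefix suricat suricata.dns || echo "suricat is not a valid prefix"
schema_for_selector event_type:suricata dns   # prints suricata.dns
```

Two records with the same field names in a different order produce the same fingerprint, which is the property the normalized comparison relies on. When --type narrows the candidates down to a single schema, no such comparison across candidates is needed.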