
VAST v2.2

Benno Evers

We released VAST v2.2 🙌! Transforms now have a new name: pipelines. The summarize operator also underwent a facelift, making aggregation functions pluggable and allowing for assigning names to output fields.

Transforms are now Pipelines

After carefully reconsidering our naming decisions related to query execution and data transformation, we came up with a naming convention that does a better job in capturing the underlying concepts.

Most notably, we renamed transforms to pipelines. A transform step is now a pipeline operator. This nomenclature is much more familiar to users coming from dataflow and collection-based query engines. The implementation underneath hasn't changed. As in the Volcano model, data still flows through operators, each of which consumes input from upstream operators and produces output for downstream operators. What we term a pipeline is the sequence of such chained operators.
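To make the operator-chaining idea concrete, here is a minimal Python sketch of a pull-based pipeline. It is only an illustration of the concept, not VAST's implementation: VAST's operators are plugins written in C++ on top of Apache Arrow, whereas this sketch uses plain dictionaries as events. The helper names `select`, `drop`, and `run_pipeline`, as well as the sample events, are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]
Operator = Callable[[Iterable[Event]], Iterator[Event]]


def select(predicate: Callable[[Event], bool]) -> Operator:
    """Build an operator that forwards only events matching the predicate."""
    def run(upstream: Iterable[Event]) -> Iterator[Event]:
        for event in upstream:
            if predicate(event):
                yield event
    return run


def drop(field: str) -> Operator:
    """Build an operator that removes one field from every event."""
    def run(upstream: Iterable[Event]) -> Iterator[Event]:
        for event in upstream:
            yield {k: v for k, v in event.items() if k != field}
    return run


def run_pipeline(operators: List[Operator], source: Iterable[Event]) -> Iterator[Event]:
    """Chain operators so that each one pulls from its upstream neighbor."""
    stream: Iterable[Event] = source
    for op in operators:
        stream = op(stream)
    return iter(stream)


# Hypothetical sample data, for illustration only.
events = [
    {"user": "alice", "query": "example.org"},
    {"user": "secret-username", "query": "tenzir.com"},
]
pipeline = [select(lambda e: e["user"] != "secret-username"), drop("user")]
result = list(run_pipeline(pipeline, events))
```

Because every operator is a generator, no work happens until the final consumer pulls from the end of the chain, which is the essence of the Volcano-style flow described above.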

While pipelines are not yet available at the query layer, they soon will be. Until then, you can deploy pipelines at load-time to transform data in motion or data at rest.

From a user perspective, the configuration keys associated with transforms have changed. Here's the updated example from our previous VAST v1.0 release blog.

vast:
  # Specify and name our pipelines, each of which is a list of configured
  # pipeline operators. Pipeline operators are plugins, enabling users to
  # write complex transformations in native code using C++ and Apache Arrow.
  pipelines:
    # Prevent events containing certain strings from being exported, e.g.,
    # "tenzir" or "secret-username".
    remove-events-with-secrets:
      - select:
          expression: ':string !in ["tenzir", "secret-username"]'

  # Specify whether to trigger each pipeline at server- or client-side, on
  # `import` or `export`, and restrict them to a list of event types.
  pipeline-triggers:
    export:
      # Apply the remove-events-with-secrets transformation server-side on
      # export to the suricata.dns and suricata.http event types.
      - pipeline: remove-events-with-secrets
        location: server
        events:
          - suricata.dns
          - suricata.http
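As a rough mental model for how the trigger section reads, the following Python sketch picks the pipelines that apply to a given direction, location, and event type. It is an interpretation of the comments in the configuration above, not a description of how VAST resolves triggers internally. The function name `pipelines_for` and the dictionary layout are assumptions for this example.

```python
from typing import Dict, List

# Mirrors the `pipeline-triggers` section of the configuration above.
triggers: Dict[str, List[dict]] = {
    "export": [
        {
            "pipeline": "remove-events-with-secrets",
            "location": "server",
            "events": ["suricata.dns", "suricata.http"],
        }
    ]
}


def pipelines_for(
    triggers: Dict[str, List[dict]],
    direction: str,
    location: str,
    event_type: str,
) -> List[str]:
    """Return the names of pipelines whose trigger matches, in config order."""
    matched = []
    for trigger in triggers.get(direction, []):
        if trigger["location"] != location:
            continue
        if event_type not in trigger["events"]:
            continue
        matched.append(trigger["pipeline"])
    return matched


server_dns = pipelines_for(triggers, "export", "server", "suricata.dns")
client_dns = pipelines_for(triggers, "export", "client", "suricata.dns")
```

In this model, `server_dns` contains the `remove-events-with-secrets` pipeline, while `client_dns` is empty because the trigger is restricted to the server side.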

Summarization Improvements

In line with the above nomenclature changes, we've improved the behavior of the summarize operator. It is now possible to specify an explicit name for the output fields. This is helpful when downstream processing needs a predictable schema. Previously, VAST simply reused the name of the input field. The syntax was as follows:

summarize:
  group-by:
    - ...
  aggregate:
    min:
      - ts # implied name for aggregate field

We have now changed the syntax so that the new field name comes first:

summarize:
  group-by:
    - ...
  aggregate:
    ts_min: # explicit name for aggregate field
      min: ts

In SQL, this corresponds to the AS keyword: SELECT min(ts) AS ts_min.
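The effect of explicit output names can be sketched in a few lines of Python. This is a simplified model of the semantics, not VAST's implementation: the `AGGREGATIONS` registry is a hypothetical stand-in for pluggable aggregation functions, the `summarize` function and the `src` field are invented for this example, and events are plain dictionaries.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical registry standing in for pluggable aggregation functions.
AGGREGATIONS: Dict[str, Callable[[list], object]] = {
    "min": min,
    "max": max,
    "sum": sum,
}


def summarize(
    events: Iterable[dict],
    group_by: List[str],
    aggregate: Dict[str, Dict[str, str]],
) -> List[dict]:
    """Group events and compute one named output field per aggregate entry."""
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[field] for field in group_by)
        groups[key].append(event)
    rows = []
    for key, members in groups.items():
        row = dict(zip(group_by, key))
        for output_name, spec in aggregate.items():
            # Each spec holds exactly one entry: {function name: input field}.
            function_name, input_field = next(iter(spec.items()))
            values = [member[input_field] for member in members]
            row[output_name] = AGGREGATIONS[function_name](values)
        rows.append(row)
    return rows


# Hypothetical sample data, for illustration only.
events = [
    {"src": "10.0.0.1", "ts": 30},
    {"src": "10.0.0.1", "ts": 10},
    {"src": "10.0.0.2", "ts": 20},
]
rows = summarize(events, group_by=["src"], aggregate={"ts_min": {"min": "ts"}})
```

The `aggregate` argument has the same shape as the new configuration syntax: the outer key is the output field name and the inner mapping pairs an aggregation function with its input field. Adding another entry to the registry is all it takes to support a new function in this sketch.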