VAST v2.2

August 5, 2022 · 3 min read

Principal Engineer

We released VAST v2.2 🙌! Transforms now have a new name: pipelines. The summarize operator also received a facelift: aggregation functions are now pluggable, and you can assign names to output fields.

Transforms are now Pipelines

After carefully reconsidering our naming decisions related to query execution and data transformation, we came up with a naming convention that does a better job of capturing the underlying concepts.

Most notably, we renamed transforms to pipelines. A transform step is now a pipeline operator. This nomenclature is much more familiar to users coming from dataflow and collection-based query engines. The implementation underneath hasn't changed. As in the Volcano model, data still flows through operators, each of which consumes input from upstream operators and produces output for downstream operators. What we term a pipeline is the sequence of such chained operators.
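
To make the terminology concrete, here is a minimal Python sketch of operators chained into a pipeline, where each operator pulls from its upstream and yields to its downstream. It is an illustration only and not VAST's implementation: VAST operators are C++ plugins working on Apache Arrow data, and the names `select`, `rename_field`, and `pipeline` below are hypothetical helpers.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical event representation for this sketch; VAST itself
# processes Apache Arrow data rather than Python dictionaries.
Event = dict
Operator = Callable[[Iterable[Event]], Iterator[Event]]


def select(predicate: Callable[[Event], bool]) -> Operator:
    """Operator that forwards only the events matching the predicate."""
    def op(upstream: Iterable[Event]) -> Iterator[Event]:
        for event in upstream:
            if predicate(event):
                yield event
    return op


def rename_field(old: str, new: str) -> Operator:
    """Operator that renames one field and passes everything else through."""
    def op(upstream: Iterable[Event]) -> Iterator[Event]:
        for event in upstream:
            yield {(new if key == old else key): value
                   for key, value in event.items()}
    return op


def pipeline(*operators: Operator) -> Operator:
    """A pipeline is the sequence of chained operators."""
    def run(source: Iterable[Event]) -> Iterator[Event]:
        stream = iter(source)
        for op in operators:
            # Each operator consumes the output of the one before it.
            stream = op(stream)
        return stream
    return run


events = [
    {"user": "alice", "query": "example.com"},
    {"user": "secret-username", "query": "tenzir.com"},
]

drop_secrets = pipeline(
    select(lambda e: "secret-username" not in e.values()),
    rename_field("query", "dns_query"),
)
result = list(drop_secrets(events))
```

Because every operator is a generator, no event is processed until the final consumer asks for it, which mirrors the pull-based flow described above.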

While pipelines are not yet available at the query layer, they soon will be. Until then, you can deploy pipelines at load-time to transform data in motion or data at rest.

From a user perspective, the configuration keys associated with transforms have changed. Here's the updated example from our previous VAST v1.0 release blog post.

vast:
  # Specify and name our pipelines, each of which is a list of configured
  # pipeline operators. Pipeline operators are plugins, enabling users to
  # write complex transformations in native code using C++ and Apache Arrow.
  pipelines:
     # Prevent events with certain strings from being exported, e.g.,
     # "tenzir" or "secret-username".
     remove-events-with-secrets:
       - select:
           expression: ':string !in ["tenzir", "secret-username"]'

  # Specify whether to trigger each pipeline at server- or client-side, on
  # `import` or `export`, and restrict them to a list of event types.
  pipeline-triggers:
    export:
      # Apply the remove-events-with-secrets pipeline server-side on
      # export to the suricata.dns and suricata.http event types.
      - pipeline: remove-events-with-secrets
        location: server
        events:
          - suricata.dns
          - suricata.http

Summarization Improvements

In line with the above nomenclature changes, we've improved the behavior of the summarize operator. It is now possible to specify an explicit name for the output fields. This is helpful when downstream processing needs a predictable schema. Previously, VAST simply reused the name of the input field. The syntax was as follows:

summarize:
  group-by:
    - ...
  aggregate:
    min:
      - ts # implied name for aggregate field

We have now changed the syntax so that the new field name comes first:

summarize:
  group-by:
    - ...
  aggregate:
    ts_min: # explicit name for aggregate field
      min: ts

In SQL, this would be the AS token: SELECT min(ts) AS ts_min.
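
The semantics of the new syntax can be sketched in a few lines of Python. This is a rough model under stated assumptions, not VAST's implementation: the `summarize` function, the `AGGREGATIONS` registry, and the sample events are hypothetical, and the registry only hints at how pluggable aggregation functions could be looked up by name.

```python
from collections import defaultdict

# Hypothetical registry standing in for pluggable aggregation functions.
AGGREGATIONS = {"min": min, "max": max, "sum": sum}


def summarize(events, group_by, aggregate):
    """Group events and compute explicitly named aggregate fields.

    `aggregate` maps an output field name to a single-entry mapping of
    {function name: input field}, mirroring the YAML layout above.
    """
    groups = defaultdict(list)
    for event in events:
        key = tuple(event[field] for field in group_by)
        groups[key].append(event)

    rows = []
    for key, members in groups.items():
        row = dict(zip(group_by, key))
        for output_name, spec in aggregate.items():
            # Unpack the single {function: field} pair.
            (function_name, input_field), = spec.items()
            function = AGGREGATIONS[function_name]
            row[output_name] = function(m[input_field] for m in members)
        rows.append(row)
    return rows


events = [
    {"src": "10.0.0.1", "ts": 5},
    {"src": "10.0.0.1", "ts": 3},
    {"src": "10.0.0.2", "ts": 7},
]

summary = summarize(
    events,
    group_by=["src"],
    aggregate={"ts_min": {"min": "ts"}},  # explicit name for aggregate field
)
```

The output rows carry the field `ts_min` regardless of what the input field is called, which is what gives downstream consumers a predictable schema.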
