Usage metrics#

Starburst Enterprise platform (SEP) collects and logs information about all nodes in your cluster. This feature is mostly used for Mission Control and other management solutions. Users can also use the resulting log information for their own monitoring and operational purposes.

Configuration#

Usage metrics collection is configured by a number of properties that you can add to etc/config.properties.

Usage metrics configuration properties#

usage-metrics.log.path
    Relative path of the usage log files in the data directory. For .tar.gz installations, this is inside the installation folder. For rpm installations, the absolute path is /var/lib/starburst/data/var/log/starburst/usage-metrics.log.
    Default: var/log/starburst/usage-metrics.log

usage-metrics.log.max-size
    Maximum size of a single usage log file.
    Default: 100 MB, minimum 1 MB

usage-metrics.log.max-history
    Maximum number of usage log files.
    Default: 1000

usage-metrics.cloud-watch-logs-directory
    Path to the directory on the node in which log files are stored, for example /var/log/starburst/cloudwatch. Amazon CloudWatch is automatically enabled by the SEP CFT, and the usage metrics data is available in usage-metrics-* log streams.
    Default: (none)

usage-metrics.gathering.initial-delay
    The initial delay before usage tracking begins, allowing the cluster to start up before metrics gathering starts.
    Default: 1 min

usage-metrics.gathering.interval
    Length of the interval between the creation of usage metric log entries. Ideally, this is set to a small value, such as the default of 1 min.
    Default: 1 min

usage-metrics.gatherer-threads
    Number of threads used to gather and write all usage information.
    Default: 100, minimum 10

usage-metrics.cluster-usage-resource.enabled
    Expose the REST API endpoint for usage metrics, aggregated for all nodes since cluster start. Using the endpoint is a convenient alternative to the usage metrics parser command line tool. The coordinator exposes the result at http[s]://<coordinator_address>[:<port>]/v1/cluster/usage for GET requests.
    Default: false
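As an example, several of these properties can be combined in etc/config.properties. The values below are illustrative, not recommendations:

```text
# etc/config.properties -- example usage metrics settings (illustrative values)
usage-metrics.log.max-size=100MB
usage-metrics.log.max-history=1000
usage-metrics.gathering.initial-delay=1m
usage-metrics.gathering.interval=1m
usage-metrics.cluster-usage-resource.enabled=true
```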

Logged metrics#

For each collection interval, a JSON record is created for the entire cluster. It includes the following information:

  • startTime - the epoch time of the cluster start time for the cumulative period

  • time - the timestamp of the record

  • cumulativeCpuTime - total cumulative CPU time used by the JVM processes since the last cluster restart

  • cumulativeAvailableCpuTime - total CPU time available since the last cluster restart

  • cores - total cores within the cluster, as the sum of all cores in the coordinator and in all registered workers

  • activeNodes - total number of the registered worker nodes and the coordinator

  • signature - the signature for the data record

The following is an excerpt of a JSON log file for a cluster as stored in a log file at usage-metrics.log.path, showing the collected data for six collection intervals:

{"startTime":1612411299895,"time":"2021-02-05T16:28:37.640Z","cumulativeCpuTime":"1141.53s","cumulativeAvailableCpuTime":"1105507.84s","cores":16,"activeNodes":1,"signature":"13794b5225a5f79a59ad5bd6ecb92b5a93ffc06521a0ac8fc21b50dffb56cfd8"}
{"startTime":1612411299895,"time":"2021-02-05T16:29:37.658Z","cumulativeCpuTime":"1142.41s","cumulativeAvailableCpuTime":"1106468.06s","cores":16,"activeNodes":1,"signature":"d2d1ecce8ac95701bd4b67caf5c69273d17107dbe93f70a2662967060587d63d"}
{"startTime":1612411299895,"time":"2021-02-05T16:30:37.668Z","cumulativeCpuTime":"1143.19s","cumulativeAvailableCpuTime":"1107428.22s","cores":16,"activeNodes":1,"signature":"c5d1f73648e575036cc6ac50cd630c1e2ee355b4d76bdc756b06c26bfdd022f3"}
{"startTime":1612543186896,"time":"2021-02-05T16:40:52.519Z","cumulativeCpuTime":"0.00s","cumulativeAvailableCpuTime":"0.00s","cores":16,"activeNodes":1,"signature":"431c9b3ef438e78ae2d5ac772461701c14470b24e4bf05638c845dce5ed6d376"}
{"startTime":1612543186896,"time":"2021-02-05T16:41:52.532Z","cumulativeCpuTime":"1.97s","cumulativeAvailableCpuTime":"960.23s","cores":16,"activeNodes":1,"signature":"700540d92bc76a2b4eb900650e1129815ce9811e7f15dfed4041119c15c84d91"}
{"startTime":1612543186896,"time":"2021-02-05T16:42:52.544Z","cumulativeCpuTime":"4.10s","cumulativeAvailableCpuTime":"1920.41s","cores":16,"activeNodes":1,"signature":"37762b6799f8afeb4eae285391e76e57110c7ad1220108dda5112b43f13ad353"}
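Because the CPU counters are cumulative since cluster start, the usage during a single interval can be derived by differencing two consecutive records that share the same startTime. The following Python sketch is illustrative and not part of SEP:

```python
import json

def interval_cpu_seconds(prev_line, curr_line):
    """CPU seconds consumed between two consecutive log records.

    Both records must share the same startTime, i.e. belong to the same
    cluster lifetime; a restart resets the cumulative counters to zero.
    """
    prev, curr = json.loads(prev_line), json.loads(curr_line)
    if prev["startTime"] != curr["startTime"]:
        raise ValueError("cluster restarted between records")

    def to_seconds(rec):
        # Values such as "1142.41s" carry a trailing unit suffix.
        return float(rec["cumulativeCpuTime"].rstrip("s"))

    return to_seconds(curr) - to_seconds(prev)
```

Applied to the first two records in the excerpt above, this yields 1142.41 - 1141.53 = 0.88 CPU seconds for that one-minute interval.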

Accessing logged metrics#

There are several ways to access usage metrics data, using Starburst visualization tools, or through more manual methods.

Starburst Insights#

We highly recommend enabling and using the Starburst Insights interface for the best and most comprehensive user experience. Insights is accessed in the same way as the Starburst Web UI, and has built-in aggregations and visualizations. Insights accesses a much richer, more comprehensive data set that includes event logger data as well as usage metrics data.

Usage metrics parser#

The usage metrics parser is a command line application available separately from Starburst Support that aggregates and returns usage metrics from log files. To install the tool, place the JAR file in a convenient directory, and add it to your PATH. Then, change the filename and permissions as follows:

$ mv starburst-usage-metrics-parser-*-executable.jar usage-metrics-parser
$ chmod a+x usage-metrics-parser

Because the usage-metrics-parser is installed separately, it is not available on the nodes where the logs are saved. After the tool is installed locally, copy the log files from usage-metrics.log.path to an empty local folder, and run the parser with appropriate arguments:

$ usage-metrics-parser [--from <from>] [(-h | --help)] [--to <to>] [--] [<path>]
  • <from> - DateTime value for the beginning of the range, inclusive

  • <to> - DateTime value for the end of the range, exclusive

  • <path> - the fully-qualified path to the folder containing the log files

The usage metrics parser can be run directly, or with the java command:

# run directly:
$ ./usage-metrics-parser

# run using the java command:
$ java -jar usage-metrics-parser

The begin and end DateTime options accept the following formats:

DateTime formats for date range#

datetime          = time | date-opt-time
time              = 'T' time-element [offset]
date-opt-time     = date-element ['T' [time-element] [offset]]
date-element      = std-date-element | ord-date-element | week-date-element
std-date-element  = yyyy ['-' MM ['-' dd]]
ord-date-element  = yyyy ['-' DDD]
week-date-element = xxxx '-W' ww ['-' e]
time-element      = HH [minute-element] | [fraction]
minute-element    = ':' mm [second-element] | [fraction]
second-element    = ':' ss [fraction]
fraction          = ('.' | ',') digit+
offset            = 'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])

If you do not specify either of the --from or --to options, the metrics are computed using all available log file entries, as in the following example:

$ usage-metrics-parser /Users/jsmith/tmp

Specifying both --from and --to DateTimes computes metrics inclusive of the start, and exclusive of the specified end. In the following example, data for 2021-02-10T16:59:59Z will be included, and 2021-02-10T17:00:00Z is excluded:

$ usage-metrics-parser --from 2021-02-05T05:00:00Z --to 2021-02-10T17:00:00Z /Users/jsmith/tmp

Specifying --from with no end time computes metrics from that time up to the last log entry, inclusive:

$ usage-metrics-parser --from 2021-02-05T05:00:00Z /Users/jsmith/tmp

Specifying --to with no start time results in metrics computed from the earliest available log entry until the specified end time, exclusive:

$ usage-metrics-parser --to 2021-02-10T17:00:00Z /Users/jsmith/tmp

No matter what the specified date range is, usage-metrics-parser displays the results in the following format:

cluster restarts: 1, cpu time: 265.55s, available cpu time: 93130.57s, cpu utilization: 0.29%, min cores: 16, max cores: 16
  • cluster restarts - count of unique startTime in the date range

  • cpu time - aggregated cumulativeCpuTime in the specified date range

  • available cpu time - aggregated cumulativeAvailableCpuTime in the specified date range

  • cpu utilization - (cpu time / available cpu time) * 100

  • min cores - the smallest value of cores in the records in the specified time period

  • max cores - the largest value of cores in the records in the specified time period
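For readers who want to post-process the raw log themselves, the reported quantities can be approximated in Python. This is an illustrative sketch, not the parser's actual implementation; in particular, taking the maximum cumulative value per startTime and summing across restarts is an assumption about how the cumulative counters are aggregated:

```python
import json

def aggregate(lines):
    """Approximate the parser's summary from raw usage-metrics log lines.

    Counters are cumulative per cluster lifetime (startTime), so take the
    maximum per lifetime and sum across lifetimes (assumption).
    """
    cpu, avail, cores = {}, {}, []
    for line in lines:
        rec = json.loads(line)
        start = rec["startTime"]
        cpu[start] = max(cpu.get(start, 0.0),
                         float(rec["cumulativeCpuTime"].rstrip("s")))
        avail[start] = max(avail.get(start, 0.0),
                           float(rec["cumulativeAvailableCpuTime"].rstrip("s")))
        cores.append(rec["cores"])
    cpu_time, avail_time = sum(cpu.values()), sum(avail.values())
    return {
        "cluster restarts": len(cpu),
        "cpu time": cpu_time,
        "available cpu time": avail_time,
        "cpu utilization": 100.0 * cpu_time / avail_time if avail_time else 0.0,
        "min cores": min(cores),
        "max cores": max(cores),
    }
```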

Persisting metrics#

Understanding the usage of your cluster requires persisting the usage metrics. By default, the metrics are not persisted.

Persisting metrics with Starburst Insights#

The easiest way to persist and view comprehensive metrics is to enable the Starburst Insights interface and ensure that the insights.persistence-enabled configuration property is set to true in the config properties file. See the Starburst Insights section for complete configuration requirements.
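For example, in etc/config.properties (additional Insights database connection properties are required; see the Starburst Insights section for details):

```text
# Enable persistence of Insights data, including usage metrics
insights.persistence-enabled=true
```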

Persisting using Kubernetes with Helm#

While usage metrics are enabled by default when deploying SEP using Kubernetes with Helm, you must enable a persistent volume to persist your usage metrics.

Persisting with Amazon CloudWatch#

The CFT setup automatically configures persisting the metrics in Amazon CloudWatch. Users of Amazon EKS can also use CloudWatch.

Logs exported from CloudWatch are decorated with additional fields, and must be pre-processed to restore the raw format shown in the preceding section before the metrics parser can process them.

Log management solutions integrated with other cloud platforms, or available separately, allow persisting the metrics by capturing the log file or regularly inspecting the REST endpoint.

You can read more about how Starburst integrates with CloudWatch metrics in our AWS documentation.