Starburst Splunk connector#
The Starburst Splunk connector allows querying data stored in Splunk.
To connect to Splunk, you need:

- Splunk username and password credentials.
- Network access from the coordinator and workers to Splunk; the default port is 8089.
- A valid Starburst Enterprise license.
Create a catalog properties file in `etc/catalog` named, for example, `mysplunk.properties`, to access Splunk, as shown in the following example:

```properties
connector.name=splunk
splunk.url=https://example.splunk.com:8089
splunk.user=admin
splunk.password=password
splunk.schema-directory=/path/to/schema/directory
```
The connector can only access Splunk with the credentials specified in the catalog configuration file:

- Set `splunk.url` to the URL of your Splunk server, including the hostname and management port (default 8089).
- Set the Splunk user and password for your account.

If you need to access Splunk with different credentials, configure a separate catalog.
If the Splunk server is secured with TLS, you must export the server certificate and store it on the SEP coordinator and all workers, in the same path on each node relative to the SEP `etc` directory. Then configure the `splunk.ssl-server-cert` property to specify this path to the Splunk server certificate.
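For example, assuming the certificate was exported to `etc/splunk-server.pem` (a hypothetical path), the catalog property looks like this:

```properties
splunk.ssl-server-cert=etc/splunk-server.pem
```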
General configuration properties#
The following table describes general catalog configuration properties for the connector:
| Property name | Description |
| --- | --- |
| `case-insensitive-name-matching` | Support case insensitive schema and table names. |
| `case-insensitive-name-matching.config-file` | Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. |
| `case-insensitive-name-matching.config-file.refresh-period` | Frequency with which Trino checks the name matching configuration file for changes. |
| `metadata.cache-ttl` | Duration for which metadata, including table and column statistics, is cached. |
| `metadata.cache-missing` | Cache the fact that metadata, including table and column statistics, is not available. |
| `metadata.cache-maximum-size` | Maximum number of objects stored in the metadata cache. |
| `write.batch-size` | Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. |
| `dynamic-filtering.enabled` | Push down dynamic filters into JDBC queries. |
| `dynamic-filtering.wait-timeout` | Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. |
Access Splunk data#
In addition to Splunk system tables, the connector can use previously saved searches to generate reports for configured users. The name of the table in SEP is the name of the report in Splunk. To create a new table, run any search in Splunk, select Save As > Report, and specify a name.
The connector only supports accessing Splunk data via saved reports.
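For example, assuming a catalog named `mysplunk` and a saved report named `failed_logins` (both hypothetical names), the report appears as a table and can be queried directly:

```sql
SHOW TABLES FROM mysplunk.splunk;

SELECT * FROM mysplunk.splunk.failed_logins;
```

The schema name shown here is an assumption; use `SHOW SCHEMAS FROM mysplunk;` to list the schemas exposed by your catalog.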
The connector caches report names that it reads from Splunk. If you create a new report in Splunk, it may not show up right away. You can manually reset the metadata cache by restarting SEP.
Generate a schema file#
A schema file for a Splunk report details the SEP table name, columns, and data types. If a table does not have a schema file, the connector scans up to 50 rows from Splunk to dynamically detect the columns and types, and generates an internal schema. Alternatively, you can create a schema file manually or by following the example procedure.
When you manually create a schema file, you must add it to the schema directory on all nodes to specify the correct data types. We recommend that you use the `create_schema` procedure to create the initial schema file, provide the table name, then edit the file as necessary. The coordinator generates the file in the configured schema directory with the name of the table and an `.rsd` extension. If you edit the file, you must copy the new version to each host in the cluster.
The following example schema file for the TPC-H `orders` table was generated by the connector. The name of the file is `orders.rsd`. It must be placed in the directory specified by the `splunk.schema-directory` property on every SEP node.

```xml
<api:script xmlns:api="http://apiscript.com/ns?v1"
            xmlns:xs="http://www.cdata.com/ns/rsbscript/2"
            xmlns:other="http://apiscript.com/ns?v1">
  <api:info title="orders" other:catalog="CData" other:schema="Splunk" description="null"
            other:earliest_time="0" other:savedsearch="true" other:tablename="orders"
            other:search=" from inputlookup:&quot;orders.csv&quot;"
            other:tablepath="servicesNS/admin/search/search/jobs" other:version="20">
    <attr name="clerk" xs:type="string" isrequired="false" columnsize="2000" description="" other:internalname="clerk" other:filterable="false" />
    <attr name="comment" xs:type="string" isrequired="false" columnsize="2000" description="" other:internalname="comment" other:filterable="false" />
    <attr name="custkey" xs:type="int" isrequired="false" columnsize="4" precision="10" description="" other:internalname="custkey" other:filterable="false" />
    <attr name="orderdate" xs:type="date" isrequired="false" columnsize="3" description="" other:internalname="orderdate" other:filterable="false" />
    <attr name="orderkey" xs:type="int" isrequired="false" columnsize="4" precision="10" description="" other:internalname="orderkey" other:filterable="false" />
    <attr name="orderpriority" xs:type="string" isrequired="false" columnsize="2000" description="" other:internalname="orderpriority" other:filterable="false" />
    <attr name="orderstatus" xs:type="string" isrequired="false" columnsize="2000" description="" other:internalname="orderstatus" other:filterable="false" />
    <attr name="shippriority" xs:type="int" isrequired="false" columnsize="4" precision="10" description="" other:internalname="shippriority" other:filterable="false" />
    <attr name="totalprice" xs:type="double" isrequired="false" columnsize="8" precision="15" description="" other:internalname="totalprice" other:filterable="false" />
  </api:info>
  <api:script method="GET">
    <api:call op="splunkadoSelect">
      <api:push/>
    </api:call>
  </api:script>
</api:script>
```
The connector maps SQL types in Splunk to corresponding SEP types. All other SQL types are not supported.
The connector includes a number of performance improvements, detailed in the following sections.
Dynamic filtering is enabled by default. It causes the connector to wait for dynamic filtering to complete before starting a JDBC query.
You can disable dynamic filtering by setting the `dynamic-filtering.enabled` property in your catalog configuration file to `false`.
By default, table scans on the connector are delayed up to 20 seconds until dynamic filters are collected from the build side of joins. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries.
You can configure the
dynamic-filtering.wait-timeout property in your
catalog properties file:
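For example, to wait up to one minute for dynamic filters (an illustrative value, not a recommendation):

```properties
dynamic-filtering.wait-timeout=1m
```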
You can use the `dynamic_filtering_wait_timeout` catalog session property in a specific session:

```sql
SET SESSION catalogname.dynamic_filtering_wait_timeout = '1s';
```
The maximum size of a dynamic filter predicate that is pushed down to the connector during a table scan for a column is configured using the `domain-compaction-threshold` property in the catalog properties file.
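For example, to raise the threshold in the catalog properties file (an illustrative value):

```properties
domain-compaction-threshold=1000
```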
You can also use the `domain_compaction_threshold` catalog session property:

```sql
SET SESSION catalogname.domain_compaction_threshold = 10;
```

By default, `domain-compaction-threshold` is set to 32.
When the dynamic predicate for a column exceeds this threshold, it is compacted
into a single range predicate.
For example, if the dynamic filter collected for a date column `dt` on the fact table selects more than 32 days, the filtering condition is simplified from `dt IN ('2020-01-10', '2020-01-12', ..., '2020-05-30')` to `dt BETWEEN '2020-01-10' AND '2020-05-30'`. Using a large threshold can result in increased table scan overhead due to a large `IN` list getting pushed down to the data source.
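The compaction behavior can be illustrated with a minimal Python sketch. This is not the connector's implementation, only an illustration of the idea: an `IN` list of discrete values collapses to its bounding range once it exceeds the threshold.

```python
def compact_predicate(values, threshold):
    """Compact an IN-list of discrete values into a single bounding
    range once it exceeds the compaction threshold, mirroring how a
    large dynamic filter is simplified to a BETWEEN predicate."""
    values = sorted(values)
    if len(values) <= threshold:
        return ("IN", values)
    # Too many discrete values: keep only the bounding range.
    return ("BETWEEN", values[0], values[-1])

# 31 distinct dates stays within a threshold of 32, so the IN list survives.
dates = [f"2020-01-{d:02d}" for d in range(1, 31)] + ["2020-05-30"]
print(compact_predicate(dates, threshold=32)[0])

# Exceeding the threshold collapses the filter to its bounding range.
print(compact_predicate(dates + ["2020-02-01", "2020-02-02"], threshold=32))
```

A larger threshold keeps more precise `IN` lists at the cost of pushing bigger predicates to the data source, which is the trade-off described above.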
Metrics about dynamic filtering are reported in a JMX table for each catalog. Metrics include information about the total number of dynamic filters, the number of completed dynamic filters, the number of available dynamic filters, and the time spent waiting for dynamic filters.
Starburst Cached Views#
The connector supports table scan redirection to improve performance and reduce load on the data source.