ThoughtSpot #

Requirements #

The requirements for using the Starburst data connector in ThoughtSpot Cloud:

  • Users of ThoughtSpot Cloud must connect to a cluster running Starburst Enterprise platform (SEP) 354-e or newer.

  • To take advantage of joining tables in SEP instead of in ThoughtSpot, your SEP cluster must include a Hive connector that uses a Hive metastore service.

  • Other ThoughtSpot Cloud requirements are described in the ThoughtSpot documentation.

Add a SEP connection #

Use the following steps to connect to ThoughtSpot and add your SEP cluster as a data source:

  1. Determine the connection information for your SEP cluster, including its network name, port, and your login credentials.

  2. Using a supported web browser, log into ThoughtSpot using the URL and login credentials provided for your site by your network administrators. If you are evaluating ThoughtSpot and have trial login credentials, use try.thoughtspot.cloud.

  3. To add a connection to your SEP cluster, select Data in the navigation bar.

  4. Select the Connections tab, then Add a connection.

  5. In the Choose your data warehouse page, in the first field, Name your connection, such as SEP mirror cluster. Optionally enter a connection description.

  6. Select the Starburst tile and click Continue.

      ThoughtSpot choose connection page

  7. Enter the connection details for your SEP cluster. Leave the Database field empty.

      ThoughtSpot connection details page

  8. If your SEP cluster uses secure TLS/HTTPS connections, open the Advanced Config drop-down. In the first row of Key and Value fields, enter SSL and true.

  9. Click Continue.

  10. In the Select tables page, ThoughtSpot connects to your cluster and returns a list of catalogs configured for your cluster. Open a catalog entry to see the schemas and tables it contains.

      ThoughtSpot select tables page

  11. Use the check boxes to select one or more tables on the left and columns on the right. Choose a set of tables and columns from which you can make meaningful queries by dragging column names into ThoughtSpot’s Search field.

    You can return to this page under a different connection name to select another set of tables and columns to support a different set of queries. Therefore, do not select in this named connection every table you might be interested in querying.

      ThoughtSpot select table columns page

  12. Click Confirm. This adds a named connection that includes only the tables and columns you selected.

    You can now perform a live query on this connection, following the instructions in ThoughtSpot’s documentation.

    To query a different set of tables and columns, create a different data connection to the same cluster.

Join tables in SEP #

Remember that Starburst Enterprise is not a database, it’s a SQL query engine that can connect to multiple data sources at the same time. Each SEP cluster can query multiple catalogs in a wide range of different data sources.

ThoughtSpot is capable of performing table joins when selecting tables and columns in the Select tables page. However, there is a significant speed advantage in reserving table joins for Starburst before the data reaches ThoughtSpot. Let the SEP engine’s distributed architecture do the join work.

To do this, use a SQL editor client such as Starburst Insights or DBeaver to create a view from a SELECT query that includes the joins of interest. Then select that view by name in the Select tables page like any other table.

To serve as a location to store the views, your cluster must have a Hive catalog that implements a Hive metastore service such as the AWS Glue Data Catalog. Then create your view before creating your SEP connection in ThoughtSpot.

The following example saves a view of a SELECT statement that accesses three catalogs: postgresql, hive, and sqlserver.

CREATE VIEW hive.savedviews.income_by_customer_segment_vw AS
SELECT c.custkey,
       c.estimated_income,
       c.fico,
       o.risk_appetite,
       l.cc_type
FROM glue.burst_bank.customer c
   INNER JOIN postgresql.burst_bank.customer_profile o
     ON c.custkey = o.custkey
   INNER JOIN sqlserver.burst_bank.product_profile l
     ON o.custkey = l.custkey;

Resources #