Starburst Enterprise in Google Cloud Marketplace#

Starburst Enterprise platform (SEP) is available directly through the Google Cloud Marketplace and runs on a variety of instance types. The Marketplace offering lets you set up a monthly contract, after which you can deploy SEP on Google Kubernetes Engine (GKE) using the command line or Google’s Click to Deploy.

Deployment options#

We strongly recommend that you deploy using the command line. Google’s Click to Deploy option is best suited for small proofs-of-concept and limits your customization options.

After you have deployed using the command line, you can customize SEP, including adding catalogs and services such as:

  • Hive Metastore (HMS)

  • Apache Ranger

  • Starburst Cache Service

Marketplace support#

Starburst Enterprise offers the following support for our marketplace subscribers:

  • Email-only support

  • Five email issues to gcpsupport@starburstdata.com per month

  • First response SLA of one business day

  • Support hours from 9 AM to 6 PM US Eastern Time

Set up your subscription#

Before you begin, you must have a Google Cloud login with the ability to subscribe to services.

Note

In the following steps, if the blue button says Purchase instead of Configure, the Google Cloud account you are using is not set up for subscription billing.

To subscribe to SEP through the Google Cloud Marketplace:

  1. Log in with your billable subscriber account and access the Starburst Enterprise offering directly, or enter “Starburst Enterprise” in the marketplace search field and select Starburst Enterprise - Distributed SQL Query Engine.

  2. Click Configure.

  3. On the resulting screen, select Deploy via command line. Click to Deploy on GKE is not supported.

Set up the GKE cluster#

Create a GKE Standard cluster with two node pools: one for SEP and one for Ranger and HMS. If you don’t need these additional services, the default node pool for SEP is sufficient. The following are the recommended minimum specifications for a proof-of-concept deployment:

Cluster name: my-sep-cluster
Location type: zonal (lower latency)
K8s version: 1.20.9-gke.1001
Primary nodepool name: default-nodepool
   Number of nodes: 3
   Machine configuration: e2-standard-16 (16 CPU and 64 GB RAM)
Supplementary nodepool name: nonsep
   Number of nodes: 1
   Machine configuration: e2-standard-8 (8 CPU and 32 GB RAM)
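
The specifications above could be created from the command line roughly as follows. This is a minimal sketch, not an official procedure: the zone is a hypothetical choice, the run helper is an illustrative wrapper that only prints each command unless DRY_RUN=0 is set, and the Kubernetes version is left to the GKE default (it can be pinned with --cluster-version). Note that gcloud names the initial node pool default-pool.

```shell
# Assumed values: adjust to your project. The zone is hypothetical.
CLUSTER=my-sep-cluster
ZONE=us-central1-a

# Illustrative helper: print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Zonal cluster whose initial node pool hosts SEP: 3 x e2-standard-16.
run gcloud container clusters create "$CLUSTER" --zone "$ZONE" \
  --num-nodes 3 --machine-type e2-standard-16

# Supplementary node pool for Ranger and HMS: 1 x e2-standard-8.
run gcloud container node-pools create nonsep --cluster "$CLUSTER" --zone "$ZONE" \
  --num-nodes 1 --machine-type e2-standard-8

# Fetch credentials so that kubectl and helm target the new cluster.
run gcloud container clusters get-credentials "$CLUSTER" --zone "$ZONE"
```

Review the printed commands first, then rerun with DRY_RUN=0 once they match your environment.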

SEP license requirements#

SEP on Google Cloud is licensed with a pay-as-you-go (PAYGO) model.

PAYGO requires the metering agent and the billing reporter agent, both of which are preconfigured. You must pass the reference to your unique reporting secret. No license file is embedded. Instead, the license must be created first, then the secret must be referenced in the values.yaml using the starburst-enterprise.starburstPlatformLicense configuration property.

Deploy from Google Cloud Marketplace#

The following CLI deployment sections describe how to deploy SEP with the Google Cloud Marketplace.

Get Marketplace license file#

  1. Navigate to the Google Cloud Marketplace SEP offering and select Configure.

  2. Set App instance name and switch to the Deploy via command line tab.

  3. Select the appropriate reporting/cluster service account and click Generate license key.

  4. Apply the downloaded license file with the following command:

    $ kubectl apply -f license.yaml
    
  5. Confirm the starburst-enterprise-license-<unique_suffix> reporting secret has been created:

    $ kubectl get secrets | grep starburst-enterprise-license
    $ kubectl describe secret starburst-enterprise-license-<unique_suffix>
    
  6. Record the license name that was loaded into your cluster for later use. For example:

    Name: starburst-enterprise-license-121212
    Namespace: default
    Labels: <none>
    Annotations: <none>
    Type: Opaque
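
To avoid copying the suffix by hand, the secret name can be captured in a shell variable. The following is a small sketch under the assumption that exactly one license secret exists in the current namespace; the function name license_secret_name is illustrative and not part of any Starburst tooling.

```shell
# Print the first secret whose name starts with starburst-enterprise-license-.
# Reads the output of `kubectl get secrets -o name` on standard input,
# where each line has the form secret/<name>.
license_secret_name() {
  sed -n 's|^secret/\(starburst-enterprise-license-.*\)$|\1|p' | head -n 1
}

# Example usage against a live cluster (not run here):
#   LICENSE_SECRET=$(kubectl get secrets -o name | license_secret_name)
#   echo "$LICENSE_SECRET"
```

The resulting value is what later steps refer to as starburst-enterprise-license-<unique_suffix>.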
    

Get the umbrella Helm Chart#

The version of the Helm Chart is 3.7.3. The associated Starburst Enterprise version is 443.3.0.

Note

Do not extract the contents of the archive downloaded in these steps.

  1. Download the chart as shown:

    $ wget https://storage.googleapis.com/starburst-enterprise/helmCharts/sep-gcp/starburst-enterprise-platform-charts-3.7.3.tgz
    
  2. Apply the Application CRD to avoid errors:

    $ kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"
    

Check the deployment files#

  1. Check the content of the default values distributed with the umbrella Helm chart:

    $ helm show values starburst-enterprise-platform-charts-3.7.3.tgz
    
  2. Check the default values.yaml for each subchart to ensure that you are viewing the correct Helm chart version. The correct version matches the number found in the Tag field on the marketplace listing page. (Note: Tag numbers may omit the revision number.) For example, if umbrella chart version 3.7.3 uses subcharts in version 443.3.0, then for the Starburst Enterprise Helm chart run:

    $ helm show values starburstdata/starburst-enterprise --version 443.3.0
    

    The same command can be used to print the values.yaml for Starburst Hive and Ranger Helm charts.

Build the values.yaml file#

The values.yaml file contains the configuration for your SEP cluster. You can either use the default content for a basic configuration to start, or build the content yourself for a more custom initial deployment.

Deploy a basic SEP cluster#

Deploy the default values.yaml content for a basic SEP cluster configuration. This YAML requires the starburst-enterprise.reportingSecret value to deploy an SEP cluster:

$ helm install starburst-enterprise starburst-enterprise-platform-charts-3.7.3.tgz --set starburst-enterprise.reportingSecret=starburst-enterprise-license-<unique_suffix>

To deploy with support for Starburst Warp Speed, you must also reference a secret that contains an SEP license key by appending --set starburst-enterprise.starburstPlatformLicense=<secret_with_sep_license> to the command.

Deploy a custom SEP cluster#

Create a values.yaml file with the configuration you wish to deploy, following the license requirements for your license type. You can include catalog configuration for your data sources at this time, or add them later.

Example

Copy the following example template and replace the placeholder values with values specific to your Google Cloud Marketplace deployment:

# Starburst Enterprise chart
starburst-enterprise:
  reportingSecret: ENTERPRISE_LICENSE_NAME

  catalogs:
    bigquery: |
      connector.name=bigquery
      bigquery.project-id=GOOGLE_PROJECT_ID
    hive: |
      connector.name=hive
      hive.metastore.uri=thrift://hive:9083
    starburst-insights: |
      connector.name=postgresql
      connection-url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      connection-user=postgres
      connection-password=INSIGHTS_DATABASE_PASSWORD

  coordinator:
    additionalProperties: |
      insights.persistence-enabled=true
      insights.metrics-persistence-enabled=true
      insights.jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      insights.jdbc.user=postgres
      insights.jdbc.password=INSIGHTS_DATABASE_PASSWORD
      insights.authorized-users=.*
    etcFiles:
      properties:
        config.properties: |
          http-server.authentication.allow-insecure-over-http=true
          http-server.process-forwarded=true
        password-authenticator.properties: |
          password-authenticator.name=file
    nodeSelector:
      starburstpool: STARBURST_COORDINATOR_NODE_POOL
    resources:
      limits:
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi

  expose:
    type: clusterIp
    ingress:
      serviceName: starburst
      servicePort: 8080
      host: STARBURST_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-starburst
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt

  userDatabase:
    enabled: true
    users:
    - password: ADMIN_PASSWORD
      username: ADMIN_USERNAME

  worker:
    autoscaling:
      enabled: true
      maxReplicas: 10
      minReplicas: 1
      targetCPUUtilizationPercentage: 80
    deploymentTerminationGracePeriodSeconds: 30
    nodeSelector:
      starburstpool: STARBURST_WORKER_NODE_POOL
    resources:
      limits:
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi
    starburstWorkerShutdownGracePeriodSeconds: 120

# Hive Chart
starburst-hive:
  enabled: true

  gcpExtraNodePool: EXTRA_NODE_POOL

  database:
    external:
      driver: org.postgresql.Driver
      jdbcUrl: jdbc:postgresql://HIVE_DATABASE_INSTANCE:5432/hive
      user: postgres
      password: HIVE_DATABASE_PASSWORD
    type: external

  objectStorage:
    gs:
      cloudKeyFileSecret: service-account-key

  expose:
    type: clusterIp

# Ranger Chart
starburst-ranger:
  enabled: true

  admin:
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    serviceUser: ADMIN_USERNAME

  gcpExtraNodePool: EXTRA_NODE_POOL

  usersync:
    enabled: true

  database:
    external:
      databaseName: ranger
      databasePassword: RANGER_DATABASE_PASSWORD
      databaseRootPassword: RANGER_DATABASE_ROOT_PASSWORD
      databaseRootUser: postgres
      databaseUser: ranger
      host: RANGER_DATABASE_INSTANCE
      port: 5432
    type: external

  datasources:
  - host: coordinator
    name: starburst-enterprise
    password: ADMIN_PASSWORD
    port: 8080
    username: ADMIN_USERNAME

  expose:
    type: clusterIp
    loadBalancer:
      name: ranger
      ports:
        http:
          port: 6080
    ingress:
      serviceName: ranger
      servicePort: 6080
      host: RANGER_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-ranger
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt

  initFile: files/initFile.sh

Confirm that you have replaced the following placeholder entries in the YAML above with values that match your environment and configuration. You can also add any other required configuration to this file, such as SSO, LDAP, ingress, and custom catalogs:

  • ENTERPRISE_LICENSE_NAME - The name of the license in license.yaml that was uploaded using kubectl.

  • GOOGLE_PROJECT_ID - The Google Project you are deploying to.

  • INSIGHTS_DATABASE_INSTANCE - Hostname/IP for the ‘Starburst Insights’ database instance.

  • INSIGHTS_DATABASE_PASSWORD - postgres user password for the Insights database.

  • STARBURST_COORDINATOR_NODE_POOL - Node Pool for the Coordinator (optional).

  • STARBURST_WORKER_NODE_POOL - Node Pool for the worker nodes. Can be the same as Coordinator (optional).

  • EXTRA_NODE_POOL - Node pool for Ranger and Hive (optional).

  • ADMIN_USERNAME - Starburst Enterprise web UI login user.

  • ADMIN_PASSWORD - Starburst Enterprise web UI login password.

  • HIVE_DATABASE_INSTANCE - Hostname/IP for the Hive database instance.

  • HIVE_DATABASE_PASSWORD - postgres user password for the Hive database.

  • RANGER_DATABASE_INSTANCE - Hostname/IP for the Ranger database instance.

  • RANGER_DATABASE_ROOT_PASSWORD - postgres user password for the Ranger database.

  • RANGER_DATABASE_PASSWORD - ranger user password for the Ranger database.
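
Before deploying, it can help to check that no placeholder from the template is left in the file. The sketch below assumes the placeholder names used in the example template, including STARBURST_URL and RANGER_URL, which appear in the template's ingress sections; the function name find_placeholders is illustrative.

```shell
# Extended regular expression matching every placeholder in the example template.
PLACEHOLDERS='ENTERPRISE_LICENSE_NAME|GOOGLE_PROJECT_ID'
PLACEHOLDERS="$PLACEHOLDERS|INSIGHTS_DATABASE_(INSTANCE|PASSWORD)"
PLACEHOLDERS="$PLACEHOLDERS|STARBURST_(COORDINATOR|WORKER)_NODE_POOL|EXTRA_NODE_POOL"
PLACEHOLDERS="$PLACEHOLDERS|ADMIN_(USERNAME|PASSWORD)"
PLACEHOLDERS="$PLACEHOLDERS|HIVE_DATABASE_(INSTANCE|PASSWORD)"
PLACEHOLDERS="$PLACEHOLDERS|RANGER_DATABASE_(INSTANCE|ROOT_PASSWORD|PASSWORD)"
PLACEHOLDERS="$PLACEHOLDERS|(STARBURST|RANGER)_URL"

# Print each line (with its line number) of the given file that still contains
# a placeholder. Exit status is 0 if any remain and 1 if none do.
find_placeholders() {
  grep -nE "$PLACEHOLDERS" "$1"
}

# Example usage (not run here):
#   if find_placeholders values.yaml; then
#     echo "Replace the placeholders listed above before deploying." >&2
#   fi
```

If you omit the optional node pool settings, remove the corresponding nodeSelector and gcpExtraNodePool entries rather than leaving the placeholder text in place.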

Run the Helm deployment#

After you have configured the values file for your environment, run the following:

$ helm install starburst-enterprise starburst-enterprise-platform-charts-3.7.3.tgz --values values.yaml

Validate your deployment#

  1. Verify that all pods are in a running state or a completed state:

    $ kubectl get pods -o wide
    NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                             NOMINATED NODE   READINESS GATES
    coordinator-6f646d7996-dttk4   4/4     Running   0          25m   10.104.0.19   gke-test-mp-cluster-default-pool-5b89684f-6g3b   <none>           <none>
    hive-7c8b5b5495-v9gwz          2/2     Running   0          25m   10.28.0.27    gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
    ranger-7c6b59bdd5-b9v8s        2/2     Running   0          25m   10.28.0.28    gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
    worker-76598b766c-dvgd7        3/3     Running   0          25m   10.104.0.18   gke-test-mp-cluster-default-pool-5b89684f-6g3b   <none>           <none>
    worker-76598b766c-pqg9p        3/3     Running   0          25m   10.104.0.17   gke-test-mp-cluster-default-pool-5b89684f-6g3b   <none>           <none>
    
  2. After deployment, confirm that the metrics reporter is able to submit metrics to the Google Cloud metering service:

    $ kubectl logs deployment/coordinator -c metrics-reporter
    2022-11-17 08:57:01 INFO     Starting a new job with Billing Handler...
    2022-11-17 08:57:01 INFO     Trying to get usage metrics from coordinator...
    2022-11-17 08:57:01 ERROR    Failure to get usage metrics from the coordinator because of metrics usage unavailability: Expecting value: line 1 column 1 (char 0)
    2022-11-17 08:57:01 WARNING  The coordinator failed to respond. Make sure it is up and running! Exiting...
    2022-11-17 08:57:01 INFO     Done
    2022-11-17 08:58:02 INFO     Starting a new job with Billing Handler...
    2022-11-17 08:58:02 INFO     Trying to get usage metrics from coordinator...
    2022-11-17 08:58:02 INFO     Number of cores in this 60 second cycle: 45
    2022-11-17 08:58:02 INFO     Report submission status: 200 : OK :
    2022-11-17 08:58:02 INFO     Trying to get status from the Billing agent...
    2022-11-17 08:58:02 INFO     Status response from the Billing agent: {"lastReportSuccess":"2022-11-17T08:56:48.771289323Z","currentFailureCount":0,"totalFailureCount":0}
    2022-11-17 08:58:02 INFO     Done
    

    Every minute you should see:

    • Report submission status: 200 : OK

    • Number of cores in this 60 second cycle: 45. In this example, 1 coordinator * 15 vCPUs + 2 workers * 15 vCPUs = 45 vCPUs in total.

  3. Verify that there are no errors reported by ubbagent:

   $ kubectl logs deployment/coordinator -c ubbagent
   Listening locally on port 6080
   I1117 08:56:48.616602       1 servicecontrol.go:88] ServiceControlEndpoint:Send(): serviceName:  starburst-presto.mp-starburst-public.appspot.com  body:  {"operations":[{"consumerId":"project:pr-a19b90ff70ab666","endTime":"2022-11-17T08:56:48Z","metricValueSets":[{"metricName":"starburst-presto.mp-starburst-public.appspot.com/cpu_usage_in_seconds_pricing","metricValues":[{"endTime":"2022-11-17T08:56:48Z","int64Value":"0","startTime":"2022-11-17T08:56:48Z"}]}],"operationId":"71f2590e-27b6-4be0-9720-eee5918e4c00","operationName":"starburst-presto.mp-starburst-public.appspot.com/report","startTime":"2022-11-17T08:56:48Z","userLabels":{"goog-ubb-agent-id":"b33d9a75-56c8-4965-92ec-644346960142"}}]}
   I1117 08:56:48.771219       1 servicecontrol.go:112] ServiceControlEndpoint:Send(): success
   I1117 08:58:02.425575       1 servicecontrol.go:88] ServiceControlEndpoint:Send(): serviceName:  starburst-presto.mp-starburst-public.appspot.com  body:  {"operations":[{"consumerId":"project:pr-a19b90ff70ab666","endTime":"2022-11-17T08:58:02Z","metricValueSets":[{"metricName":"starburst-presto.mp-starburst-public.appspot.com/cpu_usage_in_seconds_pricing","metricValues":[{"endTime":"2022-11-17T08:58:02Z","int64Value":"2700","startTime":"2022-11-17T08:58:02Z"}]}],"operationId":"e61490ef-6a1c-4aa3-8c2f-788de8b1a900","operationName":"starburst-presto.mp-starburst-public.appspot.com/report","startTime":"2022-11-17T08:58:02Z","userLabels":{"goog-ubb-agent-id":"b69d9a75-56c8-4965-92ec-644346960142"}}]}
   I1117 08:58:02.499010       1 servicecontrol.go:112] ServiceControlEndpoint:Send(): success
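
The checks above can be partly scripted. The following sketch assumes the resource requests from the example template (15 CPUs each for the coordinator and workers) and two workers; the function names are illustrative. In the sample ubbagent log, the int64Value of 2700 in the second report is consistent with 45 cores multiplied by the 60 second cycle, although that relationship is inferred from the sample output rather than documented here.

```shell
# List pods that are neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on standard input; STATUS is column 3.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Cores the metrics reporter should report:
# coordinator CPUs + worker count * CPUs per worker.
expected_cores() {
  echo $(( $1 + $2 * $3 ))
}

cores=$(expected_cores 15 2 15)
echo "expected cores per cycle: $cores"
echo "expected CPU seconds per 60 second cycle: $(( cores * 60 ))"

# Example usage against a live cluster (not run here):
#   kubectl get pods --no-headers | pods_not_ready
```

An empty result from pods_not_ready means every pod is in a running or completed state. Adjust the arguments to expected_cores if autoscaling has changed the number of workers.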

Next steps#

Review our Kubernetes configuration documentation:

The following pages introduce key concepts and features in SEP: