Starburst Enterprise in Google Cloud Marketplace #

Starburst Enterprise platform (SEP) is available directly through the Google Cloud Platform Marketplace to run on a variety of instance types. Our Google Cloud Marketplace offering allows you to easily set up a monthly contract, after which you can deploy SEP using the command line or Google’s Click to Deploy on Google Kubernetes Engine (GKE).

Deployment options #

We strongly recommend that you deploy using the command line. Google’s Click to Deploy option is best suited for small proofs-of-concept and limits your customization options.

After you have deployed using the command line, you can customize SEP, for example by adding catalogs and services such as:

  • Hive Metastore (HMS)
  • Apache Ranger
  • Starburst Cache Service

Marketplace support #

Starburst Enterprise offers the following support for our marketplace subscribers:

  • Email-only support
  • Five email issues to gcpsupport@starburstdata.com per month
  • First response SLA of one business day
  • Support hours from 9 AM to 6 PM US Eastern Time

Set up your subscription #

Before you begin, you must have a Google Cloud login with the ability to subscribe to services.

To subscribe to SEP through the Google Cloud Marketplace:

  1. Log in with your billable subscriber account and access the Starburst Enterprise offering directly, or enter “Starburst Enterprise” in the marketplace search field and select Starburst Enterprise - Distributed SQL Query Engine.
  2. Click Configure.
  3. On the resulting screen, select either Deploy via command line (recommended), or Click to Deploy on GKE.

Set up the GKE cluster #

  1. Reach out to Starburst Support to have your service account added to the Starburst Google Container Registry (GCR) with the Storage Object Viewer role.
  2. Create a GKE Standard cluster with two node pools: one for SEP and one for Ranger and HMS. The following are the recommended minimal specifications for a proof-of-concept deployment; an example gcloud sketch follows the specifications:
    Cluster name: my-sep-cluster
    Location type: zonal (lower latency)
    K8s version: 1.20.9-gke.1001
    Primary nodepool name: default-nodepool
       Number of nodes: 3
       Machine configuration: e2-standard-16 (16 CPU and 64 GB RAM)
    Supplementary nodepool name: nonsep
       Number of nodes: 1
       Machine configuration: e2-standard-8 (8 CPU and 32 GB RAM)
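
    A minimal sketch of creating such a cluster with gcloud follows. The zone is a placeholder, and the starburstpool node labels are assumptions that must match the nodeSelector values used later in values.yaml; gcloud names the initial pool default-pool, but it is the starburstpool label, not the pool name, that the chart's nodeSelector matches:

    $ gcloud container clusters create my-sep-cluster \
        --zone us-east1-b \
        --machine-type e2-standard-16 \
        --num-nodes 3 \
        --node-labels starburstpool=default-nodepool

    $ gcloud container node-pools create nonsep \
        --cluster my-sep-cluster \
        --zone us-east1-b \
        --machine-type e2-standard-8 \
        --num-nodes 1 \
        --node-labels starburstpool=nonsep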
    

Deploy from Google Cloud Marketplace #

This command-line deployment method applies to all Google Cloud Marketplace listings.
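
The kubectl and helm commands in the following sections assume that your local kubectl context points at the GKE cluster you created. If it does not, fetch cluster credentials first; a minimal sketch using the example cluster name and an assumed zone:

    $ gcloud container clusters get-credentials my-sep-cluster --zone us-east1-b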

Get Marketplace license file #

  1. Navigate to the Google Cloud Marketplace SEP offering and select Configure.
  2. Set the App instance name and switch to the Deploy via command line tab.
  3. Select the appropriate reporting/cluster service account and click Generate license key.
  4. Apply the downloaded license file with the following command:
    $ kubectl apply -f license.yaml
    
  5. Confirm the secret starburst-enterprise-license has been created:
    $ kubectl describe secret starburst-enterprise-license
    
  6. Record the license name that was loaded to your cluster; you need it later for the deployment. Example:
       Name: starburst-enterprise-license-121212
       Namespace: default
       Labels: <none>
       Annotations: <none>
       Type: Opaque
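
    If you need to look this name up again later, you can list the matching secrets, for example (shown here with the example name above):

    $ kubectl get secrets -o name | grep starburst-enterprise-license
    secret/starburst-enterprise-license-121212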
    

Get the Helm chart #

  1. Download and extract the chart:
    $ wget https://storage.googleapis.com/starburst-enterprise/helmCharts/sep-gcp/starburst-enterprise-platform-charts-2.3.0.tgz
    $ tar -zxvf starburst-enterprise-platform-charts-2.3.0.tgz
    
  2. Delete the existing values.yaml file bundled with the chart. This deployment uses a custom values file:
    $ rm starburst-enterprise-platform-charts/values.yaml
    
  3. Apply the Application CRD to avoid errors:
    $ kubectl apply -f "https://raw.githubusercontent.com/GoogleCloudPlatform/marketplace-k8s-app-tools/master/crd/app-crd.yaml"
    

Build the values.yaml file #

Create a values.yaml file in the current working directory (not the chart directory) with the configuration you wish to deploy. Include catalog configurations in this YAML file for any data sources that the cluster needs to access.

Example template

Copy the example template below and overwrite the defaults with values specific to your Google Cloud Marketplace deployment.

# Top level values for starburst-enterprise-platform
# Overwrite defaults with values specific to Google Cloud marketplace

deployerHelm:
  image: "gcr.io/starburst-public/starburstdata/deployer:2.3.0"

reportingSecret: ENTERPRISE_LICENSE_NAME

metricsReporter:
  image: "gcr.io/starburst-public/starburstdata/metrics_reporter:2.3.0"

imageUbbagent: "gcr.io/cloud-marketplace-tools/metering/ubbagent:latest"

starburst-enterprise:
  image:
    repository: "gcr.io/starburst-public/starburstdata"
    tag: 2.3.0
  initImage:
    repository: "gcr.io/starburst-public/starburstdata/starburst-enterprise-init"
    tag: 2.3.0
  prometheus:
    enabled: false
  catalogs:
    bigquery: |
      connector.name=bigquery
      bigquery.project-id=GOOGLE_PROJECT_ID
    hive: |
      connector.name=hive-hadoop2
      hive.allow-drop-table=true
      hive.metastore.uri=thrift://hive:9083
    starburst-insights: |
      connector.name=postgresql
      connection-url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      connection-user=postgres
      connection-password=INSIGHTS_DATABASE_PASSWORD

  coordinator:
    additionalProperties: |
      insights.persistence-enabled=true
      insights.metrics-persistence-enabled=true
      insights.jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
      insights.jdbc.user=postgres
      insights.jdbc.password=INSIGHTS_DATABASE_PASSWORD
      insights.authorized-users=.*
    etcFiles:
      properties:
        config.properties: |
          coordinator=true
          node-scheduler.include-coordinator=false
          http-server.http.port=8080
          discovery-server.enabled=true
          discovery.uri=http://localhost:8080
          usage-metrics.cluster-usage-resource.enabled=true
          http-server.authentication.allow-insecure-over-http=true
          web-ui.enabled=true
          http-server.process-forwarded=true
        event-listener.properties: |
          event-listener.name=event-logger
          jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
          jdbc.user=postgres
          jdbc.password=INSIGHTS_DATABASE_PASSWORD
        password-authenticator.properties: |
          password-authenticator.name=file
    nodeSelector:
      starburstpool: STARBURST_COORDINATOR_NODE_POOL
    resources:
      limits:
        cpu: 15
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi

  expose:
    type: clusterIp
    ingress:
      serviceName: starburst
      servicePort: 8080
      host: STARBURST_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-starburst
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt

  starburstPlatformLicense: sep-license

  userDatabase:
    enabled: true
    users:
    - password: ADMIN_PASSWORD
      username: ADMIN_USERNAME

  worker:
    autoscaling:
      enabled: true
      maxReplicas: 10
      minReplicas: 1
      targetCPUUtilizationPercentage: 80
    deploymentTerminationGracePeriodSeconds: 30
    etcFiles:
      properties:
        event-listener.properties: |
          event-listener.name=event-logger
          jdbc.url=jdbc:postgresql://INSIGHTS_DATABASE_INSTANCE:5432/insights
          jdbc.user=postgres
          jdbc.password=INSIGHTS_DATABASE_PASSWORD
    nodeSelector:
      starburstpool: STARBURST_WORKER_NODE_POOL
    resources:
      limits:
        cpu: 15
        memory: 56Gi
      requests:
        cpu: 15
        memory: 56Gi
    starburstWorkerShutdownGracePeriodSeconds: 120

# Hive Chart
starburst-hive:
  enabled: true

  image:
    repository: "gcr.io/starburst-public/starburstdata/hive"
    tag: 2.3.0

  gcpExtraNodePool: EXTRA_NODE_POOL

  database:
    external:
      driver: org.postgresql.Driver
      jdbcUrl: jdbc:postgresql://HIVE_DATABASE_INSTANCE:5432/hive
      user: postgres
      password: HIVE_DATABASE_PASSWORD
    type: external

  objectStorage:
    gs:
      cloudKeyFileSecret: service-account-key

  expose:
    type: clusterIp

# Ranger Chart
starburst-ranger:
  enabled: true

  admin:
    image:
      repository: "gcr.io/starburst-public/starburstdata/starburst-ranger-admin"
      tag: 2.3.0
    resources:
      limits:
        cpu: 1
        memory: 1Gi
      requests:
        cpu: 1
        memory: 1Gi
    serviceUser: ADMIN_USERNAME

  gcpExtraNodePool: EXTRA_NODE_POOL

  usersync:
    image:
      repository: "gcr.io/starburst-public/starburstdata/ranger-usersync"
      tag: 2.3.0

  database:
    external:
      databaseName: ranger
      databasePassword: RANGER_DATABASE_PASSWORD
      databaseRootPassword: RANGER_DATABASE_ROOT_PASSWORD
      databaseRootUser: postgres
      databaseUser: ranger
      host: RANGER_DATABASE_INSTANCE
      port: 5432
    type: external

  datasources:
  - host: coordinator
    name: starburst-enterprise
    password: ADMIN_PASSWORD
    port: 8080
    username: ADMIN_USERNAME

  expose:
    type: clusterIp
    loadBalancer:
      name: ranger
      ports:
        http:
          port: 6080
    ingress:
      serviceName: ranger
      servicePort: 6080
      host: RANGER_URL
      path: "/"
      pathType: Prefix
      tls:
        enabled: true
        secretName: tls-secret-ranger
      annotations:
        kubernetes.io/ingress.class: nginx
        cert-manager.io/cluster-issuer: letsencrypt

  initFile: files/initFile.sh

Confirm that you have replaced the following placeholder entries in the above YAML to match your environment and configuration. You can also add any other required configuration, such as SSO, LDAP, ingress, and custom catalogs, to this file:

  • ENTERPRISE_LICENSE_NAME - The name of the license secret created when you applied license.yaml with kubectl, for example starburst-enterprise-license-121212.
  • GOOGLE_PROJECT_ID - The Google Cloud project you are deploying to.
  • INSIGHTS_DATABASE_INSTANCE - Hostname or IP for the Insights (query logger) database.
  • INSIGHTS_DATABASE_PASSWORD - postgres user password for the Insights database.
  • STARBURST_COORDINATOR_NODE_POOL - Node pool for the coordinator.
  • STARBURST_WORKER_NODE_POOL - Node pool for the worker nodes; this can be the same as the coordinator node pool.
  • EXTRA_NODE_POOL - Node pool for Ranger and Hive.
  • ADMIN_USERNAME - Starburst Insights login user name.
  • ADMIN_PASSWORD - Starburst Insights login password.
  • HIVE_DATABASE_INSTANCE - Hostname or IP for the Hive database instance.
  • HIVE_DATABASE_PASSWORD - postgres user password for the Hive database.
  • RANGER_DATABASE_INSTANCE - Hostname or IP for the Ranger database instance.
  • RANGER_DATABASE_ROOT_PASSWORD - postgres user password for the Ranger database.
  • RANGER_DATABASE_PASSWORD - ranger user password for the Ranger database.
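
The Hive chart configuration above references an existing Kubernetes secret named service-account-key through objectStorage.gs.cloudKeyFileSecret. If you have not already created that secret, the following is a minimal sketch; the local file name key.json is an assumption, so confirm the key file name that the Starburst Hive chart expects:

    $ kubectl create secret generic service-account-key --from-file=key.json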

Run the Helm deployment #

After you have configured the values file for your environment, run the following:

   $ helm upgrade starburst-enterprise ./starburst-enterprise-platform-charts --install --values values.yaml
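
Before moving on to validation, you can confirm that the Helm release was registered, for example:

   $ helm list
   $ helm status starburst-enterprise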

Validate your deployment #

  1. You can verify that all pods are in the Running or Completed state:
    $ kubectl get pods -o wide
    NAME                                                     READY   STATUS      RESTARTS   AGE    IP           NODE                                             NOMINATED NODE   READINESS GATES
    coordinator-64cfdb94fd-v6bxv                             2/2     Running     0          4h8m   10.28.3.8    gke-test-mp-cluster-default-pool-5b89684f-6g3b   <none>           <none>
    hive-7c8b5b5495-v9gwz                                    2/2     Running     0          4h8m   10.28.0.27   gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
    ranger-7c6b59bdd5-b9v8s                                  2/2     Running     0          4h8m   10.28.0.28   gke-test-mp-cluster-nonsep-0be06378-b718         <none>           <none>
    starburst-enterprise-1-lic-secret-job-nfmqg              0/1     Completed   0          4h8m   10.28.2.31   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
    starburst-enterprise-1-metrics-reporter-9f9f5f77-7nvd2   2/2     Running     0          4h8m   10.28.2.30   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
    worker-76ff548b96-c2rwz                                  1/1     Running     0          4h8m   10.28.2.32   gke-test-mp-cluster-default-pool-5b89684f-qfvp   <none>           <none>
    worker-76ff548b96-wj996                                  1/1     Running     0          4h8m   10.28.1.12   gke-test-mp-cluster-default-pool-5b89684f-kfz2   <none>           <none>
    
  2. After deployment, confirm that the metrics reporter is able to submit metrics to the Google Cloud metering service:
    $ kubectl logs deployment/$APP_INSTANCE_NAME-metrics-reporter -c metrics-reporter
    2021-09-21 09:03:02 INFO     Trying to find a coordinator service
    2021-09-21 09:03:02 INFO     Trying to find Starburst Enterprise Coordinator Deployments...
    2021-09-21 09:03:02 INFO     Trying to find Starburst Enterprise Worker Deployments...
    2021-09-21 09:03:02 INFO     Trying to get usage metrics from starburst
    2021-09-21 09:03:02 INFO     Number of cores in this 60 second cycle: 45
    2021-09-21 09:03:02 INFO     Report submission status: 200 : OK :
    2021-09-21 09:04:01 INFO     Trying to find a coordinator service
    2021-09-21 09:04:01 INFO     Trying to find Starburst Enterprise Coordinator Deployments...
    2021-09-21 09:04:01 INFO     Trying to find Starburst Enterprise Worker Deployments...
    2021-09-21 09:04:01 INFO     Trying to get usage metrics from starburst
    2021-09-21 09:04:01 INFO     Number of cores in this 60 second cycle: 45
    2021-09-21 09:04:01 INFO     Report submission status: 200 : OK :
    

     Every minute you should see:

     • Report submission status: 200 : OK
     • Number of cores in this 60 second cycle: 45
       (1 coordinator * 15 vCPUs + 2 workers * 15 vCPUs = 45 vCPUs in total)
  3. You can also verify that there are no errors reported by ubbagent:
    $ kubectl logs deployment/$APP_INSTANCE_NAME-metrics-reporter -c ubbagent
    Listening locally on port 6080
    I0921 08:58:36.823772       1 main.go:104] Listening locally on port 6080
    I0921 09:00:02.498347       1 aggregator.go:88] aggregator: received report: cpu_usage_in_seconds_pricing
    I0921 09:00:36.823907       1 aggregator.go:197] aggregator: sending 1 report
    I0921 09:00:36.825296       1 servicecontrol.go:88] ServiceControlEndpoint:Send(): serviceName:  starburst-presto.mp-starburst-public.appspot.com  body:  {"operations":[{"consumerId":"project:pr-a19b90ff70ab335","endTime":"2021-09-21T09:00:02Z","metricValueSets":[{"metricName":"starburst-presto.mp-starburst-public.appspot.com/cpu_usage_in_seconds_pricing","metricValues":[{"endTime":"2021-09-21T09:00:02Z","int64Value":"2700","startTime":"2021-09-21T09:00:02Z"}]}],"operationId":"44a27787-b61b-4b63-80ce-75efee538f98","operationName":"starburst-presto.mp-starburst-public.appspot.com/report","startTime":"2021-09-21T09:00:02Z","userLabels":{"goog-ubb-agent-id":"28c57363-9d08-4a1e-bdbd-48a6ff2e907c"}}]}
    I0921 09:00:37.176186       1 servicecontrol.go:112] ServiceControlEndpoint:Send(): success
    I0921 09:01:02.081081       1 aggregator.go:88] aggregator: received report: cpu_usage_in_seconds_pricing
    
  4. If you need to redeploy the entire application, you may want to first delete the secret that contains the SEP license to avoid harmless crashes of the license job:
    $ kubectl delete -f sep_manifest.yaml
    $ kubectl delete secret sep-license
    

Next steps #

Review our Kubernetes configuration documentation to learn about key concepts and features in SEP.