Starburst user personas

No matter what their job title is, most Starburst users fit one of our user personas:

  • Data consumers query data through existing catalogs.
  • Data engineers create catalogs that connect data sources to Starburst.
  • Platform administrators run and maintain the Starburst cluster.

Each persona embodies a set of focused workflows and assumes a baseline skill set. Your role may span workflows from more than one persona, and that is OK! In fact, the overlap is easy to describe:

  • All users are data consumers at some level in the course of their job.
  • Data engineers might do a bit of platform administration.
  • Platform administrators might do a bit of data engineering.

Rather than repeat information or throw every possible workflow at every user, Starburst user guides are organized to answer a very important question: “Where do I start?!” We use personas to accomplish this.

Let’s meet the Starburst personas so that you can start with the one closest to your day-to-day workflows.

Data consumer

Data analysts, data scientists, and casual report wranglers all use Starburst in very similar ways. In a nutshell, a data consumer focuses on one or more of the following:

  • Delivering visualizations and reports.
  • Making well-informed, data-driven decisions.
  • Creating forecasting and machine learning models that describe the business.
  • Performing ad hoc analyses.

In our guides, we assume a reasonable level of skill with SQL, including some knowledge of more advanced queries, and some combination of the following:

  • Programming skills ranging from limited to reasonable
  • Excellent spreadsheet skills, including some modeling
  • Knowledge of statistical methods and/or machine learning
  • Competence with data visualization and/or reporting tools
  • Ability to detect and articulate issues with data, even if unable to remedy them or trace the cause

Downstream users who only consume data through generated reports and visualizations, and who do not query data themselves, are the direct customers of data consumers. Note that our documentation does not teach basic SQL skills.

Our data consumer user guide provides detailed information and in-depth training on:

  • Starburst clients
  • Query federation
  • Query optimization
  • Migrating queries to Starburst SQL
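
As a rough illustration of query federation, the sketch below builds a single SQL statement that joins tables from two different catalogs by using fully qualified catalog.schema.table names. The catalog, schema, table, and column names are hypothetical and chosen only for this example. The script just prints the statement, which you could then submit through whichever Starburst client you use.

```python
# Hypothetical catalog, schema, and table names, used only for illustration.
ORDERS = ("postgres_sales", "public", "orders")
CUSTOMERS = ("lake", "crm", "customers")


def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified catalog.schema.table name."""
    return f"{catalog}.{schema}.{table}"


def federated_query() -> str:
    """Build one SQL statement that joins tables from two catalogs."""
    return (
        "SELECT c.region, sum(o.total) AS revenue\n"
        f"FROM {qualified(*ORDERS)} AS o\n"
        f"JOIN {qualified(*CUSTOMERS)} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


if __name__ == "__main__":
    # Print the statement; running it requires a cluster with these catalogs.
    print(federated_query())
```

The point of the sketch is that the data consumer does not need to know where each table physically lives. The catalog prefix is the only part of the query that refers to the underlying data source.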

Data engineer

Data engineers deliver data to data consumers in a suitable, performant format, on time, and at an agreed level of data quality. They often source new data from a variety of relational databases, object stores, log entries, message streams, or product endpoints. In a nutshell, a data engineer focuses on:

  • Creating and updating data models and schemas.
  • Building and managing ETL pipelines.
  • Identifying, designing, and implementing internal process improvements, such as automating manual processes and optimizing data delivery for greater scalability.
  • Building and managing streaming data pipelines.
  • Building data integrations between various third-party systems such as Salesforce and Workday.

Data engineers are customers of platform administrators and the services they provide. Data consumers, in turn, are the direct customers of data engineers.

Starburst lets data engineers decouple compute from storage and simplifies delivering the data that your users need. Our data engineer user guide will get you started with detailed information and training on topics such as:

  • Creating catalogs to connect data sources.
  • Developing custom connectors.
  • Diagnosing and fixing query performance issues.

Platform administrator

Starburst platform administrators care about the scalability, performance, and reliability of the Starburst cluster. They balance SLAs for both data landing times and availability, implement access and data governance policies, and support audit requirements. In a nutshell, platform administrators focus on:

  • Building and maintaining scalable data platform architectures to support the ingestion, storage, and querying of large heterogeneous datasets.
  • Creating and monitoring cluster health metrics to ensure optimal performance and minimize downtime.
  • Working with cloud providers such as AWS, Azure, and Google Cloud.

Starburst runs on commercial off-the-shelf (COTS) hardware and uses memory instead of disk, which makes it fast and cost-effective. Our platform administrator user guide provides in-depth information and training covering:

  • Security
  • Performance tuning
  • Cluster setup and configuration