
Personas used for documentation and design #

There are three primary audiences we write and design for:

Starburst platform administrators
Data engineers
Data consumers: analysts & scientists

There is a single, secondary audience: “Data leaders.” This persona is largely a check writer for our purposes (VP, CDO, CIO), and does not actually use SEP any differently than the primary personas. They are included here for completeness, and should be kept in mind when creating marketing content such as case studies, white papers and ROI-focused materials.

This document describes these audiences as personas - fictional amalgamations - that you can empathize with and solve problems on behalf of. In practice, product personas are given names, faces and backgrounds to aid in discussing their needs as if they were real people, representative of our customers.

Primary personas #

Data consumers - analysts and scientists #

Data analysts and scientists will approach Starburst in very similar ways. However, their backgrounds and skill sets are different, so we will separate those out. Their pain points will be treated together.

Chris Consumer
Early career
BSc in physics from University of Toledo, currently working on online MBA

Chris is a Business Analyst. He's responsible for delivering visualizations and reports to ensure that his leadership is making well-informed, data-driven decisions. Chris cares very deeply that not only are the right questions being asked (and answered), but that the right data is being used to answer the questions. With the wealth of data available, it can be easy to overlook and misuse data. The quality of Chris's work ultimately rests on the quality and reliability of the data he uses, so Chris keeps good working relationships with his data engineering team and often communicates discrepancies and SLA issues to them. Chris has some solid SQL chops and is often able to prototype a new data source to be productionalized by data engineers. Chris feels that he has just the right combination of technical skills and business acumen.

When you write, design and build for Chris, here are some of the skill sets you can expect him to have:

A reasonable level of skill with SQL, with some knowledge of more advanced queries
Limited programming skills and methodologies
Tells stories with data
Expert with data visualization tools
Excellent spreadsheet skills, including some modeling
A good ability to detect and articulate issues with data, even if he cannot remedy them or trace the cause

Cameron Consumer
Early career
PhD in statistics, Stanford

Cameron is a data scientist. She's responsible for creating data models that forecast and describe the business. Cameron worries about the impact of seasonality on sales, and feels compelled to deliver models that reflect that impact with a high degree of accuracy. Cameron feels like she brings the answers to "Why?" and "How?" to the table. Her machine learning models help her find the levers that the business can pull - the "how," and her models account for why the business behaved as it did, or will. She feels more like an academic than an engineer, and is very proud of her scientific approach to business. Her digital sales data knowledge is formidable, and her reputation as an SME ensures that she has a robust stream of opportunities in her field.

When you write, design and build for Cameron, here are some of the skill sets you can expect her to have:

A reasonable level of SQL skills, with some knowledge of more advanced queries
Reasonable programming skills
Expert in statistical methods and/or machine learning
Some understanding of code repositories
Competence with data visualization tools
A good ability to detect and articulate issues with data, even if she cannot remedy them or trace the cause

Cameron’s and Chris’s pain points include, in no particular order:

Having to retrofit tools onto multiple data sources
Long waits for ETL to deliver useable data
Can’t dive into data quality issues
Data engineers sometimes kill their queries because of resource contention
Complex, periodic reports and models are often delayed past due dates

Data engineers #

Donna Data Engineer
Mid-career
BSc in computer engineering, University of Illinois at Chicago

Donna Data Engineer is responsible for designing performant data sources that can answer a broad range of business questions at XYZ, Inc. Donna found her way to data engineering through internships in college; it felt like a good blend between the technical chops required for programming jobs, and the big picture, organizational nature of data that she is naturally drawn to. As part of her job, she must understand what data is currently available from what sources, and what new data is needed to fill in any gaps. Donna has to work with stakeholders to source that new data, be it from third parties or through new log entries, message streams or product endpoints. Donna works pretty closely with data analysts and scientists, and tries to anticipate their needs in order to keep up with burgeoning data demands.

Daniel Data Engineer
Mid-career
BSc in computer science, University of New Hampshire

Daniel Data Engineer is responsible for delivering data to data analysts and data scientists at Acme Corp. Up until a few years ago, this mostly entailed writing complex ETL in frameworks such as Informatica and Alteryx. Over the last few years, he's worked mostly in Python-based frameworks such as Airflow and Bonobo as well as diving into Apache Spark. Daniel really cares about data landing times, because he and his coworkers hear from PagerDuty way more than they would like to.

When you write, design and build for Donna and Daniel, here are some of the skill sets you can expect them to have:

Creating and monitoring pipeline health metrics to ensure SLAs are met
Enabling automated self-service pipelines using Infrastructure as Code (IaC)
Designing schemas, data lake and data warehouse solutions in collaboration with stakeholders
Building and managing Kafka-based streaming data pipelines
Building and managing Airflow- and Spark-based ETLs
Creating and updating data models & data schemas that reduce system complexity and cost, and increase efficiency
Preparing and cleaning data for prescriptive and predictive modeling and descriptive analytics
Identifying, designing, and implementing internal process improvements such as automating manual processes and optimizing data delivery for greater scalability
Creating data tools for analysts and data scientists
Building data integrations between various third-party systems such as Salesforce and Workday

Donna’s and Daniel’s pain points, in no particular order:

Keeping up with the changing landscape of data delivery technology
Managing SLAs for data pipelines, along with pipeline and platform performance, in environments where data growth rate and complexity constantly increase
Aligning and negotiating with upstream data sources and infrastructure SLA owners
Sussing out detailed data requirements from folks with a wide range of data knowledge
Long, brittle pipelines
Productionalizing non-performant analyst queries
Constantly responding to resource constraint issues
Designing ETL around siloed data
Data cleansing

Platform administrators #

Art Administrator
Late career
BSc in computer science, BYU

Art Administrator is responsible for XYZ, Inc.'s Starburst cluster. He was an SRE for the data team for years, and switched roles to platform engineering after leading the SREs for a bit. Art really cares about scalability and reliability, especially since XYZ has super aggressive SLAs on both data landing times and, of course, availability. Art works closely with his colleagues in IT to ensure that his systems adhere to XYZ's strict access policies and support audit requirements.

Ada Administrator
Late career
BSc in computer science, University of Washington

Ada Administrator is responsible for both Acme Corp's Starburst and Postgres clusters. Ada was a DBA from early to mid-career, and it fell to her at Acme to figure out the HDFS ecosystem when it came along. Now she builds and maintains big data clusters for a living. Ada cares a lot about using the right data platform for the data.

When you write, design and build for Art and Ada, here are some of the skill sets you can expect them to have:

Building and maintaining scalable data platform architectures to support the ingest, storage and querying of large heterogeneous datasets
Creating and monitoring cluster health metrics to ensure optimal performance and reduce any downtime
Writing clean, production-ready code (in Java, Go etc.) with a strong focus on quality, scalability and high performance
Using and building scalable asynchronous REST APIs
Working with cloud providers like AWS, Azure and Google Cloud
Implementing and working with persistence technologies like AWS S3, HDFS, Kafka and Elasticsearch
Designing for data integrity and security through all environments as well as the data lifecycle
Partnering with data engineers to enable automated self-service pipelines using Infrastructure as Code (IaC)
Partnering with data engineers to design and improve schemas, data lake and data warehouse solutions in collaboration with stakeholders

Art’s & Ada’s pain points, in no particular order:

Sorting through an overload of information to master complex data platforms
Ensuring data platforms can scale to demand and with growth
Architecting solutions that can provide disaster recovery and business continuity for complex, critical data systems, in conjunction with IT stakeholders
Assisting in managing budgets and licensing cycles for massive enterprise-scale software vendors, bandwidth and hardware leases
Constantly tackling inherently complex and highly visible tasks
Delivering against stringent infrastructure SLAs
Doing more with less, or at least the same team size
Implementing data governance requirements for all data systems

Secondary persona - data leader #

Lauren Leader
Mid-to-late career
MBA, Haas School of Business

Lauren is CIO at the newly IPO'ed Clouds 'R Us. She's responsible for data infrastructure, data governance and delivery, as well as enabling SOX, GDPR and CCPA compliance. Prior to stepping into her current role, Lauren was a VP of IT at Acme Corp., where she owned the budget for all data infrastructure. She calls this her "real-life MBA," because she learned the hard way from being caught off-guard by explosive growth in under-specified legacy systems in multiple budget cycles. Lauren is also sensitive to scaling, platform lock-in, and staffing around particular technologies.

When you write for Lauren, here are some of her pain points to keep in mind, in no particular order:

Constantly fighting Shadow IT, up to and including small, narrow-scope one-off data warehouse solutions which she inevitably must absorb
Architecting around legacy systems, particularly monolithic services
Changing regulatory climate
Staffing for innovation while keeping legacy systems running and trying to automate
Managing, defending and demanding a budget with rapid growth
Balancing buy-vs-build, including for contracting services
Balancing private cloud vs hosted cloud solutions for cost-effectiveness, regulatory compliance and security
