PyStarburst #
The PyStarburst library implements the standard Python DataFrame API, which uses a data structure called a DataFrame to analyze and manipulate two-dimensional data. Use PyStarburst to query and transform data in Starburst Galaxy clusters in a data pipeline using Python syntax.
With PyStarburst, you can create complex transformation pipelines, build data apps, and interact with data using Python without moving data to the system where your application code runs.
PyStarburst provides familiar syntax for writing and running production-grade ETL pipelines and data transformations. This makes it possible to not only build new pipelines but also to migrate existing PySpark or Snowpark workloads to Starburst Galaxy.
For additional Python support in Starburst products, visit the Python clients page.
Install the library #
To install PyStarburst and its dependencies, run the following pip command from your command prompt:
pip install https://starburstdata-downloads.s3.us-east-2.amazonaws.com/pystarburst/0.5.0/pystarburst-0.5.0-py3-none-any.whl
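To confirm that the wheel was installed into the Python environment you plan to use, you can ask the standard library for the installed distribution's version. The following is a minimal sketch using `importlib.metadata` (Python 3.8 or later). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the PyStarburst version, or None if the package is not installed
# in the interpreter running this script.
print(installed_version("pystarburst"))
```

Run it with the same interpreter your notebook or pipeline uses. A `None` result usually means pip installed the wheel into a different environment.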
Connect to your cluster #
Use your preferred local development environment to connect to a Starburst Galaxy cluster. Establish a session using the same connection parameters you use to log into Starburst Galaxy.
Specify these settings in a dictionary that associates parameter names with values. Then pass this dictionary to the Session.builder.configs method and call the create method to establish your session:
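import trino
from pystarburst import Session
# Replace each <...> placeholder with the value for your cluster
db_parameters = {
    "host": "<host>",
    "port": <port>,  # an integer, not a quoted string
    "http_scheme": "https",
    "catalog": "sample",
    "schema": "burstbank",
    "auth": trino.auth.BasicAuthentication("<user>", "<password>"),
}
# Build the session from the dictionary and open it
session = Session.builder.configs(db_parameters).create()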
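Creating the session does not by itself show that queries run successfully. A quick check is to run a trivial statement through `Session.sql` and read back the result. This is a sketch: it assumes that `session.sql(...).collect()` returns a list of row objects that support positional indexing, as described in the PyStarburst API reference. The helper name `check_connection` is hypothetical.

```python
def check_connection(session) -> bool:
    """Run a trivial query and report whether the cluster answered as expected.

    Assumes session.sql(...).collect() returns a list of rows that support
    positional indexing (row[0]).
    """
    rows = session.sql("SELECT 1").collect()
    # Exactly one row whose first column is 1 means the round trip worked
    return len(rows) == 1 and rows[0][0] == 1
```

Call `check_connection(session)` right after `create()`. Authentication or network problems typically surface as an exception from the query rather than a `False` result.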
To determine the values for the host, port, and user connection parameters:
- Open Partner connect in the Starburst Galaxy navigation menu.
- Click the PyStarburst tile in the Drivers and clients section.
- From the Select cluster drop-down menu, select the cluster of interest.
- Copy the values from the User, Host, and Port fields.
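To keep credentials out of notebooks and source control, one option is to store these values in environment variables and build the same connection dictionary at runtime. The following is a minimal sketch. The variable names `GALAXY_HOST`, `GALAXY_PORT`, `GALAXY_CATALOG`, and `GALAXY_SCHEMA` and the helper `galaxy_params` are hypothetical. The authentication object is passed in so that the helper itself does not depend on `trino`.

```python
import os
from typing import Any, Dict, Mapping


def galaxy_params(auth: Any, env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build PyStarburst connection settings from environment variables.

    The GALAXY_* variable names are hypothetical; use names that suit your setup.
    `auth` is the authentication object, for example
    trino.auth.BasicAuthentication(user, password).
    """
    return {
        "host": env["GALAXY_HOST"],       # Host field from Partner connect
        "port": int(env["GALAXY_PORT"]),  # Port field, converted to an integer
        "http_scheme": "https",
        "catalog": env.get("GALAXY_CATALOG", "sample"),
        "schema": env.get("GALAXY_SCHEMA", "burstbank"),
        "auth": auth,
    }
```

With `trino` and `pystarburst` installed, you can pass `trino.auth.BasicAuthentication(os.environ["GALAXY_USER"], os.environ["GALAXY_PASSWORD"])` as `auth` (again, hypothetical variable names) and hand the result to `Session.builder.configs(...).create()` as shown earlier.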
PyStarburst API reference #
After you have established a connection with a cluster, use Python to construct DataFrames and query tables. PyStarburst has a number of methods to perform DataFrame operations on your data.
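As a sketch of the DataFrame style, the helper below reads a table, optionally narrows it to some columns, and returns only a few rows to the client. It assumes the Snowpark-style methods `Session.table`, `DataFrame.select`, `DataFrame.limit`, and `DataFrame.collect`. It also assumes that transformations are evaluated lazily until an action such as `collect` runs. Check the PyStarburst API reference for the exact signatures. The helper name `preview` is hypothetical.

```python
from typing import Any, List, Optional, Sequence


def preview(session, table_name: str,
            columns: Optional[Sequence[str]] = None, n: int = 5) -> List[Any]:
    """Return up to n rows from a table, optionally restricted to some columns.

    The DataFrame is assumed to be built lazily: the query is sent to the
    cluster when collect() is called, and only the limited rows are returned.
    """
    df = session.table(table_name)
    if columns:
        df = df.select(*columns)
    return df.limit(n).collect()
```

For example, `preview(session, "<table>", n=10)`, replacing `<table>` with the name of a table in the `sample.burstbank` schema. Because the limit is applied before `collect`, the filtering happens on the cluster rather than in local Python.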
View technical documentation for PyStarburst’s API methods at: https://pystarburst.eng.starburstdata.net/.
Example Jupyter notebook #
Try out PyStarburst using the example Jupyter notebook in the starburstdata/pystarburst-demo GitHub repository.