Starburst HBase connector#
The Starburst HBase connector allows querying an external Apache HBase database.
Note
The HBase connector is a private preview. Contact Starburst support with questions or feedback.
Requirements#
To connect to HBase, you need:
Apache HBase version 2.3.x
Network access from the coordinator and workers to the HBase and Zookeeper servers. The default port is 2181.
A valid Starburst Enterprise license.
Configuration#
This section describes how to configure the HBase connector and create a catalog properties file that specifies both the HBase connector and the metadata store type.
To configure the HBase connector, create a catalog properties file that
specifies the HBase connector by setting the connector.name
to hbase
.
For example, to access a database as the example
catalog, create the file
etc/catalog/example.properties
. Replace the connection properties as
appropriate for your setup:
connector.name=hbase
hbase.zookeeper.quorum=host1,host2
hbase.zookeeper.port=2181
hbase.zookeeper.znode=/hbase
General configuration properties#
The following configuration properties can be set in the catalog properties file:
Property name |
Description |
Default value |
---|---|---|
|
Comma-separated list of ZooKeeper server hostnames. |
|
|
Server port for the ZooKeeper service. |
2181 |
|
The root znode containing all znodes created and/or used by HBase. |
|
|
Comma-separated paths to HBase configuration files for HBase clients. In the event of conflicting configuration options, the SEP connector catalog configuration overrides these files. |
|
|
HBase authentication mechanism type, either |
|
|
When set to true, enables session user impersonation. |
false |
|
When set to true, enables projection pushdown. See projection pushdown. |
false |
|
When set to true, enables |
false |
|
When set to true, enables |
false |
|
When set to true, enables |
false |
|
Enable connection pooling. When set to |
true |
|
If connection pooling is enabled, determines how long in seconds to keep a connection alive when not in active use. |
30 |
Table metadata storage options#
The HBase connector supports two types of table stores for metadata options:
HBase native table store (default)
File-based table store (legacy)
Specify the desired table store using the hbase.table-store.type
property.
HBase table store#
The HBase table store is now the default option for metadata storage in HBase.
The following example specifies the HBase table store:
hbase.table-store.type=hbase
File-based table store#
The file-based table store is the legacy option and requires .json
file
configuration. See table descriptions.
The following example specifies the file-based table store:
hbase.table-store.type=file
Table store configuration properties#
The following table store-related properties can be set in the catalog properties file:
Property name |
Description |
Default value |
---|---|---|
|
Directory path that holds JSON description files for file-based HBase tables. See table descriptions for more information. |
|
|
The store type for metadata, either |
hbase |
|
Specifies the user of the Hbase table store. |
|
|
Specifies the schema where the HBase table store is located. |
|
|
Specifies the table for the HBase table store. |
|
|
When set to true, enables the stored procedure to migrate from the
file-based table store to the HBase table store. This property requires the
|
false |
Migrating from file-based to HBase table store#
To migrate metadata from the file-based table store to the HBase native table store, enable the migration procedure using the following property:
hbase.table-store.hbase.migrate-from-file-based-table-store.enabled=true
Then execute the migration procedure:
CALL system.migrate_from_file_table_store(
SCHEMA_NAME => 'my_schema',
TABLE_NAME => 'my_table',
OVERRIDE_IF_EXISTS => true,
REMOVE_ORIGINAL_AFTER_MIGRATION => true);
After the migration is complete, metadata is stored in the HBase table store and is discoverable without the need for JSON files.
Create new metadata#
To create new metadata mapped to an existing table, use the CREATE TABLE
statement with the external_table
configuration property:
The following example creates new table metadata using the existing table and specifies columns using column options:
CREATE TABLE hbase.default.table_name (
ID BIGINT WITH (remote_name = 'ID'),
ID2 VARCHAR(4) WITH (remote_name = 'ID2'),
ID3 BIGINT WITH (remote_name = 'ID3'),
VALUE1 BIGINT WITH (family = 'CF', remote_name = 'VALUE1')
)
WITH (
external_table = true,
key_column_names = ARRAY['ID', 'ID2', 'ID3']
)
Table descriptions#
Table descriptions are required to query HBase tables from SEP. A table
description directory on all SEP nodes contains table description .json
files that describe HBase table names, columns, and data types. The connector
reads these files at runtime to create a 1:1 mapping between SEP and HBase
tables. Any HBase tables without a corresponding table description file in this
directory cannot be queried from SEP.
The table description directory must contain a subdirectory for each HBase namespace or SEP schema name, which then contain the corresponding table description files. The following shows an example directory and file structure:
table-description-dir/
namespace1/
table1.json
table2.json
namespace2/
table1.json
table2.json
Warning
Schema, table, and column names are case-insensitive. However, we recommend that
you use lowercase identifiers in table description .json
files and file names.
If the HBase database has multiple tables with the same name in different cases,
they cannot be mapped to SEP tables and therefore cause query failures.
Table description files use the following format:
Property name |
Description |
Required |
---|---|---|
|
List of the table’s columns along with their types and column families. |
Yes |
|
List of column names to be used as the HBase row key. This property is
required but can be set to an empty list object, or |
Yes |
|
Number of salt buckets for this table, between 1 to 256. |
No |
The following block shows an example JSON table description file:
{
"columns" : [ {
"name" : "varchar_column",
"type" : "varchar",
"family" : "column_family",
"comment" : "comment"
}, {
"name" : "int_column",
"type" : "integer",
"family" : "column_familia",
"comment" : "comment"
} ],
"keyColumnNames" : [ "int_column" ],
"saltBuckets" : 5,
"comment" : "table_comment"
}
Create a namespace#
In SEP, HBase
namespaces are
represented by schemas. To create a namespace in HBase, use a CREATE SCHEMA
SQL statement in SEP.
CREATE SCHEMA test_namespace;
Note
CREATE SCHEMA
statements are disabled by default, but can be enabled
with the hbase.allow-create
configuration property. See
CREATE, DROP, and INSERT support for more information.
Create a table#
In SEP, running a CREATE TABLE
statement creates a table in HBase using
an existing table’s schema as the new table’s namespace.
All columns within these created tables are part of the default cf
column
family. Column families cannot be modified in an SEP session, but can be
changed in the table’s description .json
file.
Note
CREATE TABLE
statements are disabled by default, but can be enabled
with the hbase.allow-create
configuration property. See
CREATE, DROP, and INSERT support for more information.
CREATE TABLE
statements support the same properties as the table
descriptions.
CREATE TABLE hbase.default.test (
id1 BIGINT,
id2 BIGINT,
value VARCHAR
)
WITH (
key_column_names = ARRAY['id1', 'id2'],
salt_buckets = 10
);
Type mapping#
The connector supports the following SQL types:
BOOLEAN
TINYINT
SMALLINT
BIGINT
REAL
DOUBLE
VARCHAR
VARCHAR(N)
CHAR(N)
VARBINARY
DATE
TIMESTAMP
TIMESTAMP(N)
All other SQL types are not supported.
SQL support#
The connector provides read and write access to data and metadata in HBase. In addition to the globally available and read operation statements, the connector supports the following features:
CREATE, DROP, and INSERT support#
The following features are supported by the connector but disabled by default:
These operations are non-transactional and can therefore result in data loss if interrupted. To enable these SQL statements, set the corresponding configuration in the catalog properties file:
//Enable CREATE TABLE, CREATE TABLE AS, and CREATE SCHEMA
hbase.allow-create=true
//Enable DROP TABLE and DROP SCHEMA
hbase.allow-drop=true
//Enable INSERT
hbase.allow-insert=true
INSERT#
INSERT
operations with the same values for key columns overwrite rows in
HBase.
ALTER TABLE#
The connector does not support ALTER TABLE RENAME
or ALTER TABLE SET AUTHORIZATION
.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Parallelism#
By default, the connector attempts to create multiple splits off available table regions for reading data. Each split is mapped to a single region and processed in parallel. Therefore, it is best practice to split HBase tables over multiple regions for better performance.
Table salting#
Sequential write operations can perform poorly across region servers when the row keys increase monotonically. SEP supports salting tables as a way to mitigate this problem.
If a table is not created through SEP and doesn’t follow the one byte salt key prefix, the connector won’t be able to distinguish the salt prefix from the actual key. In this case, resultant data includes the salt prefix as part of the row key column.
Projection pushdown#
The connector can improve performance of table scans by performing pushdown of column projections to HBase. This reduces the amount of data transferred between SEP and the region servers.
If projection pushdown is enabled, rows without any of the projected columns are skipped. This is because HBase is a non-relational database with an arbitrary number of columns per row, so it can only return rows where at least one of the projected columns exist.
Projection pushdown is disabled by default, but can be enabled by setting the
hbase.projection-pushdown.enabled
property in the catalog properties file
to true
:
hbase.projection-pushdown.enabled=true
Security#
The connector includes a number of security-related features, detailed in the following sections.
User impersonation#
The connector supports user impersonation.
Enable user impersonation by setting the hbase.impersonation.enabled
property
in the catalog properties file to true
:
hbase.impersonation.enabled=true
Kerberos authentication#
The connector supports Kerberos authentication. Use the following properties in the catalog properties file to configure it.
hbase.authentication.type=KERBEROS
hbase.master.kerberos.principal=hbase/_HOST@example.com
hbase.regionserver.kerberos.principal=hbase/_HOST@example.com
hbase.client.kerberos.principal=example@example.com
hbase.client.kerberos.keytab=etc/kerberos/example.keytab
With this configuration the user example@example.com
is used to connect to
the database, and the related Kerberos service ticket is stored in the
example.keytab
file.