This article describes how to add a new Databricks data source in Nexla.
For the version of this article pertaining to the previous Nexla UI, click here.
Contents:
1. Begin Adding a Databricks Data Source
2. Input Your Credential
3. Configure the Databricks Source
3.1 Table Mode
3.2 Query Mode
4. Scheduling and Advanced Options
4.1 Data Ingestion Scheduling
4.2 Optional Advanced Settings
5. Finish Creating the Databricks Data Source
1. Begin Adding a Databricks Data Source
- Log into Nexla with your provided credentials.
If you need credentials, contact support@nexla.com.
- Navigate to the Integrate section by selecting it from the platform menu on the left side of the screen.
- Click the button at the top of the Integrate toolbar on the left.
- Select Databricks from the data source list; then, click the button in the top right corner of the screen to begin adding the Databricks data source.
2. Input Your Credential
- Select the new-credential option to open the Add a New Credential window and begin adding a new Databricks credential.
To use a credential that has already been added, select that credential, click the button in the top right corner of the screen, and skip to Section 3.
- Enter a name for the credential in the Credential Name field.
- Optional: Enter a description of the credential in the Credential Description field.
- Select how the Databricks authentication information will be entered from the URL Format pulldown menu.
- JDBC URL - Select this option to enter the authentication information as a JDBC URL.
- HTTP Path Parts - Select this option to enter the authentication information as parts that will be combined by Nexla to create the connection string.
- When JDBC URL is selected:
- Enter the JDBC URL of the Databricks location in the JDBC URL field.
The JDBC URL should be in the form of "jdbc:spark:/...".
- When HTTP Path Parts is selected:
- Enter the hostname of the Databricks database in the Host field.
The hostname is typically an IP address or text in the format "company.domain.com".
Do not include the connection protocol.
- Enter the cluster port number to which the Databricks source connects in the Port field.
- Enter the HTTP path of the Databricks SQL endpoint in the HTTP Path field.
The HTTP path is typically in the form "sql/protocolv1/o/<id>/0916-102516-naves603".
The HTTP path can be found under the JDBC settings in the Databricks console.
- Enter the username associated with the Databricks account in the Username field.
- Enter the password associated with the Databricks account in the Password field.
- Optional: Enter the name of the Databricks database to which Nexla should connect in the Database Name field.
In Databricks, the terms "database" and "schema" are used interchangeably. For more information about databases/schema and other data objects in Databricks, see this Databricks article.
- Optional: Enter the name of the Databricks schema to which Nexla should connect in the Schema Name field.
- Select the type of cloud environment used by the Databricks instance.
Typically, the Databricks cloud environment is used, but Nexla also supports connecting to Databricks instances that run in other cloud environments.
- Optional: Click the option at the bottom of the Add New Credential window to access the following additional credential settings:
- If the Databricks database from which data should be read is not publicly accessible, check the box that enables SSH tunneling. This will add related fields to be populated in the Add New Credential window.
Selecting this option allows Nexla to connect to a bastion host via SSH, and the database connection will then be provided through the SSH host.
- Enter the SSH tunnel hostname or IP address of the bastion host running the SSH tunnel server that has access to the database in the SSH Tunnel Host field.
- Enter the number of the tunnel bastion host port to which Nexla can connect in the SSH Tunnel Port field.
- Create an SSH username for Nexla in the bastion host, and enter that username in the Username for Tunnel field.
Typically, this username is set to "nexla".
- Click the button at the bottom of the Add New Credential window to save the credential, and proceed to Section 3.
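As an illustration of the HTTP Path Parts option described above, the following Python sketch shows one way a host, port, and HTTP path could be combined into a single JDBC-style connection string. This is not Nexla's actual implementation: the function name build_jdbc_url is hypothetical, the "jdbc:spark://" prefix and the transportMode, ssl, and httpPath properties are assumed from the commonly used Spark JDBC URL layout, and the host and path values in the usage note are placeholders.

```python
def build_jdbc_url(host: str, port: int, http_path: str, database: str = "default") -> str:
    """Combine connection parts into a Spark-style JDBC URL (illustrative only).

    Assumption: the URL layout below follows the common Spark JDBC form;
    the exact string that Nexla builds internally is not documented here.
    """
    # The Host field must not include a connection protocol such as "https://".
    if "://" in host:
        raise ValueError("Host must not include a connection protocol")
    # Normalize the path so that a leading slash does not produce "httpPath=/...".
    http_path = http_path.lstrip("/")
    return (
        f"jdbc:spark://{host}:{port}/{database};"
        f"transportMode=http;ssl=1;httpPath={http_path}"
    )
```

For example, build_jdbc_url("company.domain.com", 443, "sql/protocolv1/o/0/example-cluster") returns a single string that starts with "jdbc:spark://company.domain.com:443/default". The username and password are deliberately left out of the URL in this sketch.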
3. Configure the Databricks Source
In Nexla, the Databricks database source can be configured using either Table Mode or Query Mode.
Table Mode allows users to specify the database source through a simple selection method. This mode is equivalent to running a simple, optimized SELECT operation on any database table, while providing additional customization options to filter rows. To use this mode for configuration, see Section 3.1.
Query Mode allows users to perform a complex query to specify the database source. This mode provides a free-form query editor that can be used to perform any complex query written using the syntax and convention supported by the underlying database and/or warehouse. To use this mode for configuration, see Section 3.2.
3.1 Table Mode
- To configure the Databricks source using Table Mode, ensure that the Table Mode tab is selected.
- Find the database location from which Nexla should read data. Expand files as necessary by clicking the icon next to each.
- Select the location from which data should be read by hovering over it and clicking the button that appears to the right.
The button should now indicate that the location is selected, and the path of the selected location will be shown at the top of the list.
- Optional: Click the button to the right of the mode-selection tabs to generate preview samples of data from the selected source at the bottom of the screen.
- Proceed to Section 4 to configure the data ingestion scanning schedule and any additional advanced options for the selected data source.
3.2 Query Mode
- To configure the Databricks source using Query Mode, select the Query Mode tab.
- Enter the query specifying the database location from which Nexla should read data in the Custom Query to Fetch Data field, adhering to the Databricks SQL syntax and convention.
In this mode, Nexla supports any query that can be written following the Databricks syntax and convention, regardless of complexity.
For more information about Databricks SQL syntax, see this Databricks SQL Query reference page.
- Optional: Click the button to the right of the mode-selection tabs to generate preview samples of the data selected according to the entered query at the bottom of the screen.
- Proceed to Section 4 to configure the data ingestion scanning schedule and any additional advanced options for the selected data source.
4. Scheduling and Advanced Options
4.1 Data Ingestion Scheduling
Nexla can be configured to scan the data source for data at a variety of frequencies, with options ranging from a one-time scan to scanning every 15 minutes. Optionally, users can also specify the time at which Nexla should scan the data source.
- In the Advanced Settings menu on the right, use the Scheduling pulldown menu to specify how often Nexla should fetch data from the source.
The default setting configures Nexla to fetch data from the source once every day.
- For options such as "Every N Hours" and "Every N Days", use the additional pulldown menu that appears when these options are selected to specify the value of N defining the fetching frequency.
- Optional: To set a specific time at which Nexla should fetch any new data from the source, check the box, and use the pulldown menus to select the desired time.
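To make the effect of an "Every N Hours" schedule concrete, the following Python sketch lists the fetch times that result from a chosen first fetch time and a value of N. It is a simplified illustration rather than Nexla's scheduler: the function name next_fetch_times and its parameters are hypothetical, and time zones and missed runs are ignored.

```python
from datetime import datetime, timedelta
from typing import List


def next_fetch_times(first_fetch: datetime, every_n_hours: int, count: int) -> List[datetime]:
    """Return the first `count` fetch times for an "Every N Hours" schedule."""
    if every_n_hours < 1:
        raise ValueError("N must be at least 1 hour")
    step = timedelta(hours=every_n_hours)
    # Each fetch is offset from the first by a whole multiple of the interval.
    return [first_fetch + i * step for i in range(count)]
```

For example, with a first fetch at 06:00 and N set to 6, the first three fetches fall at 06:00, 12:00, and 18:00 on the same day.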
4.2 Optional Advanced Settings
- When the data source location is selected using Table Mode:
- Optional: Use the Table Scan Mode pulldown menu under the Data Selection heading to configure how Nexla should scan the table selected in Section 3.1 during each ingestion cycle.
This option is useful when working with a source containing historical data that should not be scanned.
By default, Nexla is configured to scan the entire selected table during each ingestion cycle, which is equivalent to running a SELECT * statement on the table.
- Read the whole table – This option configures Nexla to scan the entire table, which is equivalent to running a SELECT * statement on the table.
- Start reading from a specific ID – This option configures Nexla to begin scanning the table at a specific ID, which is stored in a numeric column.
- Start reading from a specific ID and timestamp – This option configures Nexla to begin scanning the table at a specific ID and timestamp.
- Start reading from a specific timestamp – This option configures Nexla to begin scanning the table at a specific timestamp, which is stored in a datetime column.
- When the data source location is selected using Query Mode:
- Optional: If the query entered in Section 3.2 includes statements that should also be committed to the database after ingestion, select "True" from the Perform Database Commit After Read pulldown menu under Post Read Settings.
Typically, a database commit does not need to be performed, and this setting can be left as "False".
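The four Table Scan Mode options described above can be thought of as different filters on the same SELECT statement. The Python sketch below builds an approximate query for each mode. It is only a conceptual model: the function name table_scan_query, the mode keywords, and the column names in the usage note are hypothetical, and the use of ">=" comparisons joined with AND is an assumption about how the starting ID and timestamp are applied.

```python
def table_scan_query(table, mode="whole", id_column=None, start_id=None,
                     ts_column=None, start_ts=None):
    """Build an approximate SQL query for a table scan mode (illustrative only).

    Values are interpolated directly into the string for readability;
    do not use this pattern with untrusted input.
    """
    if mode not in {"whole", "id", "timestamp", "id_and_timestamp"}:
        raise ValueError(f"Unknown scan mode: {mode}")
    conditions = []
    if mode in ("id", "id_and_timestamp"):
        if id_column is None or start_id is None:
            raise ValueError("ID-based modes need id_column and start_id")
        # Assumption: scanning begins at the given ID, inclusive.
        conditions.append(f"{id_column} >= {int(start_id)}")
    if mode in ("timestamp", "id_and_timestamp"):
        if ts_column is None or start_ts is None:
            raise ValueError("Timestamp-based modes need ts_column and start_ts")
        # Assumption: scanning begins at the given timestamp, inclusive.
        conditions.append(f"{ts_column} >= '{start_ts}'")
    query = f"SELECT * FROM {table}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query
```

For example, table_scan_query("sales.orders", mode="id", id_column="order_id", start_id=1000) produces "SELECT * FROM sales.orders WHERE order_id >= 1000", while the default mode produces the unfiltered "SELECT * FROM sales.orders".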
5. Finish Creating the Databricks Data Source
- Once all of the above steps have been completed, click the button in the upper right corner of the screen to create the new Databricks data source in Nexla.
- The confirmation page indicates that the Databricks database has been successfully created as a data source.
- Optional: Edit the name of the newly added data source by clicking on the name field and entering the desired text.
- Optional: Add a description of the data source by clicking on the field below the data source name and entering the desired text.
- To return to My Data Sources, click the button in the upper right corner of the screen.
- To view the newly created data source, click the corresponding button.
- To view datasets detected from the newly added source, click the corresponding button.