Important Nexla Help Center Update:
Nexla's Zendesk Help Center pages are being deprecated and will soon no longer be available.
Nexla Documentation is now the home for Nexla's User Guides, with improved formatting and categories that are easier to navigate, providing a better overall user experience.
Please update any bookmarks to the new Nexla Documentation site (docs.nexla.com/user-guides).
_______________________________________________
This article provides information about sending Nexsets (data) from any data source to a Databricks destination with Nexla.
Contents:
1. Begin Sending a Nexset to Databricks
2. Select the Databricks Destination
2.1 Send Data to an Existing Table
2.2 Create and Send Data to a New Table
3. Select Update and Tracking Options
4. Map the Nexset Attributes to Table Columns
4.1 Automatic Column Mapping
4.2 Manual Column Mapping
5. Complete and Activate the Data Flow
1. Begin Sending a Nexset to Databricks
- Click the icon on the Nexset to be sent to the Databricks destination. This will open the Send Nexset to Destination screen.
- Select Databricks from the destinations list, and click the button in the top right corner of the screen.
- Select the appropriate credential, and click the button in the top right corner of the screen.
To learn how to set up a new Databricks credential, see Section 2 in the Help Center article Connect to a Databricks Data Source.
2. Select the Databricks Destination
With Nexla, data contained in Nexsets can be sent to an existing Databricks table, or users can create and send the data to a new table in the Databricks workspace. To learn how to send data to an existing table, see Section 2.1. To learn how to create and send data to a new table, see Section 2.2.
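Which path applies depends on whether the target table already exists in the Databricks workspace. For reference, this can also be checked directly in a Databricks notebook, outside the Nexla UI. The sketch below is illustrative only; `spark` is assumed to be the active SparkSession, and the three-level table name is a placeholder.

```python
# Minimal sketch (not part of the Nexla UI): check whether the target
# table already exists in Databricks. The table name is a placeholder.
target = "main.sales.orders"

if spark.catalog.tableExists(target):
    print(f"{target} exists -- see Section 2.1 (send to an existing table).")
else:
    print(f"{target} not found -- see Section 2.2 (create a new table).")
```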
2.1 Send Data to an Existing Table
- Find the Databricks table to which Nexla should send the data. Expand locations and tables as necessary by clicking the icon next to each.
- Select the table to which the data should be sent by hovering over it and clicking the button that appears to the right.
The button will change to indicate that the table is selected, and the path of the selected location will be shown at the top of the list.
- Click the button in the top right corner of the screen.
- Proceed to Section 3.
2.2 Create and Send Data to a New Table
- Find the Databricks workspace in which a new table should be created, and expand the workspace by clicking the icon next to it.
- Click the button below the workspace in which a new table should be created.
To create and send the data to a new table, users must have permissions to create tables in the Databricks workspace. If these permissions are removed after the data flow is created, all table updates associated with the flow will be stopped, and the user will receive a corresponding notification in Nexla.
- Click the button in the top right corner of the screen.
- Enter a name for the new table in the Table Name field (see the sketch at the end of this section for the equivalent Databricks operation).
- Proceed to Section 3.
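For orientation only: when Nexla creates and loads a new table, the underlying Databricks operation is conceptually similar to writing records as a managed Delta table. The sketch below assumes a Databricks notebook with an active `spark` session; all table and column names are placeholders.

```python
from pyspark.sql import Row

# Placeholder records standing in for Nexset data.
records = [
    Row(order_id=1, customer="acme", amount=120.50),
    Row(order_id=2, customer="globex", amount=75.00),
]
df = spark.createDataFrame(records)

# saveAsTable creates a new managed Delta table; this requires CREATE TABLE
# permission on the target schema, matching the permission note above.
df.write.format("delta").saveAsTable("main.sales.orders")
```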
3. Select Update and Tracking Options
- Use the Update Mode pulldown menu to select whether the records contained in the Nexset should be inserted into the Databricks table or merged with existing rows.
The MERGE option is only available for use with Delta Lake tables.
- When INSERT is selected, the data in the Nexset will be inserted into the table as new rows.
- When MERGE is selected, Nexset rows that match existing Databricks table rows will be used to update those rows, including both updated values and deletions, while Nexset rows that do not match existing table rows will be inserted into the table as new rows. This behavior is similar to an "upsert" in other databases (see the sketch after this list).
- Optional: When the Merge update mode is selected, Nexla by default allows table columns to be updated with Nexset records containing null values. To omit null record values from the upsert and allow records to be partially upserted, uncheck the corresponding box.
- Optional: To enter a name for the tracker that will be used to trace the lineage of the Nexset in Nexla, check the corresponding box, and enter the desired tracker name in the text field that appears below.
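For readers familiar with Delta Lake, the two update modes map closely onto a Delta append (INSERT) and a Delta MERGE. The sketch below is illustrative only; `spark` is assumed to be an active SparkSession in a Databricks notebook, and all table and column names are placeholders. The coalesce-based variant shows one way the null-handling option described above can behave: existing values are kept when an incoming field is null.

```python
from pyspark.sql import Row
from delta.tables import DeltaTable

# Placeholder incoming records standing in for a Nexset.
updates = spark.createDataFrame([
    Row(order_id=1, customer=None, amount=99.00),
    Row(order_id=3, customer="initech", amount=42.00),
])

# INSERT mode: append all records as new rows.
updates.write.format("delta").mode("append").saveAsTable("main.sales.orders")

# MERGE mode: matched rows are updated, unmatched rows are inserted ("upsert").
target = DeltaTable.forName(spark, "main.sales.orders")
(target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()          # null source values overwrite target values
    .whenNotMatchedInsertAll()
    .execute())

# Variant: keep the existing value when the incoming field is null,
# analogous to omitting null record values from the upsert.
(target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdate(set={
        "customer": "coalesce(s.customer, t.customer)",
        "amount": "coalesce(s.amount, t.amount)",
    })
    .whenNotMatchedInsertAll()
    .execute())
```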
4. Map the Nexset Attributes to Table Columns
Nexla supports both automatic and manual mapping of attributes to Databricks table columns.
The automatic column mapping option is available only for Insert-type data flows and is described in Section 4.1. To learn more about manual column mapping, see Section 4.2.
4.1 Automatic Column Mapping
- Nexla can automatically infer matching column data types in the Databricks workspace for record attributes in the Nexset, rather than requiring manual column designation. To enable this feature, check the corresponding box.
With automatic attribute mapping enabled, Nexla still preserves attribute nesting (illustrated in the sketch at the end of this section).
- When the automatic mapping option is selected, Nexla will display sample records from the Nexset that will be used for automatic mapping. The attribute names shown will be used as column names in the database.
To change the table structure and column names, use the Nexset Designer to edit the Nexset that will be sent to the database.
- Once the table structure of the automatic column mapping samples is correct, proceed to Section 5.
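To make the nesting note concrete: when a nested record is written to Databricks, nested attributes typically arrive as STRUCT columns rather than being flattened. The sketch below uses Spark's own schema inference as an analogy for automatic mapping; the record is a placeholder, not a real Nexla sample.

```python
import json

# Placeholder nested record standing in for a Nexset sample.
sample = {"order_id": 1, "customer": {"name": "acme", "tier": "gold"}}

# Infer a schema from the JSON record; `spark` is the active SparkSession.
df = spark.read.json(spark.sparkContext.parallelize([json.dumps(sample)]))

df.printSchema()
# root
#  |-- customer: struct (nullable = true)
#  |    |-- name: string (nullable = true)
#  |    |-- tier: string (nullable = true)
#  |-- order_id: long (nullable = true)
```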
4.2 Manual Column Mapping
- The formatting table at the bottom of the Format screen is used to map Nexset attributes to columns in the new or existing Databricks table.
- When the Nexset will be sent to an existing Databricks table, under the Column heading, Nexla will supply a column name from the selected table for each attribute contained in the Nexset.
- When a new Databricks table will be created, under the Column heading, column names corresponding to the Nexset attribute names will be supplied.
- Supplied column names can be edited by clicking on the entry in a Column field and entering the desired text.
- Optional: To create a new column in the Databricks table, click the button at the bottom of the formatting table, enter a column name under the Column heading, and continue with the subsequent steps in this section.
- Optional: To delete a column row, hover over the column listing in the formatting table, and click the icon that appears to the right.
Deleting a column will exclude that column and the Nexset data mapped to the column from the data sent to the destination. It will not delete the column from an existing Databricks table.
- Optional: Use the pulldown menu under the Linked Draft Attribute heading to change the Nexset record attribute mapped to a Databricks table column.
- For each mapped attribute, select the corresponding data type from the pulldown menu under the Type heading.
For flows in which the data will be sent to an existing table, Nexla will pre-select a data type for each attribute based on the data already contained in the table. Users can either keep these selections or use the pulldown menus to specify different data types. (A sketch illustrating attribute-to-column mapping appears at the end of this section.)
- For Merge-type data flows, select a primary key by checking the box under Primary in the row that contains the primary key attribute.
For Insert-type data flows, skip this step.
- Proceed to Section 5.
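For context, the manual mapping configured in this screen (Nexset attribute to column name, with an explicit data type, plus a primary key for Merge-type flows) is conceptually equivalent to renaming and casting columns before a write. A hypothetical sketch with placeholder names:

```python
from pyspark.sql import Row, functions as F

# Placeholder Nexset records; attribute names are hypothetical.
nexset_df = spark.createDataFrame([
    Row(orderId=1, customerName="acme", total=120.5),
])

# Rename attributes to column names and cast to the chosen data types,
# mirroring the Column / Linked Draft Attribute / Type settings above.
mapped = nexset_df.select(
    F.col("orderId").cast("bigint").alias("order_id"),       # primary key for Merge
    F.col("customerName").cast("string").alias("customer"),
    F.col("total").cast("decimal(10,2)").alias("amount"),
)

mapped.printSchema()
```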
5. Complete and Activate the Data Flow
- Once all of the above steps have been completed, click the button in the top right corner of the screen.
- The confirmation screen indicates that the selected or newly created Databricks table has been successfully added as a destination in this data flow.
- Optional: To edit the name of the newly created destination in this screen, click on it, and enter the desired text.
- Optional: To enter a description of the newly created destination, click on the description field, and enter the desired text.
- To activate the flow of data into the Databricks destination now, click the activate button.
- To activate the flow of data into the Databricks destination later, click the button in the top right corner of the screen.
- When the flow is ready to be activated, find the flow in the My Data Flows screen, and click on the destination.
- In the menu that appears, click the option to activate the flow of data to the Databricks destination.