Important Nexla Help Center Update:
Nexla's Zendesk Help Center pages are being deprecated and will soon no longer be available.
Nexla Documentation is now the home for Nexla's User Guides, with improved formatting and categories that are easier to navigate, providing a better overall user experience.
Please update any bookmarks to the new Nexla Documentation site (docs.nexla.com/user-guides).
_______________________________________________
This article introduces databases as data sources and provides general information about working with these storage systems in Nexla.
Contents:
1. Databases/Data Warehouses and Nexla
2. Connecting to a Database/Data Warehouse
3. How Nexla Organizes Data
4. Ingestion of New and/or Modified Files
1. Databases/Data Warehouses and Nexla
Databases provide an efficient way to store, manage, and access large volumes of both structured and unstructured data. Data stored in databases is used in a wide variety of operational and transactional workflows, including retail order and inventory tracking, business decision-making, financial transaction tracking, and many others.
Data warehouses are also used to store and access large volumes of structured or unstructured data, but this data is typically aggregated from multiple sources and includes both current and historical data. Data stored in data warehouses is used to support workflows such as in-depth data analytics, data mining, artificial intelligence, and machine learning.
Examples of databases and data warehouses include Google BigQuery, Snowflake, Amazon Redshift, MySQL, and Oracle Autonomous Database.
Nexla can connect to and ingest data from any database or data warehouse, allowing users to quickly build data flows that ingest data from these sources, apply any needed transformations, and send the data to any destination. Data flows originating from databases and data warehouses can be constructed to suit any use case, and Nexla's governance and troubleshooting tools let users monitor flow status, data lineage, and more.
2. Connecting to a Database/Data Warehouse
With Nexla's connectors, users can quickly and easily add any database or data warehouse as a data source to begin ingesting, transforming, and moving data in any format. This section provides general instructions and information about connecting to databases and data warehouses.
- After logging into Nexla, navigate to the Integrate section using the platform menu on the left side of the screen.
- Use the button at the top of the Integrate toolbar on the left to open the Select Source Type screen.
- To view all of Nexla's currently available database and data warehouse connectors, select the corresponding category from the Categories list on the left.
- Select the connector type that corresponds to the database/data warehouse that will be added as a data source.
- Create a new credential for Nexla to use when connecting to the database/data warehouse, or select an existing credential; then continue to the next screen.
The information required to create a new credential varies by database/data warehouse type. More detailed instructions for each connector are provided in the articles in the Create a Data Source section of the Help Center.
- In the next screen, select the data that should be ingested from the database/data warehouse. Nexla provides two modes for this function:
- Table Mode - Specify the data to be ingested by selecting the table from a list of all tables within the database/data warehouse.
- Query Mode - Specify the data to be ingested by writing a query of any complexity in a free-form query editor.
- Under the Scheduling heading in the Advanced Settings panel on the right, use the pulldown menu to select the frequency at which Nexla should scan the selected table for new data.
For options such as "Every N Hours" and "Every N Days", use the additional pulldown menu that appears when these options are selected to specify the value of N defining the fetching frequency.
- Optional: To set a specific time at which Nexla should fetch any new data from the source, check the box next to "Set Time", and use the pulldown menus to select the desired time.
- Optional: If the selected table contains additional data that should not be ingested in this data flow, use the Table Scan Mode pulldown menu in the Advanced Settings panel to configure the appropriate data-filtering option.
Data filtering options and instructions for configuring each available table scan mode can be found in Section 1.2 of Database Sources - Table Mode & Query Mode.
- Once all of the above steps are complete, use the button in the upper right corner of the screen to create the new data source in Nexla.
- Nexla will now begin scanning the selected database/data warehouse table for data and will organize any data that it finds into one or more Nexsets.
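To make the difference between Table Mode and Query Mode, and between the scheduling options, more concrete, here is a minimal Python sketch. It does not use any Nexla API: the `SourceConfig` class and the `build_sql` and `next_fetch` functions are hypothetical names invented for illustration, and an in-memory SQLite table stands in for a real database. It assumes that Table Mode reads every row of the selected table, that Query Mode runs the user's query unchanged, and that "Set Time" pins a day-based schedule to a fixed time of day.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class SourceConfig:
    """Hypothetical stand-in for a database source's settings (not a Nexla API)."""
    mode: str                          # "table" (Table Mode) or "query" (Query Mode)
    table: Optional[str] = None        # table to read in Table Mode
    query: Optional[str] = None        # free-form SQL used in Query Mode
    every_n_hours: Optional[int] = None
    every_n_days: Optional[int] = None
    set_time: Optional[time] = None    # assumed "Set Time": fixed time of day for day-based schedules


def build_sql(cfg: SourceConfig) -> str:
    """Return the SQL a scheduled read would run for this configuration."""
    if cfg.mode == "table":
        # Table Mode: every row of the selected table (identifier quoting kept simple).
        return f'SELECT * FROM "{cfg.table}"'
    if cfg.mode == "query":
        # Query Mode: the user's query is run unchanged.
        return cfg.query
    raise ValueError(f"unknown mode: {cfg.mode}")


def next_fetch(cfg: SourceConfig, last: datetime) -> datetime:
    """Compute the next scheduled read after `last` for Every N Hours / Every N Days."""
    if cfg.every_n_hours:
        return last + timedelta(hours=cfg.every_n_hours)
    if cfg.every_n_days:
        nxt = last + timedelta(days=cfg.every_n_days)
        if cfg.set_time is not None:
            # Keep the date, but move the read to the configured time of day.
            nxt = datetime.combine(nxt.date(), cfg.set_time)
        return nxt
    raise ValueError("no schedule configured")


# Usage: an in-memory SQLite table stands in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.5), (3, "east", 7.25)],
)

table_cfg = SourceConfig(mode="table", table="orders", every_n_hours=6)
query_cfg = SourceConfig(
    mode="query",
    query="SELECT region, SUM(total) FROM orders GROUP BY region ORDER BY region",
    every_n_days=1,
    set_time=time(2, 30),
)

print(len(conn.execute(build_sql(table_cfg)).fetchall()))  # 3
print(conn.execute(build_sql(query_cfg)).fetchall())       # [('east', 17.25), ('west', 25.5)]
print(next_fetch(table_cfg, datetime(2024, 1, 1, 9, 0)))   # 2024-01-01 15:00:00
print(next_fetch(query_cfg, datetime(2024, 1, 1, 9, 0)))   # 2024-01-02 02:30:00
```

The sketch keeps row selection and scheduling separate on purpose: the mode decides what a read returns, while the schedule only decides when the next read happens.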
3. How Nexla Organizes Data
When Nexla ingests data from a source—whether a file-based storage system or any other type of service—the platform intelligently analyzes the structure of the data to organize it into one or more Nexsets.
If a location containing multiple files is selected when configuring a data source from a file-based storage system, Nexla will examine the differences between the ingested files. The platform will then create Nexsets containing the ingested data based on how closely the records overlap in structure and on the options selected during data source creation.
After the initial data ingestion cycle, Nexla will repeat the process of comparing the structure and composition of data newly ingested in subsequent cycles to any existing Nexsets. Similar data will be added to existing Nexsets, while significantly different data will be organized into a new Nexset.
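The grouping behavior described above can be pictured with the sketch below. Nexla does not publish its matching rules here, so this is only an illustration under stated assumptions: each batch's structure is reduced to its set of field names, compared with existing groups using Jaccard similarity, and merged into the best match when the score reaches a hypothetical threshold of 0.5; otherwise a new group (standing in for a Nexset) is created. `SchemaGrouper` and `SIMILARITY_THRESHOLD` are invented names.

```python
from typing import Dict, List, Set

# Hypothetical cutoff for "similar enough"; not a documented Nexla value.
SIMILARITY_THRESHOLD = 0.5


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of field names two schemas have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class SchemaGrouper:
    """Assigns each ingested batch to the most similar existing group, or a new one."""

    def __init__(self, threshold: float = SIMILARITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.groups: List[Dict] = []  # each group: {"fields": set, "records": list}

    def ingest(self, records: List[dict]) -> int:
        fields = set().union(*(r.keys() for r in records))
        best_idx, best_score = -1, 0.0
        for i, group in enumerate(self.groups):
            score = jaccard(fields, group["fields"])
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx >= 0 and best_score >= self.threshold:
            group = self.groups[best_idx]
            group["fields"] |= fields  # widen the group's schema with any new fields
            group["records"].extend(records)
            return best_idx
        # Significantly different data starts a new group.
        self.groups.append({"fields": fields, "records": list(records)})
        return len(self.groups) - 1


grouper = SchemaGrouper()
first = grouper.ingest([{"id": 1, "name": "Ada", "email": "ada@example.com"}])
second = grouper.ingest([{"id": 2, "name": "Lin"}])  # 2 of 3 field names overlap
third = grouper.ingest([{"sku": "X1", "price": 9.99, "qty": 4}])  # no overlap
print(first, second, third)  # 0 0 1
```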
Important Note: Nexla's comparison of ingested data to existing Nexsets ignores differences in file format. For example, when a CSV file containing the headers "ID" and "Name" and a JSON file with "ID" and "Name" object properties are ingested, the data contained in both files will be processed into the same Nexset.
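A short Python sketch can illustrate why file format does not affect this comparison: once a CSV file and a JSON file are parsed into records, only their field names are compared. The helper functions here are hypothetical, and comparing schemas by field names alone is a simplifying assumption rather than a description of Nexla's internals.

```python
import csv
import io
import json


def records_from_csv(text: str) -> list:
    # DictReader turns the header row into field names for every record.
    return list(csv.DictReader(io.StringIO(text)))


def records_from_json(text: str) -> list:
    data = json.loads(text)
    # Accept either a single object or an array of objects.
    return data if isinstance(data, list) else [data]


def schema(records: list) -> frozenset:
    # Only the field names matter; the format that produced them does not.
    return frozenset(key for record in records for key in record)


csv_text = "ID,Name\n1,Ada\n2,Lin\n"
json_text = '[{"ID": 3, "Name": "Grace"}]'

csv_schema = schema(records_from_csv(csv_text))
json_schema = schema(records_from_json(json_text))
print(csv_schema == json_schema)  # True
```

Note that the CSV values arrive as strings while the JSON `ID` is a number; the sketch ignores values entirely and compares structure only.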
4. Ingestion of New and/or Modified Files
Once a data source has been created in Nexla, whether from a database/data warehouse or any other type of service, the platform will scan the source at regular intervals according to the configured scheduling options. When Nexla detects new files during a scan, it will automatically ingest and process the data contained in the new files and mark the files as ingested.
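The new-file check can be sketched as a set of already-ingested file names that each scan compares against the current listing. This is a hypothetical illustration (`NewFileTracker` is not part of any Nexla API), and it assumes files are identified by name only.

```python
from typing import Iterable, List, Set


class NewFileTracker:
    """Remembers which files were ingested so later scans only return new ones."""

    def __init__(self) -> None:
        self.ingested: Set[str] = set()

    def scan(self, listing: Iterable[str]) -> List[str]:
        new_files = sorted(name for name in listing if name not in self.ingested)
        # Mark the new files as ingested once they are handed off for processing.
        self.ingested.update(new_files)
        return new_files


tracker = NewFileTracker()
print(tracker.scan(["a.csv", "b.csv"]))            # ['a.csv', 'b.csv']
print(tracker.scan(["a.csv", "b.csv", "c.json"]))  # ['c.json']
```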
Nexla also tracks the number of rows of data that have been ingested from each file. Therefore, when additional rows of data are added to a previously ingested file, the platform will automatically ingest and process the added data.
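Row tracking for appended data can be sketched as a per-file counter of rows already read; on the next scan, only rows past that count are returned. `RowOffsetTracker` is a hypothetical name, and the sketch assumes rows are only ever appended to the end of a file.

```python
from typing import Dict, List


class RowOffsetTracker:
    """Tracks how many rows have been read from each file (hypothetical sketch)."""

    def __init__(self) -> None:
        self.rows_read: Dict[str, int] = {}

    def new_rows(self, name: str, rows: List[str]) -> List[str]:
        start = self.rows_read.get(name, 0)
        added = rows[start:]  # rows beyond the previously recorded count
        self.rows_read[name] = len(rows)
        return added


tracker = RowOffsetTracker()
print(tracker.new_rows("orders.csv", ["r1", "r2"]))        # ['r1', 'r2']
print(tracker.new_rows("orders.csv", ["r1", "r2", "r3"]))  # ['r3']
```

Because only a row count is stored, edits to rows that were already read would not be picked up by this approach.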
Important Note: Nexla reads and processes data from a source according to the configured schedule, but the platform will also wait for a period of inactivity in the data flow. Therefore, users should avoid repeatedly pausing and reactivating the data flow when the source contains new data that should be ingested.