This article introduces file-based storage systems as data sources and provides general information about working with these storage systems in Nexla.
Contents:
1. File-Based Storage Systems and Nexla
2. Connecting to File-Based Storage Systems
3. How Nexla Organizes Data
4. Ingestion of New and/or Modified Files
4.1 Re-ingestion of Files
1. File-Based Storage Systems and Nexla
File-based data storage systems are among the most efficient ways to store, organize, and move large volumes of data. In these systems, data is stored in a hierarchical structure consisting of files located inside one or more folders.
Examples of file-based data storage systems include cloud services—such as Amazon S3, Azure Blob Storage, Box, Google Cloud Storage, and Google Drive—as well as FTP, SFTP, and FTPS servers and local hard-drive storage systems.
Nexla makes ingesting data from file-based storage systems a simple and quick process. Data ingested from these systems can be transformed and/or sent to any destination in only a few steps. Data flows originating from file-based storage systems can be constructed to suit any use case, and Nexla's comprehensive governance and troubleshooting tools allow users to monitor every aspect of the flow status, data lineage, and more.
2. Connecting to File-Based Storage Systems
With Nexla's connectors, users can quickly and easily add any file-based storage system as a data source to begin ingesting, transforming, and moving data in any format. This section provides general instructions and information about connecting to file-based storage systems.
- After logging into Nexla, navigate to the Integrate section by selecting
from the platform menu on the left side of the screen.
- Click
at the top of the Integrate toolbar on the left to open the Select Source Type screen.
- To view all of Nexla's currently available file-based storage system connectors, select
from the Categories list on the left.
- Select the connector type corresponding to the file-based storage system that will be added as a data source—for example,
.
- Create a new credential for Nexla to use when connecting to the file-based storage system, or select an existing credential, and click
.
The information required to create a new credential varies by the file-based storage system type. More detailed instructions are provided for each connector in the articles in the Create a Data Source section of the Help Center.
- In the
screen, all files and folders in the file-based storage system that are accessible to the selected credential are shown. Locate the file or folder from which Nexla should read data.
- To expand a folder, click the
icon next to it.
- Nexla can read from any location within the directory, from individual files to all files within a folder or subfolder.
- Click
to the right of an individual file to preview its contents.
- Click
to the right of the file or folder from which Nexla should read data.
Once a location is selected, the button displays
.
- Under the Scheduling heading in the Advanced Settings panel on the right, use the pulldown menu to select the frequency at which Nexla should scan the selected location for new files.
- For options such as "Every N Hours" and "Every N Days", use the additional pulldown menu that appears when these options are selected to specify the value of N defining the fetching frequency.
- Optional: To set a specific time at which Nexla should fetch any new data from the source, check the box next to "Set Time", and use the pulldown menus to select the desired time.
- Optional: To configure additional advanced settings, including data formatting, data selection, schema, and grouping options, see the article Advanced Settings for File-Based Sources.
- Once all of the above steps are complete, click
in the upper right corner of the screen to create the new data source in Nexla.
- Nexla will now begin scanning the selected location in the file-based storage system for data and will organize any data that it finds into one or more Nexsets.
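The "Every N Hours" scheduling option described in the steps above can be pictured as a simple recurrence. The following Python sketch is illustrative only and is not Nexla's scheduler; the function name `next_scan_times` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def next_scan_times(start: datetime, every_n_hours: int, count: int) -> list[datetime]:
    """Return the first `count` scan times after `start`, spaced N hours apart.

    Hypothetical helper that models an "Every N Hours" schedule.
    """
    if every_n_hours < 1:
        raise ValueError("every_n_hours must be at least 1")
    step = timedelta(hours=every_n_hours)
    return [start + step * i for i in range(1, count + 1)]


if __name__ == "__main__":
    # Scans every 6 hours, counted from a 09:00 start time.
    for scan_time in next_scan_times(datetime(2024, 1, 1, 9, 0), every_n_hours=6, count=3):
        print(scan_time.isoformat())
```

In this model, the "Set Time" option corresponds to choosing the `start` value from which the interval is counted.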
3. How Nexla Organizes Data
When Nexla ingests data from a source—whether a file-based storage system or any other type of service—the platform intelligently analyzes the structure of the data to organize it into one or more Nexsets.
If a location containing multiple files is selected when configuring a data source from a file-based storage system, Nexla will examine the differences between the ingested files. The platform will create Nexsets containing the ingested data based on the level of overlap between records and options selected during data source creation.
After the initial data ingestion cycle, Nexla will repeat the process of comparing the structure and composition of data newly ingested in subsequent cycles to any existing Nexsets. Similar data will be added to existing Nexsets, while significantly different data will be organized into a new Nexset.
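To make the idea of grouping by overlap concrete, the Python sketch below places files into groups according to how many field names they share. It is an illustration under stated assumptions, not Nexla's actual algorithm: the Jaccard similarity measure, the 0.5 threshold, and the function names are all hypothetical choices.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of field names common to both sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_to_groups(files: dict[str, set[str]], threshold: float = 0.5) -> list[dict]:
    """Put each file in the first group with enough overlap, else start a new group.

    `threshold` is an assumed cutoff chosen for illustration only.
    """
    groups: list[dict] = []
    for name, fields in files.items():
        for group in groups:
            if jaccard(fields, group["fields"]) >= threshold:
                group["files"].append(name)
                group["fields"] |= fields  # widen the group's set of known fields
                break
        else:
            # No existing group was similar enough.
            groups.append({"fields": set(fields), "files": [name]})
    return groups


if __name__ == "__main__":
    files = {
        "customers_jan.csv": {"id", "name", "email"},
        "customers_feb.csv": {"id", "name", "email", "phone"},
        "orders.csv": {"order_id", "sku", "quantity"},
    }
    for group in assign_to_groups(files):
        print(sorted(group["fields"]), group["files"])
```

In this example, the two customer files share most of their fields and land in one group, while the orders file has no fields in common with them and starts a second group.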
Important Note: Nexla's comparison of ingested data to existing Nexsets ignores differences in file format. For example, when a CSV file containing the headers "ID" and "Name" and a JSON file with "ID" and "Name" object properties are ingested, the data contained in both files will be processed into the same Nexset.
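The CSV and JSON example in the note above can be reproduced in a few lines of Python. This sketch only shows that the two formats yield the same set of field names once parsed; the helper names `csv_fields` and `json_fields` are hypothetical, and the sample records are invented for illustration.

```python
import csv
import io
import json


def csv_fields(text: str) -> frozenset[str]:
    """Field names taken from the header row of CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return frozenset(reader.fieldnames or [])


def json_fields(text: str) -> frozenset[str]:
    """Field names taken from the keys of a JSON array of objects."""
    records = json.loads(text)
    return frozenset(key for record in records for key in record)


CSV_TEXT = "ID,Name\n1,Ada\n2,Grace\n"
JSON_TEXT = '[{"ID": 3, "Name": "Edsger"}, {"ID": 4, "Name": "Barbara"}]'

if __name__ == "__main__":
    # Different file formats, same field names.
    print(csv_fields(CSV_TEXT) == json_fields(JSON_TEXT))
```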
4. Ingestion of New and/or Modified Files
Once a data source has been created in Nexla, whether from a file-based storage system or any other type of service, the platform will scan the source at regular intervals according to the configured scheduling options. When Nexla detects new files during a scan, it will automatically ingest and process the data contained in the new files and mark the files as ingested.
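The scan-and-mark behavior can be modeled with a set of already ingested file paths. The Python sketch below is a simplified illustration, not Nexla's implementation; the function `scan_for_new_files` and the sample paths are hypothetical.

```python
def scan_for_new_files(listing: list[str], ingested: set[str]) -> list[str]:
    """Return files in `listing` that are not yet ingested, and mark them as ingested."""
    new_files = [path for path in listing if path not in ingested]
    ingested.update(new_files)
    return new_files


if __name__ == "__main__":
    ingested: set[str] = set()
    # First scan: every file is new.
    print(scan_for_new_files(["in/a.csv", "in/b.csv"], ingested))
    # Second scan: only the file added since the first scan is returned.
    print(scan_for_new_files(["in/a.csv", "in/b.csv", "in/c.csv"], ingested))
```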
Nexla also tracks the number of rows of data that have been ingested from each file. Therefore, when additional rows of data are added to a previously ingested file, the platform will automatically ingest and process the added data.
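Row-count tracking of this kind can be sketched as a per-file counter. The Python example below is an assumption-laden illustration rather than Nexla's implementation; the function `ingest_new_rows` and the `rows_ingested` dictionary are hypothetical names.

```python
def ingest_new_rows(path: str, rows: list[str], rows_ingested: dict[str, int]) -> list[str]:
    """Return only the rows beyond the stored count for `path`, then update the count."""
    already = rows_ingested.get(path, 0)
    new_rows = rows[already:]
    rows_ingested[path] = len(rows)
    return new_rows


if __name__ == "__main__":
    rows_ingested: dict[str, int] = {}
    # First cycle: the whole file is new.
    print(ingest_new_rows("sales.csv", ["r1", "r2"], rows_ingested))
    # Later cycle: one row was appended, so only that row is returned.
    print(ingest_new_rows("sales.csv", ["r1", "r2", "r3"], rows_ingested))
```

A counter like this notices appended rows only. A change to an existing row leaves the row count unchanged, so it would go undetected.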
Important Note: Nexla reads and processes data from a source according to the configured schedule, but the platform waits for a period of inactivity in the data flow before doing so. Users should therefore avoid repeatedly pausing and reactivating the data flow when the source contains new data that should be ingested.
4.1 Re-ingestion of Files
In some cases, a previously ingested file may need to be modified in a way that affects record values without adding new rows of data. When this occurs, the file should be marked for re-ingestion in the next scan cycle.
To re-ingest a file:
- Navigate to the Integrate screen by selecting
from the platform menu on the left side of the screen.
- In the All Data Flows list, locate the flow origin corresponding to the file that should be re-ingested, and click on it to expand the flow view.
- Click the
icon on the data source to open the Data Source information screen.
- Select the
tab to view a list of files previously ingested from this source.
- Click the
icon to the right of the file that should be re-ingested, and click
in the pop-up that appears.
The file will now be re-ingested during the next ingestion cycle.
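Conceptually, marking a file for re-ingestion amounts to discarding what the platform remembers about it, so that the next cycle treats the file as new. The Python sketch below models that idea. It is not Nexla's implementation, and the class `IngestionState` and its methods are hypothetical.

```python
class IngestionState:
    """Hypothetical per-source record of how many rows were ingested from each file."""

    def __init__(self) -> None:
        self.rows_ingested: dict[str, int] = {}

    def pending_rows(self, path: str, rows: list[str]) -> list[str]:
        """Return the rows the next cycle would ingest, and record them as ingested."""
        start = self.rows_ingested.get(path, 0)
        self.rows_ingested[path] = len(rows)
        return rows[start:]

    def mark_for_reingestion(self, path: str) -> None:
        """Forget the file so the next cycle reads it from the first row."""
        self.rows_ingested.pop(path, None)


if __name__ == "__main__":
    state = IngestionState()
    state.pending_rows("prices.csv", ["sku1,10", "sku2,20"])

    # A value changes, but no rows are added.
    edited = ["sku1,12", "sku2,20"]
    print(state.pending_rows("prices.csv", edited))  # nothing new by row count

    state.mark_for_reingestion("prices.csv")
    print(state.pending_rows("prices.csv", edited))  # the whole file is read again
```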