Important Nexla Help Center Update:
Nexla's Zendesk Help Center pages are being deprecated and will soon no longer be available.
Nexla Documentation is now the home for Nexla's User Guides, with improved formatting and categories that are easier to navigate, providing a better overall user experience.
Please update any bookmarks to the new Nexla Documentation site (docs.nexla.com/user-guides).
_______________________________________________
In today’s session, you’re getting access to a workshop environment to build your very own data flow from source to destination. Along the way, I’ll share information on features. The workshop is a great chance to get hands-on and get a feel for how easy Nexla is to use.
For this exercise, you will build a data flow that takes a file from an FTP server, hashes the email address, and writes it out to MySQL for your data consumption team.
1. Log In to the Nexla Workshop
Host: https://www.nexla.com/
Click on the top right corner to log in.
Talk to us at support@nexla.com to get credentials.
The homepage shows an overview of data flows, data ingested, data written out, errors, notifications, and alerts. The left-hand side is the navigation bar and will be your way to move around the product.
You can always come back to this page by clicking on the icon.
About the Data
For today’s session, I have mocked up some data for the demo using Mockaroo, a service that auto-generates the sample data we will use for our exercises.
The data will contain typical user information such as id, first_name, last_name, email, gender, state, and phone. There are 100 records in this file.
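To make the shape of the file concrete, here is a minimal sketch of parsing a CSV with the schema described above. The values are illustrative, not taken from the actual Mockaroo export, which contains 100 records.

```python
import csv
import io

# Illustrative sample matching the schema above; real Mockaroo values differ.
sample_csv = """id,first_name,last_name,email,gender,state,phone
1,Ada,Lovelace,ada.lovelace@example.com,Female,California,555-0100
2,Alan,Turing,alan.turing@example.com,Male,New York,555-0101
"""

# DictReader maps each row to a dict keyed by the header fields.
records = list(csv.DictReader(io.StringIO(sample_csv)))
print(len(records))         # 2
print(records[0]["email"])  # ada.lovelace@example.com
```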
2. Create a New Data Flow
Hover over Data Flow and select [Create New Data Flow].
To start, you’ll create a new source.
3. Browse or Search and Select FTP
Browse through the sources for FTP
OR search for FTP.
4. Select FTP Credentials
Typically you would enter server information and credentials here.
For the demo, an FTP credential has been provided. Go ahead and select FTPS user.
This authenticates the connection. Credentials provided have read and write privileges in a specific folder within the FTP server.
5. Browse folders and files in the FTP source
Expand the parent directory to view folders and files within the directory.
Expand people_data, then PII. Notice [SELECT] appears when you hover over a folder and [PREVIEW] when you hover over a file.
Hover over People_By_State for [PREVIEW]. Click [PREVIEW].
Select the folder PII. Additional configuration is available on the right pane of the window. You can set the schedule and additional options for configuring the ingestion. Check Custom Path Settings and in Paths to be scanned, enter /people_data/PII/People_By_State.csv since we just want to focus on People_By_State.csv.
6. Start of Your Data Flow
The discovered dataset will take a minute or two to appear. During this time, Nexla scans the data to build the dataset. The data is scanned at each ingestion to detect schema or significant record changes; these changes are recorded and the relevant parties are notified. If no significant changes are detected, the existing dataset remains valid.
7. Expand the Discovered Data
A dataset is a working data model and contains metadata with samples. Click on the magnifying glass for details of the dataset. Besides field names and samples, Nexla also introspects the data type.
You can click on an Attribute and add a description. Here, I added personal to the description of email.
You can see stats for an attribute by clicking on the graph icon.
8. Transform a Dataset
Often it’s necessary to make a few or a lot of changes to the original dataset from the source. We’re going to make a few revisions. Click on [CREATE DATASET] to start the revision.
The next screen is the dataset transformation page. You’ll spend the majority of your time here, and there is a lot going on, so I’ll walk through a typical workflow.
The default view shows samples as columns in a table. You can change to row view or nested JSON objects. The same can be done to draft attributes.
On the right, you can seed the draft with a schema template, which shapes the output to match a predefined format.
This shows Facebook: Add Users to Custom Audience.
To remove this, click on the Schema Template button and then X in the field that says Facebook.
The left pane shows the incoming dataset; in this case, the file from the FTP server. The middle section is the transformation logic you’ll apply to a selected field. The right pane displays what you can expect to see when you save.
This screen allows you to enrich, transform, and manipulate the data. The outgoing dataset does not need to have the same number of attributes or fields as the source.
NOTE - you MUST hit NEXT and select one of the boxes in order to save your progress. You can always return and modify this later.
We will hash the email address.
- Check email in Available Source Attributes on the left.
- Rename attribute as Hashed_email.
- Add description as Hash MD5.
- In Add Transform, type in Hash MD5.
- Click Add to Draft.
You will see Hashed_email in the right pane.
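The Hash MD5 transform applied above can be approximated in a few lines. This is a hedged sketch of the general technique (an MD5 hex digest of the field value), not Nexla’s internal implementation; the record values are hypothetical.

```python
import hashlib

def hash_email(email: str) -> str:
    """Roughly what a Hash MD5 transform does: return the MD5 hex digest."""
    return hashlib.md5(email.encode("utf-8")).hexdigest()

# Hypothetical record: replace the raw email with its hashed form,
# mirroring the email -> Hashed_email rename in the draft.
record = {"id": 1, "email": "ada.lovelace@example.com"}
record["Hashed_email"] = hash_email(record.pop("email"))
print(record["Hashed_email"])  # 32-character hex string
```

The raw address never reaches the destination; only the 32-character digest does, which is why this flow is suitable for handing PII-adjacent data to a consumption team.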
Click the checkbox next to the search field to select all Attributes.
In the middle of the window, click Add Selected to Draft. This makes a copy.
You can include metadata by switching Equal to from Attribute to Metadata.
Alright! I think we’ve spent quite some time on this, let’s move on. Click [Next] to save our progress.
9. Select Destination
Destination indicates what you want to do with the output dataset that you just created. You have 2 options for setting the destination. Selecting [Dataset Only] creates another dataset that can later be transformed or shared with others without giving access to the upstream objects. Selecting [Dataset Writing to Destination] writes the data to a physical location such as a Snowflake database, S3 file system or Kafka stream. Your output system does not need to be the same as your source system, nor does it need to correlate to a batch or streaming source.
Click on [Dataset Writing to Destination].
Browse or search and select [MySQL].
We have given you credentials for MySQL.
Click on Create new table. (Click Configure Table Columns to define table rules.)
Add a table name starting with your initials [XX_users]. Hopefully your initials are unique, so shout out if they match someone else’s in the demo. Change Update Mode to Insert.
- Optional. Check [Set Tracker Name]
- Tracker is a unique ID generated for each ingestion at the record level. It’s similar to a batch ID for auditing and traceability purposes.
- The Tracker ID contains information from the time the record was ingested from the source through Nexla to the destination and the transformation hops via the dataset.
- Visually confirm that column names and data types match your expectations
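For intuition, here is a rough sketch of the kind of DDL and parameterized DML a destination in Insert mode would issue against MySQL. The table and column definitions are assumptions based on the transformed dataset above, not the exact SQL Nexla generates.

```python
# Hypothetical table name; replace XX with your initials, as in the step above.
TABLE = "XX_users"

# Assumed column types for the transformed dataset (MD5 digests are 32 hex chars).
create_table = f"""CREATE TABLE {TABLE} (
    id INT,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    Hashed_email CHAR(32),
    gender VARCHAR(32),
    state VARCHAR(64),
    phone VARCHAR(32)
)"""

def insert_stmt(record: dict) -> tuple:
    """Build a parameterized INSERT; Insert mode appends a row per record."""
    cols = ", ".join(record)
    placeholders = ", ".join(["%s"] * len(record))
    return f"INSERT INTO {TABLE} ({cols}) VALUES ({placeholders})", tuple(record.values())

sql, params = insert_stmt({"id": 1, "Hashed_email": "0cc175b9c0f1b6a831c399e269772661"})
print(sql)  # INSERT INTO XX_users (id, Hashed_email) VALUES (%s, %s)
```

Insert mode never updates existing rows, so each ingestion simply appends; if you needed upsert behavior you would choose a different update mode instead.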
Click Create. While it’s still fresh in our minds, let’s update the titles and add a description explaining what this data flow does for others who look at it. Remember, someone else will support and maintain this later, so the documentation work you put in now will greatly reduce the effort needed later.
Click Activate this Flow and then Done.
10. Data Flow Completed
Voilà! You’ve just created your first data flow in less than an hour. You started with a new source ingesting a user file, then cleaned and enriched the data for analytical purposes.
The best part: now you can focus your efforts on creating insights and strategizing for business growth, while Nexla handles the data operations moving forward.
This will take a couple of minutes to complete, so let’s zoom out and review your data flow from the Data Flow page.