Solved

Solutions for different schemas in a data lake?


Badge

I am new to TX and data lakes.  I have a common problem that others have commented on and would like to know if:

1) The functionality now exists, or if it does not

2) How are others getting around the issue?

The issue is that I need to be able to segregate data sources in the data lake.  I am pulling in data from multiple SQL sources where the source schema and table names are the same, and I need to differentiate those tables in the data lake.  By schema would be best, but if that is not possible, I need some other ideas.  Thanks!

icon

Best answer by brandon.spencer 28 June 2022, 17:07

View original

6 replies

Userlevel 5
Badge +7

Hi Brandon,

the ODX Server will store data pulled from a source in a folder inside your storage container with the source's name. I.e. for an AdventureWorks2014 SQL database loaded through a SQL source named AW2014 you would get: data lake container > AW2014 > production_product for the folder containing data from the AdventureWorks2014.production.product table

Badge

Rory, thanks for the comment.  That directory structure within the data lake appears to be sufficient for differentiating sources.  It appears to me that the ODX copies over whatever the source schema is to whatever the storage option is.  At any point along the way, whether it be from the ODX to storage, or from storage to a staging or MDW table, is there a way to create/map a table to a different schema?

Userlevel 5
Badge +7

The ODX is meant to mirror your source system closely, custom schema assignments break from that structure.

Once you pull data into TimeXtender Datawarehouse layers, you can fully control how schemas are assigned. This can be based on the type of table (raw/valid/transformation/etc.), per table, etc.

Badge

Okay, that's good news.  Do I create the schemas via SQL server and map them in TX, or is there a way to create the schema from within TX?  I presume the former, but since I am new to this meta-data repository idea, I don't want to do things out of order.

Userlevel 5
Badge +7

The latter is prefereable: you want to orchestrate as much as possible from one place, and that is what TimeXtender is there for. I am pretty sure this is covered in the online training at https://learn.timextender.com/ - that should give you some context as well

Badge

Thanks, Rory.  In case somebody comes across this post in the future who is new to TX like me, I found this TX link:

Database Schemas – TimeXtender Support

After reading the information in the link above, it jogged my memory that the training did address this.  And it can be found under TimeXtender - Optimization: Schemas, Views, and Simple Mode

Reply