In addition to these tables Discovery Hub has six other table types:
- Custom tables: An empty table than can be populated with custom fields.
- Date tables: Usually used for creating time dimensions on OLAP cubes.
- Hierarchy tables: Used to create special reporting structures based on the other tables in the data warehouse, especially for financial reporting.
- Junk dimension tables: A concept in dimensional modeling, junk dimension tables replace multiple fields in a table with a field referencing a row in another table containing the same combination of fields.
- Aggregate tables: Creates version of an ordinary table with aggregated data.
- External tables: Tables that are "sideloaded" into a data warehouse from another SQL Server database.
Changing Settings for a Table
Most settings for tables are consolidated in the Table Settings window.
To open the table settings window
- Right click a table and click Table settings.
Depending on the table type, not all settings are available. For instance, incremental load does not make sense for date and hierarchy tables, so incremental load settings are disabled for these table types.
Simple mode is a setting on tables on business units aimed at maximizing performance when you need to copy large amounts of data into a staging database to create an exact copy. See Simple Mode for more information.
Per default, a table inherits the simple mode setting from the data source which in turn inherits the setting from the business unit.
To enable simple mode
- Right click the table, click Table settings and click Enable under Simple Mode.
Guarding a table tells Discovery Hub to skip the table on execution or deployment. This is useful if, for instance, the table contains old data from a legacy system that is no longer running.
To guard a table
- Right click the table, click Table settings and then select Guard on deployment and/or Guard on execution under Guard.
Enabling Batch Data Cleansing to Improve Data Cleansing Performance
You can choose to split the INSERT statement up in batches during data cleansing, i.e. when copying data from the transformation view for table to the valid table. This saves log space on the SQL Server which gives you better performance on large tables with 100,000s or millions of rows.
To enable batch data cleansing, follow the steps below.
- Right click the table you want to use batch data cleansing on and click Table settings.
- Click the Performance tab and select Enable batch data cleansing.
- (Optional) Enter the number of records you would like each batch to contain in Batch size. The default is 100,000.
- Click OK.
Set Distribution for Tables on Azure Synapse Analytics
For tables on a data warehouse deployed on Azure Synapse Analytics, additional settings are available on the Performance tab in Table Settings. The Distribution option controls how data in the table is distributed among the 60 distributions and x amount of compute nodes on the server.
The following settings are available:
- Round-robin (default): Rows of data are distributed evenly across the server.
- Replicate: A copy of all data in the table is stored on all nodes.
- Hash: Rows of data are split among the distributions using a hash function based on the values in a 'distribution column' that you set. The default is the system field 'DW_id'.
To set distribution for the table
- Right click the table, click Table setting, click Performance and then click the setting you want to use under Distribution
To set a distribution column for the table
- Right click a field on the table and click Distribution Column
Note: Since tables with hash distribution enabled requires a distribution column, you cannot unselect a distribution column. However, you can set another column as your distribution column.
Please refer to Azure Synapse Analytics documentation for more information on distribution.
Custom tables are basic tables that you can add to the data warehouse or staging database. While they do not initially contain any fields except the standard system fields, you can add new fields to custom tables.
With custom tables, you can, for instance, build you data warehouse first and then map the data in from the data sources.
Adding a custom table to the Data Warehouse
To add a custom table to a data warehouse, follow the steps below.
- Expand Data Warehouses, and then expand the preferred data warehouse.
- Right click on Tables and click Add Table. The Add Data Warehouse Table window appears.
- In the Name box, type a name for the table, and then click OK.
Adding a custom table to the Staging Database
To add a custom table to a staging database, follow the steps below.
- Expand Business Units and expand the preferred business unit and staging database.
- Right click Tables and click Add Custom Table. The Add Data Warehouse Table window appears.
- In the Name box, type a name for the table and then click OK.
You will typically use date tables when you build OLAP cubes on top of the data warehouse. Most often, the cubes you create will contain a date dimension to make it possible to analyze data over time. For example, you may report data on a daily, weekly, or monthly basis. In Discovery Hub you use date tables, stored in the data warehouse, as the basis for your time dimensions.
In addition to day of month, day of quarter, week of year and other ordinary information about each date, date tables also contain indexes. An index is a column in the table that tells you something about the date’s relation to the current date. Date tables contain year, quarter, month and week indexes. All index values for today’s date are 0, while for instance, any day last year would have a year index of -1. This makes it trivial to compare e.g. the same month, day or quarter across years.
Date tables can also contain custom periods, special periods of time, for instance holidays or yearly sales campaigns, that enables you to easily track data across these reoccurring time periods.
To add a date table, follow the steps below.
- On a data warehouse, right click Tables and click Add Date Table. The Add Date Table window opens.
- In the Name box, type a name for the table.
- Select a Date range by entering a Start date and an End date. Instead of entering an end date, you can enter a number of days to add to the current date in Days ahead. This way, your date table will effectively never end.
- Under Date display, select the Format you want to use for dates. You have the following options:
- Select which Separator to use in the format you chose. You have the following options:
- - (dash)
- / (slash)
- . (dot)
- Under Week numbering, click the First day of the week. You have the fol-lowing options:
- Select the how to define the First week of the year. You have the following options:
- First 4-day week (following the ISO 8601 standard, common in Europe)
- Starts on Jan 1 (common in North American)
- First full week
- Under Fiscal year, click Staggered to use a staggered fiscal year and click the first month of the staggered fiscal year in the First month list.
- (Optional) Click Add under Custom periods if you want to add a custom period. In the custom period window that opens, you can type a Name for the custom period and Name, Start date and End date for the included periods. You can also import and export custom periods by clicking Import and Export respectively. Click OK when you are done.
- (Optional) Click Custom names if you want to change the names used for days, quarters and months. In the Date Table Custom Names window that opens, you can type the names you want to use. The default is derived from the regional settings on the deploying machine. Click OK when you are done.
- Click OK to add the date table.
A hierarchy table is used to select data from a table and create a new reporting structure which is different from the structure in the data source. You will typically use a hierarchy table for financial reporting where you want to consolidate data from a number of different accounts, such as ledger accounts.
A hierarchy table is used in conjunction with a parent-child dimension. First, you create the hierarchy table and specify the contents of the table. Then you create a parent-child dimension and add it to a cube. When you build the structure, be sure to choose names that are meaningful to the end-user.
Adding a Hierarchy Table
- From the Solution Explorer, open the relevant data warehouse.
- Right-click Tables and then click Add Hierarchy Table. The Hierarchy Table window opens:
- In the Name box, type a name for the table.
- In the Source table list, click the table containing the desired data.
- In the ID field, click the field that identifies the individual entries in the table; for example, the customer number. If you need more than one field to identify the entries, you have to create a concatenated field before you create the hierarchy table.
- In the Name field, enter the name that identifies the individual entries; for example, account name, and then click Load. The Hierarchy Mapping pane is now populated with the entries of the source table.
- You can now create the report structure. The structure you create corresponds to the structure of the report that is displayed to the end-user.
- Right-click in the Blank pane, and then select Add Root Heading. The root headings become root nodes in the final report.
- In Level Setup, type a name for the heading in the Name field.
- Right-click the root heading, and select Add Sub Heading to add a child node to the structure.
- In the Name box, type a name for the child node.
- In the Unary Operator list, you specify how you want the value of the child node to be aggregated to the sum of all the values in the subheading. The unary operator ensures that the values are aggregated properly in the final report. You have the following options:
Operator Definition None The value is ignored Add The value is added to the sum of the values Subtract The value is subtracted from the sum of the values Multiply The value is multiplied by the sum of values Divide The value is divided by the sum of the values
Repeat steps 8-12 for all root headings and subheadings you want to add.
- Click and hold an entry in Hierarchy Mapping , and drag it to the preferred subheading in the Blank pane. Alternatively, you can specify a range of entries by typing the relevant numbers in From and To.
- To exclude an entry from a given range, right-click the relevant subheading, click Add Exclude, and then specify a range by typing the relevant numbers in From and To. Alternatively, right-click the specific entry and select Change to Exclude.
- If a root heading or subheading represents the sum of other subheadings, such as Contribution Margin, you can use a formula to determine the content of the heading. Type a formula in the Roll-up formula. Formulas are written in MDX.
- Click OKwhen you have completed the structure. You can now create the parent-child dimension where the consolidation table will be used.
Note: When you create the parent-child dimension you will typically use Sort By Attribute. You therefore need to create a Sort order dimension level where the key column is Sort order. It is also necessary to enable Unary column and Roll-up column on the dimension. You can then set the parent-child dimension to Sort By Attribute.
An aggregated table is an aggregated version of another table in you project. Often, you will not need the transactional level in financial or sales reports, but only data grouped by business unit or sales team. This makes the aggregated tables feature very useful if you are doing reporting directly from you data warehouse as opposed to using, for instance, OLAP cubes.
Adding an Aggregated Table
To add an aggregated table, follow the steps below.
- Under Tables in a data warehouse, right click the table, you want to add an aggregated version of, click Advanced and click Add Aggregate Table. The Add Aggregate Data Table window opens.
- Under GroupBy, you can choose what columns on the table the aggregated table should use for grouping the aggregated data. Click the column you want to use in the empty list under Table: [table name]. Type a name for the field in Field name. If the field you have chosen contains date values, click the list under GroupBy Type to adjust the granularity of the grouping. You can choose second, minute, hour and all the way up to year. You can use the same date column multiple times with different GroupBy types. For other data types, the GroupBy type will always be Value.
- Under Aggregate, you can choose what columns from the table you want to have aggregated. Click the column you want to use in the empty list under Table: [table name]. Type a name for the field in Field name. Click the list in the Aggregation Type column and click the method you want to use for calculating the aggregation. You have the following options:
- Min: The lowest value of the field in question.
- Max: The highest value of the field in question.
- Count: The number of rows.
- Count_Big: Same as count, but is able to count higher than 2^31, be-cause it uses the bigint data type instead of the int data type.
- DistinctCount: The number of unique values in the field.
- Sum: The sum of all row values.
- Average: The average of all row values.
- (Optional) Click on Yes in the list in Override data type column if you want to be able to change the data type for the field. If you right click on the field on the aggregated table, when you have added it, and click Edit aggregate field, you will have options to change the data type.
- Click OK to add the aggregated table.
Junk Dimension Tables
A junk dimension is a concept in dimensional modeling. It combines multiple low-cardinality attributes and indicators into a single dimension table as opposed to adding separate dimension tables. This reduces the size of the fact table and makes the dimensional model easier to work with.
The junk dimension table contains a row for all distinct combinations of the junk dimension attributes along with a key that identifies the specific combination. The junk dimension attribute fields can be removed from the fact table and replaced with the single field reference to the junk dimension table.
Multiple tables can utilize the same junk dimension table.
Adding a Junk Dimension Table
Junk dimensions can be added to tables in both data warehouses and staging databases. To add Junk Dimension table for a table, follow the steps below.
- Right click on a table, click Advanced and click Add Junk Dimension Table.
- In the window that appears, select the fields you want to include in you junk dimension table and click OK. A new window appears that allow you to customize you junk dimension table.
- Enter a name for the table in the Name box or leave the default ("Dim[table name]Info").
- (Optional) Click on a table in the Available tables list and click add to add it to the junk dimension table. A message will appear to ask you if you want to map fields automatically. The auto mapping algorithm maps a field on the table you are adding to a field on the junk dimension table if one of the following conditions is true: It has the same name as the junk dimension table field or it has the same name as another field that is mapped to the junk dimension table field.
- Map the fields in the junk dimension table with the fields in the included tables. Each row in the table represents one field in the junk dimension table while each column represents a table.
- To add a new field to the junk dimension table, click the empty field in the bottom row of the second column and type a name.
- To remove a field from the junk dimension table, right click the first column of the corresponding row and click Remove.
- To remove a table from the junk dimension table, right click the header row of the corresponding column - the table name - and click Remove.
- To change the order the tables are loaded in, click on the header row of the corresponding column - the table name -and drag it to the desired location.
- (Optional) In the Hashing algorithmlist, click on the hashing algorithm you want to use for the dimension table key. See Default hashing algorithm under Creating a Project for information about the different algorithms. Junk dimensions have a special hashing algorithm available, "Legacy integer", for compatibility with older versions of Analysis Services. You should avoid this algorithm if possible since it is not very safe. It is only 8 bytes which means that the risk of two different data sets giving you the same hash is much higher than with any of the other algorithms.
When you have added a junk dimension table, it appears with a yellow table icon. You can add fields, lookup fields and transformations to the junk dimension table as well as custom data and data inserts.
When you have added a junk dimension table to a table in a staging base, the next step is to add the table and corresponding junk dimension table to the data warehouse. You do not need to add the fields on the table that are part of the junk dimension. This saves you storage in the database.
When the junk dimension table is executed, it will insert non-existing junk dimension combinations from the included table. The junk dimension table has no truncation of the raw instance of the table.
External tables is a way to incorporate tables from an existing data warehouse into a Discovery Hub project. This is useful if you, for instance, have a legacy data warehouse humming along that you would like to use data from without remodeling it in Discovery Hub.
An external table will initially not be deployed or executed, but will be available for data movement to the data warehouse and can be used just as any other table on the data warehouse. By default Discovery Hub will create views and read data from the external connection, but you can also chose to move the data into your data warehouse. You can also add a custom SSIS package to the table, which can then be executed.
Adding an External SQL Connection
To add an external table, you first need to add an external SQL connection. To add an external SQL connection, follow the steps below.
- Right click your data warehouse or a business unit, navigate to Advanced and click Add External SQL Connection. The Add External SQL Connection window appears.
- Type a Name for the connection.
- In the connection type list, click on the type of connection you want to create. You have the following options:
- Cross-database (local): Connect to another database on the same server as the data warehouse.
- Cross-database (linked): Connect to another database on a linked server.
- Data transfer: Connect to another database with an option to transfer data as well.
- Type the name of the server in the Server box. This option is not available if you chose Cross-database (local) as your Connection type.
- Type the name of the database in the Database box.
- Select Force codepage conversion to convert all fields to the collation of the data warehouse.
- Select Force Unicode conversion to declare all alphanumeric fields as nvarchar.
- Select Allow dirty reads to allow reading from the source without locking the table.
- Select Transfer data to copy the data from the external SQL Server to your local data warehouse, much like a regular SQL Server data source. This option is only available if you chose Data transfer as your Connection type.
- (Optional) Enter additional connection properties in the Additional connection properties box.
- (Optional) Click Data Extraction Settings if you want to limit the objects brought into Discovery Hub before the data selection stage. For more information, see Filtering What Objects to Extract.
- Click OK to add the connection.
Adding an External Table
To add an external table, follow the steps below.
- Navigate to External SQL Connections under your data warehouse or business unit, right click the connection you just created and click Read Objects from Data Source.
- When Discovery Hub has finished reading objects from the data source, the Source Explorer pane in the right hand side of the window is populated with the objects from the source. Select the tables, views and fields you want to use in you data warehouse.
Working with External Tables
The external tables in your project are displayed alongside the standard tables and you can use them in the same way. External tables can be used in dimensions and cubes, for reporting, in Qlik models etc. You can recognize an external table on the black table icon.
Some of the transformations and data cleansing you can do with standard tables can be done with external tables as well. You can add custom fields, but not lookup fields. For instance, you can add a custom field to the external table and apply a transformation to the field to concatenate two other fields on the table.
You can also add custom data to an external table.
Deploying an External Table
To deploy an external table, right click the table and click Deploy. A View will be created that selects from the external table.
Executing an External Table with an SSIS package
Since an external table is set up outside Discovery Hub, Discovery Hub expects it to be executed separately from your project. This means that you initially will not find any execute command on an external table. However, if you have a SSIS Package that is used to populate the table, you can add this package to the table and get the ability to execute the table.
- Right click an external table, navigate to advanced and click Customize code. The Customize Code window appears.
- Click the Add button to the right of SSIS Package. The Custom Editor window appears.
- In the Editor Name list, click you editor of choice and click OK. The Custom SSIS window appears.
- Make sure Existing Package is selected and click OK. The Pick SSIS Package window appears.
- Type the server name in the Server box. Optionally, you can select Use SQL Server Authentication and type your credentials in the User Name and Password boxes as appropriate. In the Location list, click File system or SQL Server and then click … next to the Package Name box to browse for the SSIS package. When you have found the package and clicked Open in the Open window, click OK and the editor of your choice opens.
- Make any changes you want to make in the editor, save the package and close the editor.
- While you edit the SSIS package, Discovery Hub displays the Custom Code Editor dialog. When you return to Discovery Hub, click Import to import the changes you made to the SSIS package.
- In the Customize Code window you'll notice that the Add command next to SSIS Package has changed to Edit and that you can now click Parameters and Delete as well. Click Close.
- Right click the table and choose Execute to run the SSIS package. You can also execute the table by including it in an execution package, executing the entire project etc.