Solved

Multiple CSV files with History (19.11.7.64)

  • 16 November 2023
  • 6 replies
  • 89 views

Hi Everyone,

---------Background

My organization is trying to build a Supervisory Org tracking database. Our previous HRM system did this natively, and enough departments are tired of waiting for the new system's quirks to be ironed out that I've been tasked with a workaround. I can't connect TX directly to the new system, but HR can download a CSV that includes Employee ID, Employee Name, Supervisor ID, Supervisor Name, Job Title, Work Location, and various less critical fields.

My plan was to have an HR rep download the CSV file once a week to a specific network folder and point TX at it with the Multiple Text File data source. I'd then use the DSA to perform some basic transformations and enable history on the DSA table. In theory this gives me a weekly update per employee, tracking changes to key things like job title, supervisor, etc. We have a little over 1,200 employees in the report, and 60-70k records a year shouldn't be a problem.

The initial ingest works well. TX reads the file and moves it to a backup location. The transformations and history in the DSA work, and the MDW gets a copy of the IsCurrent and IsTombstone fields and a new record every week.

--------The Problem

I want to schedule this execution package to run at least once a day, because we have a human point of failure: regardless of what time we decide the report should be updated, there will be delays, mistakes, and vacations to deal with.

If the job runs and there is no file, the ODX clears the table. The DSA then marks everything as deleted and current, and every employee in the MDW gets a line of nulls marked as current. I haven’t found a way to prevent the execution from happening if a new file is not present.

If I instead keep the historical files in the same folder, the ODX table grows by 1200 duplicates each week and the history functions are lost.

 

I’m open to any suggestions that would meet my goal, though I’d like to keep them inside TimeXtender.


Best answer by Christian Hauggaard 20 November 2023, 16:20


6 replies


Hi @AndrewS 

Just to confirm, are you using Business Units or ODX?

Business Units: [screenshot]

ODX: [screenshot]

Do you have primary keys enabled on your data source? Does the file have a modified date column that could be used to enable incremental load on the data source?

Can you please share some screenshots of your data source settings?

Hi @Christian Hauggaard,

Sorry, I see why that is confusing. We are using a Business Unit called ODX. We do not use the ODX server.

For primary keys:

In the ODX, for the initial ingest, no. The fields are a bit of a mess and come in this form:

Headers: Worker, Supervisory Organization, Business Title, Location…

Example row: First Last (Employee ID), Division ( SupervisorFirst SupervisorLast (Supervisor Employee ID)), Title, Location

In the DSA I split the columns into Name, Employee ID, Supervisor Name, Supervisor Employee ID, Title and Location. Here I make the Employee ID the primary key, as well as the natural key for the history settings. My type 2 fields are Worker, Supervisory Organization, Title, and Location.
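For anyone who wants to sanity-check that split outside the DSA, here is a minimal Python sketch of the kind of parsing involved. The regexes, function names, and sample values are illustrative assumptions based on the format above, not the actual transformations in our DSA.

```python
import re

# Assumed raw formats, based on the example row above (illustrative only):
#   Worker:                   "First Last (Employee ID)"
#   Supervisory Organization: "Division ( SupervisorFirst SupervisorLast (Supervisor Employee ID))"
WORKER_RE = re.compile(r"^(?P<name>.+?)\s*\((?P<emp_id>[^()]+)\)\s*$")
SUPORG_RE = re.compile(r"^(?P<division>.+?)\s*\(\s*(?P<sup_name>.+?)\s*\((?P<sup_id>[^()]+)\)\s*\)\s*$")

def split_worker(worker: str):
    """Split 'First Last (Employee ID)' into (name, employee_id)."""
    m = WORKER_RE.match(worker)
    return (m.group("name"), m.group("emp_id")) if m else (worker, None)

def split_sup_org(sup_org: str):
    """Split 'Division ( Sup Name (Sup ID))' into (division, supervisor_name, supervisor_id)."""
    m = SUPORG_RE.match(sup_org)
    return (m.group("division"), m.group("sup_name"), m.group("sup_id")) if m else (sup_org, None, None)

# Hypothetical sample values, just to show the shape of the output.
print(split_worker("Jane Doe (10042)"))                    # ('Jane Doe', '10042')
print(split_sup_org("Finance ( John Smith (10001))"))      # ('Finance', 'John Smith', '10001')
```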

In the MDW I also bring over the Is Current, Is Tombstone, FromDate, and ToDate system fields from the history settings. I keep the Employee ID the primary key but include the From and To Dates as well to build a history.
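To make those history fields concrete, here is a rough Python sketch of how a type-2 row set behaves when a tracked field changes for a natural key. It is a conceptual illustration only, assuming Employee ID as the natural key and the IsCurrent/IsTombstone/FromDate/ToDate semantics described above; it is not how TimeXtender implements history internally.

```python
from datetime import datetime

def apply_type2_change(history, employee_id, new_values, now):
    """Close the current row for a natural key and append a new current row.

    Conceptual illustration of IsCurrent/FromDate/ToDate behaviour when a
    type-2 field (e.g. Title or Location) changes; not TimeXtender's code.
    """
    for row in history:
        if row["EmployeeID"] == employee_id and row["IsCurrent"]:
            row["IsCurrent"] = False
            row["ToDate"] = now  # close the old version of this employee
    history.append({
        "EmployeeID": employee_id, **new_values,
        "IsCurrent": True, "IsTombstone": False,
        "FromDate": now, "ToDate": None,
    })

# Week 1 load: one current row per employee (hypothetical values).
history = [{"EmployeeID": "10042", "Title": "Analyst", "Location": "HQ",
            "IsCurrent": True, "IsTombstone": False,
            "FromDate": datetime(2023, 11, 1), "ToDate": None}]

# Week 2 load: the job title changed, so the old row is closed and a new one appended.
apply_type2_change(history, "10042",
                   {"Title": "Senior Analyst", "Location": "HQ"},
                   datetime(2023, 11, 8))

for row in history:
    print(row["Title"], row["FromDate"].date(), row["ToDate"], row["IsCurrent"])
```

The same mechanism is what makes an empty source load so destructive: when every natural key disappears from the source at once, every current row gets closed and tombstoned in a single pass, which is exactly the behaviour described in the problem above.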

The file itself does not include a datetime field, though the default name is “Directory by Organization <date>.csv”. I didn’t see a way to capture this, as the DW_SourceCode field inherits the name of the source, and not the name of the file.

Source Settings: [screenshot]

 

The process works as expected if I wait for a file to be present and then kick off an execution package manually. I’m hoping there is a way to have the ODX table not clear itself if the job runs and no file is present.


@AndrewS an option could be to change the Post Processing setting to None and use incremental load to update the records rather than creating duplicates; however, this would require a modified date field to be added to the file. Another option could be to set the Post Processing setting to None and ask the HR rep to delete the old file when uploading the latest one (or manually move it into a backup folder) - this should prevent missing files in the folder.
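If the incremental-load route is appealing, the modified date does not necessarily have to be typed in by HR; a small scheduled script could stamp the file's own modification time onto each row before TX picks the file up. A minimal Python sketch, assuming a hypothetical folder path and a hypothetical LastModified column name (neither is something TX requires or provides):

```python
import csv
from datetime import datetime
from pathlib import Path

SOURCE_DIR = Path(r"\\fileserver\hr\supervisory-org")  # hypothetical network folder
COLUMN = "LastModified"                                 # hypothetical column name

def stamp_modified_date(csv_path: Path) -> None:
    """Append a LastModified column (the file's modification time) to every row."""
    stamp = datetime.fromtimestamp(csv_path.stat().st_mtime).isoformat(sep=" ", timespec="seconds")
    with csv_path.open(newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    if not rows or COLUMN in rows[0]:
        return  # empty file, or already stamped on a previous run
    fieldnames = list(rows[0].keys()) + [COLUMN]
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            row[COLUMN] = stamp
            writer.writerow(row)

# Matches the default "Directory by Organization <date>.csv" naming mentioned earlier.
for path in SOURCE_DIR.glob("Directory by Organization*.csv"):
    stamp_modified_date(path)
```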

Another solution could be to create a separate execution package for the BU table and only execute this package once the file has been uploaded to the folder. You can manually execute execution packages by using desktop shortcuts or through the command line:

https://legacysupport.timextender.com/hc/en-us/articles/210439143-How-to-Start-an-Execution-Package-from-the-Command-Line
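To get the "only run when a new file exists" behaviour with the command-line route, the scheduled task can be a small wrapper script that checks the folder first and only then starts the execution package. A sketch in Python, using the same hypothetical folder path as the previous sketch; the actual TimeXtender command line is deliberately left as a placeholder to be filled in from the linked article rather than guessed at:

```python
import subprocess
import sys
from pathlib import Path

SOURCE_DIR = Path(r"\\fileserver\hr\supervisory-org")  # hypothetical network folder

# Placeholder: substitute the actual command and arguments from the linked
# "How to Start an Execution Package from the Command Line" article.
TX_COMMAND = [r"C:\path\to\timextender.exe", "<arguments from the linked article>"]

def new_file_present() -> bool:
    """True if at least one weekly CSV is waiting in the source folder."""
    return any(SOURCE_DIR.glob("Directory by Organization*.csv"))

if __name__ == "__main__":
    if not new_file_present():
        print("No new CSV found - skipping the execution package.")
        sys.exit(0)
    # A file is present, so it is safe to run the package; TX will move the
    # file to its backup location as part of the normal run.
    subprocess.run(TX_COMMAND, check=True)
```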

Thanks @Christian Hauggaard, we will investigate those options. 

I'm curious whether our version of TimeXtender is holding us back. Would newer versions of TX or running the ODX server give us additional options to meet our goals? Specifically, skipping the execution if a new file is not present.


@AndrewS the ODX server would handle this automatically. In the ODX server, you would have to use the CData CSV provider (as the Multiple Text File data source only applies to Business Units). I set up a CSV provider with the following settings in my ODX server: [screenshot]

Then I executed my transfer task and my DW table. Then I removed the file from my folder and executed the transfer task and the DW table again. The transfer task completes with errors.

However, the DW table still contains data when it is executed.

Thank you Christian.
