...with the ability to run data from it again on demand.
Consider the scenario where an ERP system is upgraded or replaced with a new one. Historical transaction data will not be migrated to the new system. Most, but not all, dimension data (customers, products, divisions, etc.) is created in the new system. Data from the old ERP database will still be accessible.
In TX we add a new source, new staging tables, and new cleansing rules. We map the new source to the existing dimensions and facts; the old source remains mapped as well. The dimensions have history enabled, and we do a surrogate key lookup in the facts. The source system is not part of the primary key. Many of the facts are loaded incrementally.
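To make the key point concrete, here is a minimal Python sketch of a surrogate key lookup where the source system is excluded from the business key, so rows from the old and the new ERP that share a business key resolve to the same dimension member. All table, column, and function names are hypothetical illustrations, not TX's actual implementation.

```python
# Sketch of a surrogate-key lookup where the source system is NOT part of
# the business key (hypothetical names, not TX's internal model).

customer_dim = {}  # business_key -> surrogate key
next_sk = 1

def lookup_or_create(business_key):
    """Return the surrogate key for a business key, creating it on first sight."""
    global next_sk
    if business_key not in customer_dim:
        customer_dim[business_key] = next_sk
        next_sk += 1
    return customer_dim[business_key]

# Staged fact rows from both ERPs, each carrying a source tag.
staged_facts = [
    {"source": "old_erp", "customer_no": "C100", "amount": 250.0},
    {"source": "new_erp", "customer_no": "C100", "amount": 300.0},  # same customer
    {"source": "new_erp", "customer_no": "C200", "amount": 120.0},
]

# Because the source system is excluded from the key, both C100 rows get
# the same surrogate key in the fact table.
fact_rows = [
    {"customer_sk": lookup_or_create(r["customer_no"]), "amount": r["amount"]}
    for r in staged_facts
]
```

This is exactly why the two sources can feed the same dimensions and facts side by side: conformance happens on the business key, regardless of which system delivered the row.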
We stop extracting data from the old system by guarding the source. The staging tables store the data from the last load, but not securely: they could easily be truncated in a future deployment if source settings change, or if a new TX release triggers a re-deployment. We also need to empty the old staging tables that feed a dimension, since the new source must take over updating the dimension data. At the same time, we don't want to remove the old source's dimension mappings, in case we need to run them again in the future (not all dimension data was created in the new system). The ability to disable/guard the data mapping of the old source might help here.
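The requested disable/guard behaviour for a source mapping could look roughly like the sketch below. This is purely an illustration of the idea under assumed names; it is not TX's object model or API. The point is that a disabled mapping keeps its definition (so it can be re-enabled and rerun later) but contributes no rows to the dimension in the meantime.

```python
# Sketch of "disable/guard a source mapping": each mapping keeps its
# definition, but a disabled mapping contributes no rows to the dimension.
# (Hypothetical model, not TX's actual implementation.)

source_mappings = {
    "old_erp": {"enabled": False,  # guarded: kept for possible future reruns
                "rows": [{"product_no": "P1", "name": "Widget (legacy name)"}]},
    "new_erp": {"enabled": True,
                "rows": [{"product_no": "P1", "name": "Widget"},
                         {"product_no": "P2", "name": "Gadget"}]},
}

def load_dimension(mappings):
    """Merge rows from all enabled source mappings; later sources win on key clashes."""
    dim = {}
    for mapping in mappings.values():
        if not mapping["enabled"]:
            continue  # the mapping definition survives, but no data flows
        for row in mapping["rows"]:
            dim[row["product_no"]] = row["name"]
    return dim

dim = load_dimension(source_mappings)  # only new_erp contributes

# Re-enabling the old source later replays its mapping with no redevelopment:
source_mappings["old_erp"]["enabled"] = True
dim_with_old = load_dimension(source_mappings)
```

With this kind of flag, emptying the old staging tables would be safe: the old mapping stays dormant in the project rather than being deleted, which is the separation the scenario above is missing.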
There are different ways to deal with this, but there could be a better, fully thought-out way to handle the scenario in TX (it's an imminent problem for a maturing data warehouse). Since all development for the old and the new system is contained in the same project, it requires a different approach than traditional ETL development, where there is more separation and manual control.
A white paper on a subject like this would be nice to have: how to use TX to handle various known data warehouse development dilemmas.