A variety of factors can contribute to the growth of your Data Discovery Hub repository database. Depending on the cause, there are a number of remedies that you can apply.
Execution Package Logs
The Data Discovery Hub records details of every execution. These logs can begin to take up a lot of disc space, especially if your projects have execution packages that run more than once per day. Removing older logs can help reduce the repository's size, although you will not be able to restore them later.
To remove these logs, open the Tools menu, select Repository Administration , and click on the the Execution Package Logs tab:
From here, you can view the logs for all of the projects in your repository. By picking a start date, a stop date, and a project, you can control which execution packages appear in the list. You can then select and delete older logs that are no longer useful.
Very large repository size and rapid repository growth is often due to SSIS logging. To see if SSIS logging is enabled, edit your project and look to see if the Enable SSIS logging box is checked:
SSIS logging creates extremely detailed logs of the SSIS transfer process which can be useful in some advanced troubleshooting situations. However, we strongly recommend that you disable this option unless there is a pressing need to turn it on. The reason for this is that executing a single table only once can add hundreds or thousands of rows in the project repository database.
If this option is turned on in your project, SSIS logs probably consititute a majority of your project repository's memory footprint on your hard disc, and we recommend turning it off. After you uncheck this option, you will need to re-deploy any table in your project that uses SSIS for data transfer in order to fully commit the change.
SSIS log records are stored in the table "sysssislog" in your project repository database. This is a database system table that will not show up in your project or in the the repository database's list of tables when viewed in SQL Server Management Studio. Unless you have a specific reason to keep these logs, it should be safe to remove them as they are not used in the Data Discovery Hub.
In order to remove the logs, open up SSMS and run the query
TRUNCATE TABLE sysssislog
against your project repository.
The project repository contains a record of every version of every project you have ever created. Over the years, this can eventually cause the repository database to grow. However, it's not always necessary to keep very old versions of the project easily accessible, and archiving these records can save disc space.
To archive your old repository, export all of your current projects and make a backup of your repository database. Then, drop your repository database and re-create a new version of the repository through the Data Discovery Hub UI. Finally, import your projects into the new repository.
Note that if you are using multiple environments, you will need to reconfigure your environment settings in your new repository.
While the new repository takes up much less space on the disc, and certain functions may run faster, accessing older versions of your project will become more challenging after archiving the old repository. To restore to an earlier version, you would need to restore the archived database under a new name, open the version you want, export the project, and import it into your new repository. However, if restoring older versions is usually relatively uncommon, this trade off may be worth considering.