
Hello community,

 

Is it possible to use regular expressions (RegEx) in TimeXtender? Or is there somewhere in TX where we can run Python code that supports the re library?

Our business case:
- We have a database with text reviews of varying lengths, which sometimes contain personal information. We want to read the database in and anonymise the values that contain personally identifiable information.

For example: "oh no, ZuzaGlog doesn't know how TimeXtender works" should become "oh no, XXXXXX doesn't know how TimeXtender works".

 

We currently do it in Python with the re library and a list of terms that may be personally identifiable information. We search the review strings for the keywords from the list and replace every hit with XXXXXX.
Now we want to build this solution in TimeXtender. 
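
To make the approach concrete, here is a minimal sketch of that kind of keyword masking. It is not the actual production code: the term list and the names `PII_TERMS`, `build_pattern` and `anonymise` are made up for illustration, and it assumes every term starts and ends with a letter or digit so that word boundaries work.

```python
import re

# Hypothetical list of terms to mask; the real list would come from elsewhere.
PII_TERMS = ["ZuzaGlog", "Jane Doe"]


def build_pattern(terms):
    """Compile one case-insensitive pattern that matches any listed term."""
    # Longest terms first, so a longer term wins over a shorter one that is
    # its prefix. re.escape makes characters such as "." match literally.
    escaped = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    # \b restricts matches to whole words (assumes terms begin and end with
    # a word character).
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def anonymise(text, pattern, mask="XXXXXX"):
    """Replace every match of the pattern in text with the mask."""
    return pattern.sub(mask, text)


PATTERN = build_pattern(PII_TERMS)
print(anonymise("oh no, ZuzaGlog doesn't know how TimeXtender works", PATTERN))
```

Compiling a single alternation once and reusing it is usually cheaper than looping over the keywords for every review, but either way works for a small list.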

In what way would this be possible?

Hi @ZuzaGlog ,

If you are doing this interactively with Python code, then the output of your Python application is basically a data source. In principle, what you are describing borders on something I would handle in a business process rather than in a data platform. You might consider using a Master Data Management tool to deal with the fuzzy logic required.

That being said, you do have options. The T-SQL LIKE operator gives some limited, RegEx-like wildcard support, see: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver16 . If you are not using Azure SQL DB, you can add a CLR library that gives you RegEx capabilities: https://techcommunity.microsoft.com/t5/modernization-best-practices-and/sql-server-regular-expressions-library-sample/ba-p/3101875 . Full regular expression support is on the roadmap for SQL Server, but has not been released yet.

You can also call your Python code from PowerShell and weave that into your execution process. Finally, you could develop your own connector in C# that includes this kind of logic.
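
As a rough illustration of the PowerShell route, the masking could be packaged as a function that reads CSV from one text stream and writes the masked CSV to another. This is only a sketch under assumptions: `mask_csv`, the `review` column and the sample data are hypothetical, and how the files get in and out of your execution process is up to you.

```python
import csv
import io
import re


def mask_csv(src, dst, column, terms, mask="XXXXXX"):
    """Copy CSV rows from src to dst, masking the listed terms in one column."""
    if not terms:
        # An empty alternation would match everywhere, so refuse it.
        raise ValueError("terms must not be empty")
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = pattern.sub(mask, row[column])
        writer.writerow(row)


# In-memory demonstration with made-up data.
source = io.StringIO("id,review\n1,ZuzaGlog was helpful\n")
target = io.StringIO()
mask_csv(source, target, "review", ["ZuzaGlog"])
print(target.getvalue())
```

A script that calls `mask_csv(sys.stdin, sys.stdout, ...)` could then be started from a PowerShell step, with the masked file picked up as a regular data source afterwards.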


Hi @ZuzaGlog 

This is a good point, especially with GDPR and similar requirements.

I see this as more of an idea than a question, so I will convert it to one. Before I do, I would like some more information about how you want it to work and why you need it.

So how would you like it to behave in TimeXtender? I feel the reason why you want it has been given, but it could probably be expanded a bit. You mention Python as an option. I would like to hear more about how you imagine it working when you want to mask or change private data in a row or table. The description does not need to be complicated.

 

