Data Profile does not always give the correct highest/lowest occurence

  • 10 January 2023
  • 1 reply

Userlevel 5
Badge +7

When using the Data Profile function on the Person.Person table in the AdventureWorks2014 database I see strange results for the highest and lowest occurence in the LastName field: Diaz is the highest and Billstrom is the lowest. In reality Abbas is the lowest when sorting by LastName ASC where count(*) == 1


I am not entirely sure that 'Highest occurence’ is clear to everyone that is not a natvie speaker: you could also assume this should be the highest sorted value.


Best answer by JTreadwell 10 January 2023, 22:31

View original

1 reply

Userlevel 3
Badge +5

Hi @rory.smith, In the Data Profile feature highest and lowest occurrence is indicating which value in a given field occurs the most, and which occurs the least. In SQL this would look like:

SELECT [LastName], Count([LastName])
FROM [MDW].[Person]
GROUP BY [LastName]
ORDER BY Count([LastName]) DESC

In my version of AdventureWorks, the Person table results in Diaz occurring the most at 211 times. 

If you reverse the Order By ASC, I get Barlow, then Billstrom preceding 687 other names that only occur once.  Beyond the count, I don’t believe there is any additional sorting happening, so the 687 single occurrences may appear in a different order. 


Using this logic, it seems to be working as expected.