What is a spatial join?
Basically it's about geometry/spatial relations, operations like e.g. Is point P on the line L? Or How far is the distance between two points P1, P2? - think 8th grade math.
Examples for spatial relations (https://en.wikipedia.org/wiki/Spatial_relation)
Why would you want a spatial join?
Practical applications are things like Which country does this GPS coordinate belong to? (a.k.a. reverse geocoding) Translation: Is point P (GPS coordinate) contained within shape S (country outline)? - a spatial relation!
Bringing this to database language think of dimension table DimCountries that contains the outlines of all countries along with country names, and another table FactDeliveries that contains GPS points of e.g. deliveries.
And to have a proper star schema we would want to have a CountryId in the FactDeliveries table with a reference to the DimCountries table, based on which country shape contains the delivery GPS coordinates.
Can SQL Server/T-SQL do this?
In T-SQL there's two data types geography and geometry that can be used to store such kind of data, and they also provide methods to relate different geometries together like Contains, Intersect, Distance, and many many more.
You can even plot the shapes in SQL Management Studio, which can be very helpful:
So you can write a query like this:
FROM FactDeliveries deliveries
INNER JOIN DimCountries countries
ON countries.CountryOutline.STContains(deliveries.GPSCoordinates) = 1
In the last line of the statement you see the two tables joined together based on the spatial relation of a country outline and a GPS coordinate that is compared to the shapes. The result is that for each address the containing country is found.
This of course is just an example, you can do this also on level of streets, neighborhoods, cities, communities, counties, continents, oceans, and also on artificial groupings like NAFTA, Nordics, UK&I, DACH, etc. - as long as you have the right reference data.
Also, there's many other potential use cases for spatial joins, like assign the nearest service provider, create a list of potential customers in a particular area, identify clusters of occurences within a certain geographical area, to to just name a few.
How to do this in TX?
TimeXtender does not support spatial joins as native relationships between tables.
Such relationships are computationally expensive, so they should be used carefully.
Prepare sample data
To follow the example along please follow these steps:
- download the sample database and restore it on a SQL server.
- Make sure, you have SQLSysCLRTypes installed so geography and geometry data types are available.
- Use the SQL server source to load the data to your ODX.
- Create a MDW and move both tables Countries and Deliveries into your MDW.
Align column types for spatial join
First, note that in the Countries table the Country Outline is of type Geometry, while GpsCoordinates in Deliveries is of type Geography.
In order to make a successful join both need to be of the same type.
Create a new field in Deliveries called GpsCoordinatesGeometry and make sure the field type is Geometry.
Create a Field Transformation on the field of type Custom value and insert this custom SQL:
Make sure the parameter is mapped:
Create a custom view with the actual spatial join
Next, create a Custom View with this code:
CREATE VIEW [etl].[MapDeliveriesCountries]
, c.[DW_Id_Countries] CountrySurrogateKey
ON c.[CountryOutline].STContains(d.[GpsCoordinatesGeometry]) = 1
Again, make sure, all parameters are mapped:
Now it’s time to deploy (deploy only, execute is not necessary at this point).
After deployment, right click the new view and select Read View Fields:
Prepare Deliveries table with additional columns
Next, create two fields in the Deliveries table to hold the CountryNameLong and the CountrySurrogateKey, the Foreign Key to the Countries table for the relationship:
Create the following fields:
- Name: CountrySurrogateKey, type: bigint
- Name: CountryNameLong, type: varchar, length 100
Script Action to fill in attributes in Deliveries table
Next we need a Custom Step Script Action
Name it UpdateDeliveriesCountryIds and insert this Custom SQL:
[CountrySurrogateKey] = c.[CountrySurrogateKeyMap]
, [CountryNameLong] = c.[CountryNameLongMap]
Again, make sure that all parameters are mapped:
Last step: set Script Action as Post Script
Last step is to set the Script Action as a post script for the Deliveries tables:
Now it’s time to deploy and execute.
Note how the update takes pretty long for the small amount of data in the execution – spatial joins are computationally expensive operations!
Now we can use the Query Tool to see the results, e.g. by executing a query like this:
count(*) As CntDeliveries
GROUP BY [CountrySurrogateKey],
ORDER BY 3 DESC
It shows most Deliveries happened in the US, but also that 3,357 deliveries couldn’t be mapped to a country in this example.
This can happen because of inaccurate GPS coordinates, or because of lacking reference data in the Countries table – GIS/geo computing can be super complex and is a whole world of it’s own.