spectrum location intelligence - pitney bowes · 2020-01-28 · location intelligence for big data...
Spectrum™ Location Intelligence for Big Data Version 4.0
Spectrum™ Location Intelligence for Big Data User Guide
Table of Contents
1 - Welcome
What is Spectrum™ Location Intelligence for Big Data?
Spectrum™ Location Intelligence for Big Data Architecture
System Requirements and Dependencies
2 - Spatial
Installing the SDK
Hive User-Defined Spatial Functions
Spark
3 - Appendix
PGD Builder
Download Permissions
Operators and Syntax Delimiters
1 - Welcome
In this section
• What is Spectrum™ Location Intelligence for Big Data?
• Spectrum™ Location Intelligence for Big Data Architecture
• System Requirements and Dependencies
What is Spectrum™ Location Intelligence for Big Data?
Pitney Bowes Spectrum™ Location Intelligence for Big Data is a toolkit for processing enterprise data for large-scale spatial analysis. Billions of records can be processed in parallel using MapReduce, Hive, and Apache Spark's cluster processing framework. Processing that took weeks with traditional techniques can now be completed in a few hours using this product.
Spectrum™ Location Intelligence for Big Data Architecture
Spectrum™ Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms such as Hadoop, for use with Spark, MapReduce, and Hive.
The SDK provides:
• Integration APIs for Location Intelligence
• Input datasets and metadata
API types:
• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)
System Requirements and Dependencies
Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.
This product is verified on the following Hadoop distributions:
• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20
To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.
To use the product, the following must be installed on your system:
For Hive:
• Hive version 1.2.1 or above
For a Hive client:
• Beeline, for example
For Spark and Zeppelin Notebook:
• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations so that big data processing systems can be used for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have the reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser.
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-version.zip
3. Extract the Location Intelligence distribution.
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser.
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS.
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs from SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum™ Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type                    Description                                                                                Name
Constructor Functions   Construct an instance of WritableGeometry from supported geometry representation formats   FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions          Grid processing functions                                                                  GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions   Geometry measurement functions                                                             Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions      Geometry observer functions                                                                ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions   Serialize an instance of WritableGeometry to supported geometry representation formats     ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions     Geometry predicate functions                                                               Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions    Geometry processing functions                                                              Buffer, ConvexHull, Intersection, Transform, Union
Search Functions        Spatial search functions                                                                   LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
On Cloudera:
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar
In Cloudera Manager, navigate to the Hive Configuration page and search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
On Hortonworks:
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step.
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register the spatial user-defined functions. Include the temporary keyword after create, as in the statements below, if you want a temporary function (this step must then be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. The job may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
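Because the registration statements in step 4 all follow one pattern, they can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the product. The helper name registration_statements is hypothetical, and the dotted class names assume the usual Java package layout (com.pb.bigdata.spatial.hive.<group>.<Function>) implied by the statements in this guide.

```python
# Hypothetical helper: builds the Hive registration statements for the
# spatial UDFs named in this guide. The package prefix is an assumption
# based on the class names shown in the setup steps.
PACKAGE = "com.pb.bigdata.spatial.hive"

UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{PACKAGE}.{group}.{name}';"
        for group, names in UDFS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into Beeline.
    print("\n".join(registration_statements()))
```

Passing temporary=False produces the permanent form used in the Function Registration entries of the individual function descriptions.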
WritableGeometry
WritableGeometry is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats such as WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats such as WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that operate on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information about the Writable interface, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter      Type     Description
jsonGeometry   String   The geometry in GeoJSON format.
Return Values
Return Type        Description
WritableGeometry   The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
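When the GeoJSON literal is produced by a program rather than read from a table column, it helps to build it with a JSON library so that the quoting is always valid. The following Python sketch is an illustration only. The helper names point_geojson and from_geojson_query are hypothetical, and the sketch simply refuses input containing a single quote instead of assuming any particular Hive escaping rule.

```python
import json


def point_geojson(x, y):
    """Serialize a point as a GeoJSON geometry string."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def from_geojson_query(geojson):
    """Wrap a GeoJSON string in a FromGeoJSON query (hypothetical helper).

    The string is embedded in a single-quoted SQL literal, so input that
    contains a single quote is rejected rather than escaped.
    """
    if "'" in geojson:
        raise ValueError("GeoJSON text must not contain single quotes")
    return f"SELECT FromGeoJSON('{geojson}');"


if __name__ == "__main__":
    print(from_geojson_query(point_geojson(100.0, 0.0)))
```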
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter   Type     Description
geometry    String   The geometry in WKT format.
CRS         String   Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type        Description
WritableGeometry   The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter   Type     Description
geometry    String   The WKB of the geometry in byte array format (byte[ ]).
CRS         String   Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type        Description
WritableGeometry   The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
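To see what the hexadecimal literal in the FromWKB example encodes, the bytes can be unpacked by hand. The following Python sketch follows the standard WKB layout for a 2D point (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles). It is an illustration only and does not use the LI SDK; the function name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded WKB 2D point and return (x, y)."""
    raw = bytes.fromhex(hex_wkb)
    # First byte: 1 = little-endian, 0 = big-endian.
    fmt = "<" if raw[0] == 1 else ">"
    # Next four bytes: geometry type code (1 = Point).
    (geom_type,) = struct.unpack(fmt + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError(f"not a WKB point (type code {geom_type})")
    # Remaining sixteen bytes: X and Y as IEEE 754 doubles.
    x, y = struct.unpack(fmt + "dd", raw[5:21])
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

For the literal used in the example, the decoded point is (10.0, 10.0).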
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter   Type     Description
geometry    String   A KML string. Only the geometry, or the geometry in a placemark, is parsed.
Return Values
Return Type        Description
WritableGeometry   The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter   Type               Description
X           String or Number   The X ordinate.
Y           String or Number   The Y ordinate.
CRS         String             Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type        Description
WritableGeometry   The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type      Description
GeoJSON String   The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
String        The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
Byte[ ]       The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
String        The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type   Description
Boolean       True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type   Description
Boolean       True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type   Description
Boolean       True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type   Description
Boolean       True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
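For the common case where geometry1 is a point and geometry2 is a polygon, as in the insurance example, the idea behind a containment test can be illustrated with the classic ray-casting algorithm. The Python sketch below is a simplified planar illustration for a single ring without holes. It is not the LI SDK implementation, and the function name point_in_ring is hypothetical.

```python
def point_in_ring(x, y, ring):
    """Return True if the point (x, y) lies inside the closed ring.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    joined to the first. Uses ray casting: a horizontal ray from the
    point crosses the boundary an odd number of times when it is inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses the ray's y value.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(point_in_ring(5, 5, square), point_in_ring(15, 5, square))
```

Points that fall exactly on the boundary are not handled specially in this sketch.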
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter       Type               Description
inputGeometry   WritableGeometry   The input geometry to be checked for a null or empty value.
Return Values
Return Type   Description
Boolean       True if the geometry is null or empty; otherwise, False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
areaUnits         String             The desired return unit type. For valid values, see Area Units below.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for areaUnits are the following area units:
Value          Description
sq mi          square miles
sq km          square kilometers
sq in          square inches
sq ft          square feet
sq yd          square yards
sq mm          square millimeters
sq cm          square centimeters
sq m           square meters
sq survey ft   square US Survey feet
sq nmi         square nautical miles
acre           acres
ha             hectares
Return Values
Return Type   Description
Double        The area of the geometry.
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
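The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be illustrated with the shoelace formula. The Python sketch below covers only the planar (CARTESIAN) case and is not the LI SDK implementation. The function names ring_area and polygon_area are hypothetical, and the result is in the square of whatever unit the coordinates use.

```python
def ring_area(ring):
    """Planar area enclosed by a ring, via the shoelace formula.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    joined to the first.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    # The sign depends on vertex order, so take the absolute value.
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Exterior ring area minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(exterior, [hole]))
```

For the 10 by 10 square with a 2 by 2 hole shown in the usage lines, the result is 96.0.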
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type               Description
Array<WritableGeometry>   The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter         Type               Description
geometry1         WritableGeometry   The first instance of a WritableGeometry.
geometry2         WritableGeometry   The second instance of a WritableGeometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units below.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles
Return Values
Return Type   Description
Double        The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
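The difference between the CARTESIAN and SPHERICAL computation types can be illustrated for two points. The Python sketch below uses the haversine formula for the spherical case and the Pythagorean distance for the planar case. It is an illustration only: the guide does not state which formulas the LI SDK uses, the Earth radius constant is an assumed mean value, and the function names are hypothetical.

```python
import math

# Assumed mean Earth radius in meters (not taken from the product).
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # A quarter of the equator versus a 3-4-5 triangle in a planar system.
    print(spherical_distance_m(0.0, 0.0, 90.0, 0.0))
    print(cartesian_distance(0.0, 0.0, 3.0, 4.0))
```

Applying the planar formula to long/lat degrees would give a number in degrees, not a length, which is why geographic coordinate systems accept only the SPHERICAL type.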
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as compbbigdataspatialhivemeasurementLength
Syntax
Length(WritableGeometry geometry String linearUnits [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and any holes). Curves are considered to be thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
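The rule that a polygon's perimeter is the sum of the lengths of all its rings can be sketched in a few lines of Python. This is a planar (Cartesian) illustration with hypothetical function names, not the UDF's implementation; it assumes each ring is a list of (x, y) tuples whose last point repeats the first.

```python
import math


def ring_length(ring):
    """Length of a closed ring: sum of the distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(rings):
    """Perimeter of a polygon given its exterior ring followed by any hole rings."""
    return sum(ring_length(ring) for ring in rings)


# A 10 x 10 square with a 2 x 2 square hole: 40 (exterior) + 8 (hole).
exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_perimeter([exterior, hole]))  # 48.0
```

Note that the hole increases the perimeter even though it reduces the area.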
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
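The effect of the resolution parameter is easiest to see for a point buffer. The Python sketch below approximates a circle with a given number of straight segments in a plane. It is a simplified illustration with a hypothetical function name: it places the vertices on the circle and ignores coordinate systems and units, which the Buffer UDF may handle differently.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` segments.

    Returns a closed ring of (x, y) tuples (the first point is repeated at
    the end). Planar sketch only; vertex placement is an assumption.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


octagon = point_buffer(0.0, 0.0, 50.0, 8)  # resolution 8 gives an octagon
```

Doubling the resolution doubles the number of vertices to store and process, which is the time and space cost mentioned in the parameter description.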
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
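For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. The algorithm choice and function name are illustrative assumptions; the guide does not describe how the ConvexHull UDF computes its result.

```python
def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order.

    Uses Andrew's monotone chain on (x, y) tuples. Points that lie inside
    the hull or on its edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# The interior point (1, 1) is not part of the hull.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```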
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
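The idea of filtering XY records by the bounds of a geometry can be sketched outside Hive as follows. This Python illustration uses hypothetical function names and treats a geometry as a plain list of (x, y) vertices; in Hive, the four bounds would come from the ST_XMin, ST_YMin, ST_XMax, and ST_YMax UDFs instead.

```python
def mbr(points):
    """Minimum bounding rectangle of (x, y) points as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def within_mbr(x, y, bounds):
    """True when the point (x, y) falls inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


# Keep only the XY records that fall inside the bounds of a triangle.
bounds = mbr([(1, 5), (4, 2), (3, 9)])
records = [(2, 3), (5, 3), (4, 9)]
kept = [(x, y) for x, y in records if within_mbr(x, y, bounds)]
```

An MBR test is a cheap pre-filter: a point inside the rectangle is not necessarily inside the geometry itself, so an exact spatial predicate is still needed where precision matters.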
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
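To show how a point, a precision, and a cell boundary relate to each other, here is a Python sketch of the publicly documented geohash scheme: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. It assumes that the product's geohash matches this standard scheme, which the guide does not confirm, and the function names are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point as a standard geohash of `precision` characters."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash to its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    use_lon = True
    for ch in hash_id:
        idx = _BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi
```

Each extra character adds five halvings, so longer IDs describe smaller cells, and every ID is a prefix of the IDs of the cells nested inside it.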
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'
Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'
Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'
Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'/STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'
Option: maxDistance
Description: The maximum distance to search for results (if not set, the default value is no limit).
Example: 'maxDistance', '25'
Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'
Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'
Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'
Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'
Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'
Option: downloadGroup
Description: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', 'Name Like Park'
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '/STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park'))
nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
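The way a candidate limit and a distance limit interact can be sketched in a few lines of Python. This is only a conceptual illustration, not the UDTF's implementation: the function name search_nearest, its defaults, and the use of planar (projected) coordinates are all assumptions made for the example.

```python
import math

def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    Hypothetical helper: coordinates are assumed to be planar (x, y) values
    in the same unit as max_distance.
    """
    px, py = point
    scored = []
    for name, (cx, cy) in candidates:
        d = math.hypot(cx - px, cy - py)
        # Drop candidates that lie beyond the optional distance limit.
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    scored.sort()
    # Keep only the closest max_candidates entries.
    return [(name, d) for d, name in scored[:max_candidates]]

capitals = [("a", (0, 3)), ("b", (4, 0)), ("c", (1, 1)), ("d", (10, 10))]
nearest = search_nearest((0, 0), capitals, max_candidates=3, max_distance=5)
print([name for name, _ in nearest])
```

Filtering by distance before truncating means a far-away record never takes up one of the candidate slots.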
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
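As a rough illustration of consuming WKT output, the following Python sketch parses a single-ring POLYGON string into coordinate pairs. The sample geometry is made up for the example and is not actual HexGenerator output, and parse_polygon_wkt is a hypothetical helper; a full WKT library would be the better choice for anything beyond simple polygons.

```python
import re

def parse_polygon_wkt(wkt):
    """Return the outer ring of a single-ring POLYGON WKT string as (x, y) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None or "(" in match.group(1):
        # Polygons with holes and other geometry types are out of scope here.
        raise ValueError("not a single-ring POLYGON: %r" % wkt)
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring

# Made-up hexagon-like ring; the first and last vertices coincide (closed ring).
sample = "POLYGON ((0 0, 1 0, 1.5 1, 1 2, 0 2, -0.5 1, 0 0))"
ring = parse_polygon_wkt(sample)
print(len(ring) - 1, "distinct vertices; closed:", ring[0] == ring[-1])
```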
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates a hexagon with the same unique ID.
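The assign-then-aggregate pattern described above can be sketched in Python. This is not the product's hexagon API: cell_id below is a hypothetical stand-in that labels square longitude/latitude grid cells, used only to show how a stable per-location ID lets you group and sum records.

```python
import math
from collections import defaultdict

def cell_id(lon, lat, cell_size_deg=1.0):
    """Hypothetical stand-in for a hexagon ID: the label of a square lon/lat grid cell."""
    col = math.floor((lon + 180.0) / cell_size_deg)
    row = math.floor((lat + 90.0) / cell_size_deg)
    return "%d_%d" % (row, col)

def aggregate_by_cell(records, cell_size_deg=1.0):
    """Sum the value of every (lon, lat, value) record per cell ID."""
    totals = defaultdict(float)
    for lon, lat, value in records:
        totals[cell_id(lon, lat, cell_size_deg)] += value
    return dict(totals)

# Made-up records: two fall into the same cell, one into another.
records = [(-73.99, 40.73, 10.0), (-73.50, 40.10, 5.0), (-0.12, 51.50, 7.0)]
totals = aggregate_by_cell(records)
print(totals)
```

Because the same location always maps to the same ID at a given cell size, the aggregation key is stable across runs, which is the property the hexagon level provides in the product.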
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
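To make the semantics of a distance join concrete, here is a small Python sketch that pairs every record in one list with the records of another list that lie within a maximum distance. It is a brute-force illustration under stated assumptions, not how the Spark method works internally: it skips the geohash-based primary join entirely, and the names haversine_m and join_by_distance, the sample coordinates, and the spherical-earth distance are all choices made for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (spherical approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m):
    """Pair each left record with every right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Merge both records and append the computed distance column.
                joined.append({**l, **r, "outputDistance": d})
    return joined

customers = [{"customer": "A", "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7527},
    {"poi": "far", "lon": -73.9442, "lat": 40.6782},
]
half_mile_m = 0.5 * 1609.344
result = join_by_distance(customers, pois, half_mile_m)
print([row["poi"] for row in result])
```

Comparing every pair is quadratic in the input size, which is why a distributed join would first narrow candidates with a coarse spatial key before computing exact distances.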
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
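Before restarting services, it can help to confirm that the three-line snippet from the Cloudera step parses as intended. The sketch below is a hedged convenience check using Python's standard configparser; it assumes the flat snippet shown above and does not handle the nested section syntax that a full hue.ini file uses.

```python
import configparser

# The three lines from the Cloudera step, as a flat INI snippet.
snippet = """
[desktop]
app_blacklist=
use_default_configuration=true
"""

parser = configparser.ConfigParser()
parser.read_string(snippet)
desktop = parser["desktop"]

# An empty app_blacklist value means no Hue apps are hidden.
blacklist_is_empty = desktop.get("app_blacklist") == ""
uses_default = desktop.getboolean("use_default_configuration")
print(blacklist_is_empty, uses_default)
```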
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common operating system group and grant Read, Write, and Execute permissions to the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
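After step 5, you may want to confirm that the directory really carries mode 775. The following Python sketch shows one way to check the permission bits; it runs against a temporary directory so that it is self-contained, and the helper name has_mode_775 is an assumption made for the example. In practice you would pass your own download directory, such as the /pb/downloads path assumed in this guide.

```python
import os
import stat
import tempfile

def has_mode_775(path):
    """Return True if the rwx permission bits of path are exactly 775."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Mask out setuid/setgid/sticky bits and compare only owner/group/other bits.
    return (mode & 0o777) == 0o775

# Self-contained demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    download_dir = os.path.join(tmp, "downloads")
    os.mkdir(download_dir)
    os.chmod(download_dir, 0o775)
    mode_ok = has_mode_775(download_dir)
    os.chmod(download_dir, 0o700)
    mode_after_tightening = has_mode_775(download_dir)

print(mode_ok, mode_after_tightening)
```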
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
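The wildcard rules for Like can be illustrated by translating a pattern into a regular expression. This Python sketch only mirrors the two rules stated in the table (% for zero or more characters, _ for exactly one); the case-sensitive comparison and the absence of an escape character are assumptions of the sketch, not statements about MI SQL.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Central Park", "%Park%"), like("Parks", "Park_"), like("Park", "Park_"))
```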
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain otherwise illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
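When queries are assembled in code, the quote rules above can be applied with two small helpers. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and because this guide does not say how a double quote inside an identifier is escaped, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping rule for embedded double quotes is not covered here.
        raise ValueError("identifier contains a double quote: %r" % name)
    return '"' + name + '"'

query = "SELECT * FROM %s WHERE Business = %s" % (
    quote_identifier("Streets"),
    quote_literal("O'haras"),
)
print(query)
```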
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc. All rights reserved.
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 5
Welcome
System Requirements and Dependencies
Spectrumtrade Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system
This product is verified on the following Hadoop distributions
bull Cloudera 512 and 6x bull Hortonworks 3x bull EMR 510 and 520
To use these jar files you must be familiar with configuring Hadoop in Hortonworks Cloudera or EMR and developing applications for distributed processing For more information refer to Hortonworks Cloudera or EMR documentation
To use the product the following must be installed on your system
for Hive
bull Hive version 121 or above
for Hive Client
bull Beeline for example
for Spark and Zeppelin Notebook
bull Java JDK version 18 or above bull Hadoop version 260 or above bull Spark version 20 or above bull Zeppelin Notebook is not supported in Cloudera
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 6
2 - Spatial This section describes the Spark jobs and Hive user defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis The LI SDK provides geometry and coordinate operations the ability to read TAB files and in-memory r-tree creation and searching
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive
In this section
Installing the SDK 8 Hive User-Defined Spatial Functions 9 Spark 72
Spatial
Installing the SDK
To use spatial functions for Spectrumtrade Location Intelligence for Big Data the Hadoop cluster must have reference data and libraries accessible from the master node
For the purposes of this guide we will
bull use a user called pbuser bull install everything into pb
Perform the following steps from a node in your cluster such as the master node
1 Create the install directory and give ownership to pbuser
sudo mkdir pbsudo chown pbuserpbuser pb
2 Add the Location Intelligence distribution zip to the node at a temporary location for example
pbtempspectrum-bigdata-locationintelligence-versionzip
3 Extract the Location Intelligence distribution
mkdir pblimkdir pblisoftwareunzip pbtempspectrum-bigdata-locationintelligence-versionzip -d pblisoftware
4 Create an install directory on hdfs and give ownership to pbuser
sudo -u hdfs hadoop fs -mkdir -p hdfspbli sudo -u hdfs hadoop fs -chown -R pbuserpbuser hdfspb
5 Upload the distribution into HDFS
hadoop fs -copyFromLocal pblisoftware hdfspbli
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 8
Spatial
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax so there is no need to write code Spectrumtrade Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user defined functions for Geometry operations and to work with grids in the spectrum-bigdata-li-hive-ltversiongtjar
Refer to the table below to quickly navigate to Hive UDFs described in this document
Type Description Name
Constructor Functions on page 15 Construct an instance of WritableGeometry FromGeoJSON
from supported geometry representation FromKML
formats
FromWKB
FromWKT
ST_Point
Grid Functions on page 53 Grid processing functions GeoHashBoundary
GeoHashID
HexagonBoundary
HexagonID
SquareHashBoundary
SquareHashID
Measurement Functions on page 29 Geometry measurement functions Area
ClosestPoints
Distance
Length
Perimeter
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 9
Spatial
Type Description Name
Observer Functions on page 46 Geometry observer functions ST_X
ST_Y
ST_XMax
ST_XMin
ST_YMax
ST_YMin
Persistence Functions on page 20 Serialize an instance of WritableGeometry ToGeoJSON
to supported geometry representation ToKML
formats
ToWKB
ToWKT
Predicate Functions on page 24 Geometry predicate functions Disjoint
Intersects
Overlaps
Within
IsNullGeometry
Processing Functions on page 39 Geometry processing functions Buffer
ConvexHull
Intersection
Transform
Union
Search Functions on page 66 Spatial search functions LocalPointInPolygon
LocalSearchNearest
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 10
Spatial
Setup
This topic assumes the product is installed to pblisoftware as described in Installing the SDK on page 8 To set up user-defined spatial functions for Hive perform the following steps
1 Proceed according to your platform
On this Do this platform
Cloudera Copy the Hive jar for Location Intelligence to the HiveServer node
pblisoftwarehivelibspectrum-bigdata-li-hive-versionjar
In Cloudera Manager navigate to the Hive Configuration page Search for the Hive Auxiliary JARs Directory setting If the value is already set then move the Hive jar into the specified folder If the value is not set then set it to the parent folder of the Hive jar
pblisoftwarehivelib
Hortonworks On the HiveServer2 node create the Hive auxlib folder if one does not already exist
sudo mkdir usrhdpcurrenthive-server2auxlib
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node
sudo cp pblisoftwarehivelibspectrum-bigdata-li-hive-versionjarusrhdpcurrenthive-server2auxlib
2 Restart all Hive services 3 Launch Beeline or some other Hive client for the remaining step
beeline -u jdbchive2localhost10000default -n pbuser
4 Register spatial user-defined functions Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session)
create temporary function FromWKT as compbbigdataspatialhiveconstructFromWKT create temporary function FromWKB as compbbigdataspatialhiveconstructFromWKB create temporary function FromKML as compbbigdataspatialhiveconstructFromKML create temporary function FromGeoJSON as compbbigdataspatialhiveconstructFromGeoJSON create temporary function ST_Point as compbbigdataspatialhiveconstructST_Point
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 11
Spatial
create temporary function ToWKT as compbbigdataspatialhivepersistenceToWKT create temporary function ToWKB as compbbigdataspatialhivepersistenceToWKB create temporary function ToKML as compbbigdataspatialhivepersistenceToKML create temporary function ToGeoJSON as compbbigdataspatialhivepersistenceToGeoJSON
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note: To view the complete stack trace for any error you encounter, enable logging in DEBUG mode and then restart the job.
• The first run of a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in geoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from geoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
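To show what a GeoJSON geometry string carries, the following Python sketch parses a GeoJSON Point and rewrites it as WKT using only the standard library. It is an illustration of the two text formats, not the SDK's implementation; `geojson_point_to_wkt` is a hypothetical helper name and only the Point type is handled.

```python
import json

def geojson_point_to_wkt(text):
    """Convert a GeoJSON Point string to a WKT string (sketch, Point only)."""
    geom = json.loads(text)
    if geom.get("type") != "Point":
        raise ValueError("only Point is handled in this sketch")
    # GeoJSON stores a position as [x, y] (longitude, latitude for geographic data).
    x, y = geom["coordinates"][:2]
    return "POINT (%s %s)" % (x, y)

print(geojson_point_to_wkt('{"type": "Point", "coordinates": [100.0, 0.0]}'))
```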
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ])
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
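The hexadecimal argument in the example above can be read by hand. The following Python sketch decodes a WKB point with the standard library, assuming the standard OGC WKB layout for a 2D point (a 1-byte byte-order flag, a 4-byte geometry type, then two 8-byte doubles). `decode_wkb_point` is a hypothetical helper name and is not part of the SDK.

```python
import struct

def decode_wkb_point(hex_string):
    """Decode a 2D WKB point, given as a hex string, into an (x, y) tuple."""
    data = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type code; 1 means Point.
    (geom_type,) = struct.unpack(endian + "I", data[1:5])
    if geom_type != 1:
        raise ValueError("not a WKB point: type %d" % geom_type)
    # Bytes 5-20 hold X and Y as IEEE 754 doubles.
    x, y = struct.unpack(endian + "dd", data[5:21])
    return x, y

print(decode_wkb_point("010100000000000000000024400000000000002440"))
```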
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry or geometry in placemark will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X, Y coordinates. If any of the argument values are invalid, an empty geometry is returned.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns the Well-Known Binary (WKB) representation of a geometry, as a byte array, from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd) from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
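The second query is a point-in-polygon join: each insured location is tested against each risk boundary. As a rough illustration of what such a test involves, the following Python sketch applies the classic ray-casting method to a single closed ring in planar coordinates. It is not the SDK's algorithm, `point_in_ring` is a hypothetical name, and points lying exactly on the boundary are not treated specially.

```python
def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside a closed planar ring (ray casting).

    The ring is a list of (x, y) tuples whose first and last vertices match.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # X position where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(point_in_ring(2, 2, square), point_in_ring(5, 2, square))
```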
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
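The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in a few lines. The Python below uses the planar shoelace formula, so it corresponds only to a CARTESIAN-style calculation in the units of the coordinates; the spherical computation used by the UDF is not described in this guide. `ring_area` and `polygon_area` are hypothetical names.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring (first vertex == last vertex)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1  # shoelace cross term for one edge
    return abs(total) / 2.0

def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)

exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_area(exterior, [hole]))  # 100 - 4 = 96.0
```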
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
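The difference between the two computation types can be illustrated for the simplest case, two points. The Python sketch below contrasts a great-circle (haversine) distance on a sphere with a straight-line planar distance. It is only an illustration: the guide does not state which spherical formula or Earth radius the UDF uses, so the mean radius constant and both function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of a projected coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)

print(spherical_distance_m(-73.750333, 42.736103, -73.0, 42.0))
print(cartesian_distance(0, 0, 3, 4))
```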
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
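To show how the linear unit codes relate to one another, the Python sketch below sums the segment lengths of a planar polyline given in meters and converts the result to a requested unit. The conversion factors are the standard definitions of each unit; whether the SDK uses exactly these values is an assumption, and `polyline_length` is a hypothetical name. `math.dist` requires Python 3.8 or later.

```python
import math

# Meters per unit, keyed by the unit codes listed under Linear Units.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048,
    "yd": 0.9144, "mm": 0.001, "cm": 0.01, "m": 1.0,
    "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}

def polyline_length(points_m, unit="m"):
    """Planar length of a polyline whose coordinates are in meters."""
    meters = sum(math.dist(p, q) for p, q in zip(points_m, points_m[1:]))
    return meters / METERS_PER_UNIT[unit]

print(polyline_length([(0, 0), (3000, 4000)], "km"))  # 5000 m = 5.0 km
```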
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry instance containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The unit of the offset distance. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
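The effect of the resolution parameter is easiest to see for a point buffer. The following Python sketch builds a closed ring of `resolution` straight segments around a point in planar coordinates, so a resolution of 8 yields an octagon. It is a simplified CARTESIAN-style illustration, not the SDK's algorithm, and `buffer_point` is a hypothetical name.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with a closed planar ring."""
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight-line segments: 8
```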
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
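For readers unfamiliar with convex hulls, the Python sketch below computes one for a set of planar points using Andrew's monotone chain method. This is a well-known textbook algorithm chosen for illustration; the guide does not say which algorithm the UDF uses, and `convex_hull` is a hypothetical name.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last vertex of each chain is the first vertex of the other.
    return lower[:-1] + upper[:-1]

print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it does not lie on the boundary of the smallest convex shape that contains all the input points.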
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
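Both examples convert from EPSG:4326 (longitude/latitude in degrees) to EPSG:3857 (spherical Web Mercator, in meters). The Python sketch below applies the standard forward formula for that projection to a single point, to show the kind of arithmetic a coordinate transformation performs. It is not the SDK's implementation, which supports many coordinate systems; `lonlat_to_web_mercator` is a hypothetical name.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by EPSG:3857

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a long/lat point in degrees to Web Mercator meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon_deg)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y

print(lonlat_to_web_mercator(30.0, 20.0))
```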
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
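The bounds-filtering idea can be sketched outside Hive. The Python below computes the four MBR values for a list of planar vertices and uses them to test whether an X, Y record falls inside the bounds. It only mirrors the roles of ST_XMin, ST_XMax, ST_YMin, and ST_YMax; the helper names `mbr` and `in_mbr` are hypothetical.

```python
def mbr(points):
    """Return (x_min, x_max, y_min, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)

def in_mbr(x, y, bounds):
    """True if the point lies within the bounding rectangle (edges included)."""
    x_min, x_max, y_min, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max

triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(triangle)
print(bounds, in_mbr(30, 35, bounds), in_mbr(10, 35, bounds))
```

A bounding-rectangle test is a coarse filter: a point can fall inside the MBR and still be outside the geometry itself.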
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
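To make the encode/decode relationship between a cell ID and a cell boundary concrete, here is a minimal Python sketch of the publicly documented geohash scheme. It is an illustration only: the function names geohash_id and geohash_boundary are hypothetical stand-ins, and this guide does not confirm that the GeoHashID and GeoHashBoundary UDFs produce byte-for-byte identical results.

```python
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x: float, y: float, precision: int) -> str:
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude (first) and latitude.
        rng, coord = (lon, x) if even else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            rng[1] = mid   # keep the lower half of the interval
        even = not even
        bits += 1
        if bits == 5:      # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id: str) -> Tuple[float, float, float, float]:
    """Decode a geohash into its cell bounds (x_min, y_min, x_max, y_max)."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    even = True
    for ch in cell_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if even else lat
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lon[0], lat[0], lon[1], lat[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_boundary(cell))
```

Each extra character adds five more halving steps, which is why a longer ID means higher precision and a smaller cell, and why IDs that share a prefix are nested inside the same parent cell.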
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique hexagon identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique square hash identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format
Options
Option | Description | Example
shpCharset | the charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | the coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS:
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
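For readers who want to see what a point-in-polygon lookup does conceptually, the following Python sketch uses the classic even-odd ray casting test. It is an assumption-laden illustration, not the UDTF's implementation: the helper names point_in_ring and local_point_in_polygon, the in-memory "table" of rows, and the sample rings are all hypothetical, and the sketch handles simple polygons only (no holes, lines, or point geometries).

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test; `ring` is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:          # crossing lies to the right of pt
                inside = not inside
        j = i
    return inside


def local_point_in_polygon(pt: Point, table: List[Dict]) -> List[Dict]:
    """Return every row whose polygon ring contains the point."""
    return [row for row in table if point_in_ring(pt, row["ring"])]


# Hypothetical stand-in for a polygon data source with attribute columns.
states = [
    {"state": "A", "capital": "Alpha", "ring": [(0, 0), (10, 0), (10, 10), (0, 10)]},
    {"state": "B", "capital": "Beta", "ring": [(10, 0), (20, 0), (20, 10), (10, 10)]},
]

hits = local_point_in_polygon((3.0, 4.0), states)
print([(row["state"], row["capital"]) for row in hits])
```

Scanning every polygon for every point is quadratic; a spatial index over the polygon set (the role a prepared geometry index plays) is what makes such lookups practical at scale.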
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format
Options
Option | Description | Example
maxCandidates | the maximum number of results to return (if not set, the default value is 1) | 'maxCandidates', '3'
maxDistance | the maximum distance to search for results (if not set, the default value is no limit) | 'maxDistance', '25'
distanceUnit | the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | the name of the column to use for returning the distance | 'returnDistanceColumnName', 'Miles'
shpCharset | the charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | the coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like "Park"'
Return Values
Return Type | Description
geometry | The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point
Examples
Using HDFS:
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
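The interplay of the maxCandidates and maxDistance options can be illustrated with a small Python sketch. This is not how the UDTF is implemented; it is a brute-force stand-in under stated assumptions: the function name search_nearest, the planar (Euclidean) distance, and the sample point features are all hypothetical.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Feature = Tuple[str, Point]


def search_nearest(pt: Point,
                   features: Sequence[Feature],
                   max_candidates: int = 1,
                   max_distance: Optional[float] = None) -> List[Tuple[float, str]]:
    """Return up to max_candidates (distance, name) pairs, nearest first.

    Features farther away than max_distance are dropped; None means no limit.
    """
    scored = []
    for name, (fx, fy) in features:
        d = math.hypot(fx - pt[0], fy - pt[1])   # planar distance, same units as input
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    return heapq.nsmallest(max_candidates, scored)


# Hypothetical point features, used only for illustration.
features = [("Alpha", (0.0, 0.0)), ("Beta", (3.0, 4.0)), ("Gamma", (6.0, 8.0))]

nearest = search_nearest((0.0, 1.0), features, max_candidates=2)
print(nearest)
```

With the defaults (one candidate, no distance limit) the sketch returns a single nearest feature, which mirrors the documented defaults of the two options.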
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to
df1Longitude | Column | The longitude value from the first dataframe
df1Latitude | Column | The latitude value from the first dataframe
df2Longitude | Column | The longitude value from the second dataframe
df2Latitude | Column | The latitude value from the second dataframe
maxDistance | Length | The buffer length around point 1 to search for point 2
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
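The semantics of a distance join can be sketched without Spark. The Python example below is a brute-force illustration under stated assumptions, not the joinByDistance implementation: it compares every pair of records instead of using a geohash-based primary join, it uses the haversine great-circle formula with a mean earth radius of 3958.8 miles, and the names haversine_mi and join_by_distance and the sample records are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[str, float, float]   # (id, longitude, latitude) in WGS 84 degrees

EARTH_RADIUS_MI = 3958.8            # mean earth radius in miles (assumed constant)


def haversine_mi(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in miles between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left: Sequence[Record],
                     right: Sequence[Record],
                     max_distance_mi: float) -> List[Tuple[str, str, float]]:
    """Pair every left record with each right record within max_distance_mi."""
    out = []
    for lid, llon, llat in left:
        for rid, rlon, rlat in right:
            d = haversine_mi(llon, llat, rlon, rlat)
            if d <= max_distance_mi:
                out.append((lid, rid, d))   # d plays the role of a distance column
    return out


# Hypothetical customers and POIs, used only for illustration.
customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.750333, 42.740103), ("far", -73.650333, 42.736103)]

pairs = join_by_distance(customers, pois, 0.5)
print(pairs)
```

Comparing all pairs does not scale to large dataframes, which is the motivation for first bucketing both sides by a grid cell such as a geohash and only measuring distances between records in nearby cells.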
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the LI jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. You can do this by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
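Hue creates the session through Livy on your behalf. Purely as an illustration, the Python sketch below builds the kind of JSON body that Livy's REST API accepts when a session is created with an extra jar (the kind and jars fields of a session-creation request). The helper name livy_session_body is hypothetical, the jar path repeats the HDFS example from this procedure with its <version> placeholder left unfilled, and nothing is sent over the network.

```python
import json

# Jar location in HDFS as used in this procedure; <version> stays a
# placeholder because the real value depends on the distribution you installed.
LI_JAR = "hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar"


def livy_session_body(jars, kind="spark"):
    """Return JSON text for a Livy session-creation request (hypothetical helper)."""
    return json.dumps({"kind": kind, "jars": list(jars)})


body = livy_session_body([LI_JAR])
print(body)
```

In normal use you do not need to build this request yourself; selecting Jars in the Hue session dialog has the same effect.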
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter       Required   Description
-f <file>       yes        Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
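If you script index builds, the command line can be assembled programmatically. The Python sketch below is one possible way to do that and uses only the two documented options, -f and -p. The helper names pgd_builder_args and capped_parallel are hypothetical, and the sketch assumes the PGDBuilder launcher is reachable by that name; it only builds the argument list and does not start the tool.

```python
import os


def pgd_builder_args(tab_file, parallel=None):
    """Build the argument list for one PGDBuilder run (hypothetical helper).

    Only the documented options are used: -f (required) and -p (optional).
    """
    args = ["PGDBuilder", "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


def capped_parallel(cap=4):
    """Pick a -p value no larger than the CPU count or the given cap."""
    return max(1, min(cap, os.cpu_count() or 1))


print(pgd_builder_args(r"C:\data\uk.tab"))
print(pgd_builder_args(r"C:\seamless.tab", parallel=capped_parallel()))
```

Capping the -p value is one way to follow the advice about seamless TABs, where each sub-TAB gets its own thread. The resulting list could be passed to subprocess.run once you have confirmed how the utility is launched in your environment.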
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
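To see what the 775 mode grants, the short Python sketch below expands a numeric mode into its rwx form using the standard stat module. The helper names describe_mode and group_can_write are illustrative only and are not part of the product.

```python
import stat


def describe_mode(mode):
    """Return the rwx string for a numeric permission mode such as 0o775."""
    # stat.filemode() prefixes a file-type character; drop it.
    return stat.filemode(mode)[1:]


def group_can_write(mode):
    """True when the group write bit is set."""
    return bool(mode & stat.S_IWGRP)


print(describe_mode(0o775))
print(group_can_write(0o775), group_can_write(0o755))
```

With 775, the owner and the group can read, write, and traverse the directory, while other users can only read and traverse it. The group write bit is what lets every member of the common group update the downloaded data.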
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
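The wildcard behavior of the Like operator can be illustrated outside the product. The Python sketch below translates a Like pattern into a regular expression, mapping the percent sign to "zero, one, or multiple characters" and the underscore to "a single character". It is a model of the description above, not the product's implementation; the helper names are hypothetical, and matching is assumed to be case-sensitive because the table does not say otherwise.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Troy", "T_oy"), like("Troy", "T%"), like("Troy", "T_"))
```

The last call returns False because a single underscore stands for exactly one character, so the pattern T_ only matches two-character values.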
Arithmetic Operators
Operator   Definition
+          Addition; also the concatenation operator. Note: String concatenation also uses &.
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules on page 85.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,           List item and function argument separators
@           Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
When a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
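When queries are assembled in code, the quote rules can be applied with two small helpers. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and because this guide does not describe how to escape a double quote inside an identifier, the sketch rejects that case instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping rules for this case are not covered by the guide.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"), quote_literal("O'haras")
)
print(query)
```

Doubling the embedded single quote keeps the literal intact when the statement is parsed, which is the rule the last example in this section demonstrates.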
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
System Requirements and Dependencies
Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.
This product is verified on the following Hadoop distributions:
• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20
To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.
To use the product, the following must be installed on your system:
for Hive:
• Hive version 1.2.1 or above
for Hive Client:
• Beeline, for example
for Spark and Zeppelin Notebook:
• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser.
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip
3. Extract the Location Intelligence distribution.
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser.
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS.
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type                                Description                                                                                  Name
Constructor Functions on page 15    Construct an instance of WritableGeometry from supported geometry representation formats    FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions on page 53           Grid processing functions                                                                    GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions on page 29    Geometry measurement functions                                                               Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions on page 46       Geometry observer functions                                                                  ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions on page 20    Serialize an instance of WritableGeometry to supported geometry representation formats      ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions on page 24      Geometry predicate functions                                                                 Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions on page 39     Geometry processing functions                                                                Buffer, ConvexHull, Intersection, Transform, Union
Search Functions on page 66         Spatial search functions                                                                     LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
On Cloudera:
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
On Hortonworks:
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step.
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operation on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter      Type     Description
jsonGeometry   String   The geometry in GeoJSON format.
Return Values
Return Type        Description
WritableGeometry   The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter   Type     Description
geometry    String   The geometry in WKT format.
CRS         String   Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type        Description
WritableGeometry   The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter   Type     Description
geometry    String   The WKB of the geometry in byte array format (byte[ ]).
CRS         String   Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type        Description
WritableGeometry   The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
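To make the hexadecimal string in the example easier to read, the Python sketch below decodes a two-dimensional WKB point by hand: one byte-order flag, a four-byte geometry type, and two eight-byte doubles. It follows the standard WKB point layout and is independent of the product; the helper name decode_wkb_point is hypothetical and the sketch handles only the Point type.

```python
import struct


def decode_wkb_point(hex_text):
    """Decode a 2D WKB point given as hexadecimal text into (x, y)."""
    data = bytes.fromhex(hex_text)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError("not a WKB Point (type %d)" % geom_type)
    # Bytes 5..20 hold the X and Y ordinates as IEEE 754 doubles.
    return struct.unpack(order + "dd", data[5:21])


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

Decoding the value by hand like this can be a quick sanity check on WKB columns before they are passed to FromWKB.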
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter   Type     Description
geometry    String   A KML string, where only the geometry, or the geometry in a placemark, will be parsed.
Return Values
Return Type        Description
WritableGeometry   The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter   Type               Description
X           String or Number   The X ordinate.
Y           String or Number   The Y ordinate.
CRS         String             Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type        Description
WritableGeometry   The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type      Description
GeoJSON String   The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
String        The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
Byte[ ]       The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The instance of a WritableGeometry.
Return Values
Return Type   Description
String        The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as compbbigdataspatialhivepredicateDisjoint
Syntax
Disjoint(WritableGeometry geometry1 WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
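The relationship between the predicates can be tried out away from a cluster. The following Python sketch is not the product's implementation: it uses axis-aligned rectangles as a stand-in for WritableGeometry, and the Box type and the function names are assumptions made for illustration only. It shows that Disjoint is the logical complement of Intersects, and that Within requires the second geometry to contain the first.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle used as a simplified stand-in for a geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    # Closed rectangles share at least one point (touching edges count).
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def disjoint(a: Box, b: Box) -> bool:
    # Disjoint is true exactly when the two shapes have no point in common.
    return not intersects(a, b)


def within(a: Box, b: Box) -> bool:
    # a is within b when b entirely contains a.
    return (b.xmin <= a.xmin and a.xmax <= b.xmax
            and b.ymin <= a.ymin and a.ymax <= b.ymax)
```

For example, disjoint(Box(0, 0, 1, 1), Box(2, 2, 3, 3)) is true, while the same pair fails intersects.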
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
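The "exterior ring minus interior rings" rule can be illustrated with a short Python sketch. This is not the product's algorithm: it assumes planar (Cartesian) coordinates and uses the shoelace formula, and the names ring_area and polygon_area are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples.

    The ring may be open or explicitly closed (first point repeated last).
    """
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the start.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    # Area of the exterior ring minus the areas of the interior rings.
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


outer = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(polygon_area(outer, [hole]))  # 16 - 1 = 15.0
```

A SPHERICAL computation would account for the curvature of the earth and generally gives a different value for the same coordinates.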
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type For valid values see Area Units on page 30
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type For valid values see Linear Units on page 33
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
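When comparing results returned in different units, a lookup table keyed by the unit values above can help. The Python sketch below is an illustration only and is not part of the SDK: the conversion factors are the standard definitions of each unit in meters, and the names METERS_PER_UNIT and convert_length are hypothetical.

```python
# Meters per unit, keyed by the unit values accepted by the functions.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the linear units listed above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(50, "km", "m"))  # 50000.0
```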
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
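The difference between the SPHERICAL and CARTESIAN computation types can be seen with two points. The Python sketch below is an approximation for illustration, not the product's implementation: it assumes a spherical earth with a mean radius of 6371008.8 m and uses the haversine formula for the spherical case, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean earth radius in meters


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the (projected) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of latitude along a meridian is roughly 111 km on the sphere,
# while treating the same coordinates as planar gives a distance of 1 (degree).
print(spherical_distance_m(0, 0, 0, 1))
print(cartesian_distance(0, 0, 0, 1))
```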
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 35
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
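The rule that holes add to the perimeter (rather than subtract, as they do for area) can be sketched in Python. This sketch assumes planar coordinates and is not the product's implementation; ring_length and polygon_perimeter are hypothetical names.

```python
import math


def ring_length(ring):
    """Length of a ring given as (x, y) tuples; the ring is closed if needed."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    return sum(math.dist(p, q) for p, q in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    # Sum of the lengths of all rings: the exterior plus every hole.
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


outer = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(polygon_perimeter(outer, [hole]))  # 16 + 4 = 20.0
```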
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 37
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The linear unit of the offset. For valid values see Linear Units on page 40
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
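The effect of the resolution parameter on a point buffer can be sketched in Python. This is a planar (Cartesian) illustration under the assumption that the resolution is the number of straight segments in the resulting ring; it is not the product's algorithm, and buffer_point is a hypothetical name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Closed ring approximating a circle of radius `offset` around (x, y).

    `resolution` is the number of straight-line segments, so a resolution
    of 8 produces an octagon.
    """
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of segments: 8
```

A larger resolution gives a smoother outline at the cost of more vertices per buffer, which matters when buffering many records.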
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
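To make the definition concrete, the following Python sketch computes a convex hull for a set of planar points using Andrew's monotone chain algorithm. The choice of algorithm is an assumption for illustration; the guide does not state how the ConvexHull UDF is implemented, and the names cross and convex_hull are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would create a non-left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# The interior point (1, 1) is not part of the hull.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```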
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
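As a rough picture of what a coordinate transformation involves, the Python sketch below converts a WGS 84 longitude/latitude (epsg:4326) to spherical Web Mercator (epsg:3857) using the standard forward formula. It covers only this one pair of coordinate systems and is not how the Transform UDF is implemented; the function name is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # radius used by spherical Web Mercator (epsg:3857)


def wgs84_to_web_mercator(lon, lat):
    """Forward spherical Mercator projection; returns (x, y) in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(wgs84_to_web_mercator(30, 20))
```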
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
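The bounds-filtering use case can be pictured with a small Python sketch. It is an illustration of the idea only, not SDK code: the geometry is represented as a plain list of (x, y) vertices, and the names mbr and filter_by_mbr are hypothetical.

```python
def mbr(vertices):
    """Minimum bounding rectangle of a list of (x, y) vertices.

    Returns (xmin, ymin, xmax, ymax), the same four values that the
    ST_XMin, ST_YMin, ST_XMax, and ST_YMax functions expose.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows of an XY table that fall inside the bounds."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(0, 0), (10, 0), (5, 8)]
xy_table = [(1, 1), (11, 2), (5, 7), (5, 9)]
print(filter_by_mbr(xy_table, mbr(triangle)))  # [(1, 1), (5, 7)]
```

Filtering by the MBR is a coarse test: a row inside the rectangle is not necessarily inside the geometry itself, so an exact predicate such as Within can be applied to the rows that remain.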
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
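The mapping from an ID back to a rectangular cell can be sketched in Python using the widely published geohash scheme (base-32 characters, alternating longitude and latitude bits). This sketch assumes the UDF follows that public scheme, which the guide does not confirm, and geohash_bounds is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(hash_id):
    """Return (xmin, ymin, xmax, ymax) of the cell named by a geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate, starting with longitude
    for ch in hash_id:
        index = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            bit = (index >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("ezs42"))
```

Each extra character adds five bits, so the cell shrinks by a factor of 32 in area with every additional character of precision.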
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
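How a point and a precision turn into a sortable string can be sketched in Python with the widely published geohash encoding. This assumes the UDF follows that public scheme, which the guide does not confirm; geohash_encode is a hypothetical name, and the sample coordinates are not taken from this guide.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_encode(-5.6, 42.6, 5))  # ezs42
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting by ID tends to keep nearby cells together, which is what makes the IDs useful for grouping and range scans.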
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) as quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable
SELECT SquareHashBoundary(03332)
Syntax
SquareHashBoundary(Number|String X Number|String Y Number precision)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 62
X
Spatial
Parameters
Parameter Type Description
Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x y precision) FROM hivetable
SELECT SquareHashBoundary(-73750333 42736103 3)
SELECT SquareHashBoundary(-73750333 42736103 3)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 63
X
Spatial
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter   Type              Description
X           Number or String  The longitude value of the point.
Y           Number or String  The latitude value of the point.
precision   Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
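The guide does not document how square hash IDs are computed, so the following is only an illustrative sketch of how a string key can encode a square cell on a Mercator grid, with each extra character quartering the cell. It swaps in the widely used quadkey scheme (one digit from 0 to 3 per level on a Web Mercator grid); the function name square_cell_id is hypothetical and its output should not be expected to match SquareHashID.

```python
import math

def square_cell_id(lon, lat, precision):
    """Quadkey-style cell ID: one digit (0-3) per level on a Web Mercator grid.

    Illustrative only; this is NOT the product's SquareHashID algorithm.
    """
    # Clamp latitude to the range where Web Mercator is defined.
    lat = max(min(lat, 85.05112878), -85.05112878)
    # Normalise the point to the unit square in projected space.
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    n = 1 << precision                      # cells per axis at this precision
    col = min(int(x * n), n - 1)
    row = min(int(y * n), n - 1)
    digits = []
    for level in range(precision, 0, -1):
        bit = 1 << (level - 1)
        # Combine one column bit and one row bit into a digit 0-3.
        digit = (1 if col & bit else 0) + (2 if row & bit else 0)
        digits.append(str(digit))
    return "".join(digits)

coarse = square_cell_id(-73.750333, 42.736103, 3)
fine = square_cell_id(-73.750333, 42.736103, 10)
```

In this scheme the coarse ID is always a prefix of the fine ID for the same point, which is what makes such keys convenient to sort, search, and group by.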
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing a point.
dataSourcePath  String            The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
                                  Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options         Map               Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'
shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
    Example: 'downloadLocation', '/pb/downloads'
downloadGroup
    Description: the operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type  Description
geometry     The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
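To make the point-in-polygon idea concrete, here is a minimal sketch of the classic ray-casting test, returning the attribute rows of every polygon that contains a point. It is not the product's implementation: the function names and the (ring, attributes) data layout are assumptions for illustration, and unlike the UDTF it does not treat points lying exactly on a boundary or on line geometries specially.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: True if (x, y) falls inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside    # each crossing to the right flips the state
    return inside

def find_containing(point, polygons):
    """Return the attributes of every (ring, attributes) polygon containing point."""
    return [attrs for ring, attrs in polygons if point_in_polygon(point[0], point[1], ring)]

# Hypothetical data: two overlapping squares with an attribute row each.
polygons = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
    ([(3, 3), (6, 3), (6, 6), (3, 6)], {"state": "B"}),
]
hits = find_containing((3.5, 3.5), polygons)
```

A point in the overlap yields one result row per containing polygon, which mirrors why the search is exposed as a table-generating function rather than a scalar one.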
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing the point to search near.
dataSourcePath  String            The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
                                  Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options         Map               Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates
    Description: the maximum number of results to return (if not set, the default value is 1)
    Example: 'maxCandidates', '3'
maxDistance
    Description: the maximum distance to search for results (if not set, the default value is no limit)
    Example: 'maxDistance', '25'
distanceUnit
    Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
    Example: 'distanceUnit', 'mi'
returnDistanceColumnName
    Description: the name of the column to use for returning the distance
    Example: 'returnDistanceColumnName', 'Miles'
shpCharset
    Description: the charset to use when reading a shapefile
    Example: 'shpCharset', 'utf-8'
shpCrs
    Description: the coordinate reference system to use when reading a shapefile
    Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
    Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
    Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
    Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
    Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
    Example: 'downloadLocation', '/pb/downloads'
downloadGroup
    Description: the operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
    Example: 'downloadGroup', 'pbdownloads'
queryFilter
    Description: search on a subset of the data based on an attribute
    Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
    Example: 'queryFilter', "Name Like 'Park'"
Return Values
Return Type  Description
geometry     The geometry or geometries (contained in the specified TAB or shapefile) nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
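The way the maxCandidates, maxDistance, and distanceUnit options interact can be sketched as a filter over precomputed distances. This is an illustration, not the UDTF's code: the function name, the input layout, and the conversion table are hypothetical, and it assumes that maxDistance is expressed in the unit named by distanceUnit.

```python
# Conversion factors for a few unit codes (an illustrative subset).
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344}

def apply_search_options(candidates, max_candidates=1, max_distance=None,
                         distance_unit="m"):
    """Filter and rank (name, distance_in_meters) pairs.

    Defaults follow the option table: one result, no distance limit, meters.
    """
    factor = METERS_PER_UNIT[distance_unit]
    # Express every distance in the requested unit.
    converted = [(name, meters / factor) for name, meters in candidates]
    if max_distance is not None:
        converted = [c for c in converted if c[1] <= max_distance]
    converted.sort(key=lambda c: c[1])      # nearest first
    return converted[:max_candidates]

# Hypothetical candidates at roughly 50, 1 and 5 miles.
candidates = [("C", 80467.2), ("A", 1609.344), ("B", 8046.72)]
default_result = apply_search_options(candidates)
limited = apply_search_options(candidates, max_candidates=3,
                               max_distance=25, distance_unit="mi")
```

With no options only the single nearest candidate comes back; with a 25 mile limit and up to 3 candidates, the farthest candidate is dropped even though the candidate limit would have allowed it.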
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter         Type       Description
df2               DataFrame  The dataframe to join to.
df1Longitude      Column     The longitude value from the first dataframe.
df1Latitude       Column     The latitude value from the first dataframe.
df2Longitude      Column     The longitude value from the second dataframe.
df2Latitude       Column     The latitude value from the second dataframe.
maxDistance       Length     The buffer length around point 1 to search for point 2.
geohashPrecision  Integer    The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options           Map        Optional. Options that add extra attributes to the result of the join.
Options
Key                 Type    Description
DistanceColumnName  String  Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type  Description
DataFrame    The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
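The semantics of a distance join can be sketched in a few lines of plain Python. This is a brute-force illustration under stated assumptions, not the Spark implementation: the product performs a primary join on geohash cells before measuring distances, whereas this sketch compares every pair, and it uses a spherical haversine distance that may differ slightly from the product's calculation. The function names and the row layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8      # mean earth radius in meters
METERS_PER_MILE = 1609.344

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(rows1, rows2, max_distance_m):
    """Pair each row of rows1 with every row of rows2 within max_distance_m."""
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["lon"], r1["lat"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                # Keep the distance, like the DistanceColumnName option does.
                joined.append({"left": r1["id"], "right": r2["id"], "distance_m": d})
    return joined

# Hypothetical data: one customer and two points of interest.
customers = [{"id": "c1", "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.750333, "lat": 42.741103},   # about 556 m north
    {"id": "far", "lon": -73.750333, "lat": 42.746103},    # about 1.1 km north
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

Comparing every pair costs time proportional to the product of the two row counts, which is why a coarse cell-based primary join, tuned by geohashPrecision, matters at scale.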
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application        Do this
Spark job          Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook  Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook       Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter      Required  Description
-f <file>      yes       Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
                         When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates PGD files for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and update the downloaded data when required. You should have a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have permissions of 775, which grant Read, Write, and Execute to the owner and the group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
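As a quick check on what mode 775 grants, the following sketch uses Python's standard stat module to expand an octal mode into owner, group, and other permission triplets. The helper name describe_mode is hypothetical; it only decodes a mode value and does not touch the file system.

```python
import stat

def describe_mode(mode):
    """Render an octal permission mode as owner/group/other rwx triplets."""
    # stat.filemode returns text such as 'drwxrwxr-x' for a directory.
    text = stat.filemode(stat.S_IFDIR | mode)
    return {"owner": text[1:4], "group": text[4:7], "other": text[7:10]}

download_dir_mode = describe_mode(0o775)
```

For 775 the owner and the group each get read, write, and execute, while all other users get read and execute only, which is why every service user that must update downloaded data needs to belong to the common group.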
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator                   Definition
=, <>, !=, <, <=, >, >=    Attribute operators.
Between                    Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects        Returns true if the envelopes (MBRs) of the operands intersect.
Contains                   Returns true if the first object contains all of the second object.
Within                     Returns true if the first object is entirely inside the second object.
ContainsCentroid           Returns true if the first object contains the centroid of the second object.
CentroidWithin             Returns true if the first object's centroid is within the second object.
Intersects                 Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)                  Returns true if the value equals at least one of the values in the literal list or subquery.
Like                       Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                        Returns true if both conditions in the WHERE clause are true.
OR                         Returns true if either the first or the second condition is true.
NOT                        Reverses the meaning of the logical operator with which it is used.
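The wildcard rules for the Like operator can be sketched by translating a pattern into a regular expression: the percent sign becomes "any run of characters" and the underscore becomes "exactly one character". This is an illustration of the matching rules only, not the MI SQL engine; the function names are hypothetical, and the sketch assumes case-sensitive matching, which the guide does not specify.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")            # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")             # exactly one character
        else:
            parts.append(re.escape(ch))   # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

ends_with_park = like("Central Park", "%Park")
one_char = like("Park", "P_rk")
```

Because the whole value must match, a pattern without wildcards behaves like an equality test, and the wildcards can be combined, as in "P_r%".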
Arithmetic Operators
Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules on page 85.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List item and function argument separators
@          Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double-single-quote (two single-quote characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
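When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch with hypothetical function names; the guide does not say how a double quote inside an identifier should be escaped, so the identifier helper simply rejects that case rather than guessing.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def sql_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping of embedded double quotes is not documented, so refuse.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Country = {}".format(
    sql_identifier("/Samples/NamedTables/USA"),
    sql_string_literal("Canada"),
)
business = sql_string_literal("O'hara's")
```

Doubling embedded single quotes keeps a value with an apostrophe from ending the literal early.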
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Welcome
What is Spectrum™ Location Intelligence for Big Data
Pitney Bowes Spectrum™ Location Intelligence for Big Data is a toolkit for processing enterprise data for large-scale spatial analysis. Billions of records can be processed in parallel using MapReduce, Hive, and Apache Spark's cluster processing framework. Processing that used to take weeks with traditional techniques can now be done in a few hours using this product.
Spectrum™ Location Intelligence for Big Data 4.0 User Guide
Spectrum™ Location Intelligence for Big Data Architecture
Spectrum™ Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms such as Hadoop, for use with Spark, MapReduce, and Hive.
The SDK provides:
- Integration APIs for Location Intelligence
- Input datasets and metadata
API types:
- Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
- Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)
System Requirements and Dependencies
Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.
This product is verified on the following Hadoop distributions:
- Cloudera 5.12 and 6.x
- Hortonworks 3.x
- EMR 5.10 and 5.20
To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.
To use the product, the following must be installed on your system:
For Hive:
- Hive version 1.2.1 or above
For a Hive client:
- Beeline, for example
For Spark and Zeppelin Notebook:
- Java JDK version 1.8 or above
- Hadoop version 2.6.0 or above
- Spark version 2.0 or above
- Zeppelin Notebook is not supported in Cloudera
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations, and the ability to read TAB files.
Spark jobs call the Location Intelligence SDK (LI SDK) API in map and reduce operations, so that spatial data analysis runs on the big data processing system. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory R-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section:
- Installing the SDK
- Hive User-Defined Spatial Functions
- Spark
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
- use a user called pbuser
- install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser.
    sudo mkdir /pb
    sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
    /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip
3. Extract the Location Intelligence distribution.
    mkdir /pb/li
    mkdir /pb/li/software
    unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser.
    sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
    sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS.
    hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in spectrum-bigdata-li-hive-<version>.jar.
The Hive UDFs described in this document are grouped by type as follows:
- Constructor Functions: construct an instance of WritableGeometry from supported geometry representation formats. Functions: FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
- Grid Functions: grid processing functions. Functions: GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
- Measurement Functions: geometry measurement functions. Functions: Area, ClosestPoints, Distance, Length, Perimeter
- Observer Functions: geometry observer functions. Functions: ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
- Persistence Functions: serialize an instance of WritableGeometry to supported geometry representation formats. Functions: ToGeoJSON, ToKML, ToWKB, ToWKT
- Predicate Functions: geometry predicate functions. Functions: Disjoint, Intersects, Overlaps, Within, IsNullGeometry
- Processing Functions: geometry processing functions. Functions: Buffer, ConvexHull, Intersection, Transform, Union
- Search Functions: spatial search functions. Functions: LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
   Cloudera:
   Copy the Hive jar for Location Intelligence to the HiveServer node:
       /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
   In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
       /pb/li/software/hive/lib/
   Hortonworks:
   On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
       sudo mkdir /usr/hdp/current/hive-server2/auxlib/
   Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
       sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
       beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session). The statements below include it.
       create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
       create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
       create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
       create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
       create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

       create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
       create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
       create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
       create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

       create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
       create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
       create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
       create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
       create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

       create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
       create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
       create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
       create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
       create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

       create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
       create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
       create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
       create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
       create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

       create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
       create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
       create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
       create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
       create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
       create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

       create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
       create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
       create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
       create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
       create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
       create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

       create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
       create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note:
- If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
- The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
- Some types of queries cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
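Because the registration step must be repeated for every new session when temporary functions are used, it can help to generate the statements rather than retype them. The following is a minimal Python sketch, not part of the product: the package prefix and function names come from the registration list in this topic, while the helper name registration_statements is hypothetical.

```python
# Package prefix and function names are taken from the registration list in this topic.
PACKAGE = "com.pb.bigdata.spatial.hive"

FUNCTIONS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one CREATE FUNCTION statement per UDF (hypothetical helper)."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{PACKAGE}.{group}.{name}';"
        for group, names in FUNCTIONS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be saved to a file or pasted into a Hive client.
    print("\n".join(registration_statements()))
```

The output can be saved to a file and run from a Hive client; how the file is passed to the client depends on your environment.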
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats such as WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
    SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats such as WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
    SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that operate on it. For example:
To calculate the length of a geometry:
    SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
    SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.
    SELECT IsNullGeometry(null);
    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
- Constructor Functions
- Persistence Functions
- Predicate Functions
- Measurement Functions
- Processing Functions
- Observer Functions
- Grid Functions
Constructor Functions
The following Constructor functions are available:
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
    create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
    fromGeoJSON(String jsonGeometry)
Parameters
Parameter     Type    Description
jsonGeometry  String  The geometry in GeoJSON format
Return Values
Return Type       Description
WritableGeometry  The geometry from GeoJSON format
Examples
    SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
    SELECT FromGeoJSON(t.geometry) FROM hivetable t;
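When the GeoJSON literal is assembled by a program rather than typed by hand, quoting mistakes are easy to make. The sketch below, in Python, builds the Point string with the standard json module and wraps it in a HiveQL string literal. The helper names point_geojson and hive_literal are hypothetical, and the backslash escaping assumes Hive's usual string-literal rules.

```python
import json


def point_geojson(x, y):
    """Serialize a GeoJSON Point; json.dumps takes care of quoting and number formatting."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def hive_literal(text):
    """Wrap text in single quotes, escaping backslashes and single quotes (assumed Hive rules)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


query = "SELECT FromGeoJSON({});".format(hive_literal(point_geojson(100.0, 0.0)))
print(query)
```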
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
    create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
    fromWKT(String geometry [, SpatialInfo CRS])
Parameters
Parameter  Type    Description
geometry   String  The geometry in WKT format
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type       Description
WritableGeometry  The geometry from WKT format
Examples
    SELECT FromWKT(t.geometry) FROM hivetable t;
    SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
    SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
    create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
    fromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter  Type    Description
geometry   String  The WKB of the geometry in byte array format (byte[ ])
CRS        String  Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type       Description
WritableGeometry  The geometry from WKB format
Examples
    SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
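To see what the hexadecimal string in the example encodes, the following Python sketch decodes it using the standard WKB layout for a 2D point: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. It is an illustration of the format only, not the product's parser, and the function name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a 2D WKB Point from its hex form and return (x, y)."""
    data = bytes.fromhex(hex_string)
    # First byte: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geometry_type != 1:
        raise ValueError("this sketch only handles WKB Point (type 1)")
    # Two doubles follow the 5-byte header.
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))  # (10.0, 10.0)
```

The example value is therefore the point with X = 10 and Y = 10.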
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
    create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
    fromKML(String geometry)
Parameters
Parameter  Type    Description
geometry   String  A KML string, where only the geometry or the geometry in a placemark will be parsed
Return Values
Return Type       Description
WritableGeometry  The geometry from KML format
Examples
    SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
    create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
    ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter  Type              Description
X          String or Number  The X ordinate
Y          String or Number  The Y ordinate
CRS        String            Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type       Description
WritableGeometry  The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.
Examples
    SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
    SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
    SELECT ST_Point('-73.750333', '42.736103');
    SELECT ST_Point(-73.750333, 42.736103);
    SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
    create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
    ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry
Return Values
Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry
Examples
    SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
    create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
    ToWKT(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry
Return Values
Return Type  Description
String       The WKT representation of a geometry
Examples
    SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
    create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
    ToWKB(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry
Return Values
Return Type  Description
Byte[ ]      The WKB representation of a geometry, expressed as a byte array
Examples
    SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
    create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
    ToKML(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry
Return Values
Return Type  Description
String       The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
    SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
    create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
    Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type  Description
Boolean      True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, null is returned.
Examples
    SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
    create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
    Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type  Description
Boolean      True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, null is returned.
Examples
    SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
    SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
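As defined in this guide, Disjoint and Intersects are complements: two geometries are disjoint exactly when they share no position, and both return null when either input is null. The Python sketch below illustrates that relationship and the null propagation using axis-aligned boxes as stand-ins for geometries. It is an assumption-laden illustration, not the product's algorithm, and the function names are hypothetical.

```python
def boxes_intersect(a, b):
    """a and b are (xmin, ymin, xmax, ymax) boxes, or None for a null geometry."""
    if a is None or b is None:
        return None  # null in, null out
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # The boxes share at least one position when their extents overlap on both axes.
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def boxes_disjoint(a, b):
    """Disjoint is the negation of intersects, with the same null propagation."""
    result = boxes_intersect(a, b)
    return None if result is None else not result


print(boxes_intersect((0, 0, 2, 2), (1, 1, 3, 3)))  # True
print(boxes_disjoint((0, 0, 1, 1), (2, 2, 3, 3)))   # True
print(boxes_disjoint(None, (0, 0, 1, 1)))           # None
```

Boxes that only touch along an edge still share positions, so they count as intersecting in this sketch.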
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
    create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
    Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type  Description
Boolean      True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, null is returned.
Examples
    SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
    create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
    Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type  Description
Boolean      True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, null is returned.
Examples
    SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
    SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
    FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
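The second example tests whether a point location falls inside a boundary polygon. For intuition about what such a test involves, here is a minimal planar sketch in Python using the classic ray-casting technique on a single closed ring. It is not the product's implementation: it ignores holes and coordinate systems, does not define behavior for points exactly on the boundary, and the function name point_within_ring is hypothetical.

```python
def point_within_ring(point, ring):
    """Ray casting: count how many ring edges a ray going in +x from the point crosses.

    ring is a closed list of (x, y) vertices (first vertex repeated at the end).
    Returns None if either input is None, mirroring the null rule described above.
    """
    if point is None or ring is None:
        return None
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(point_within_ring((2, 2), square))  # True
print(point_within_ring((5, 2), square))  # False
```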
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
    create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
    IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value
Return Values
Return Type  Description
Boolean      True if the geometry is null or empty; otherwise False
Examples
    SELECT IsNullGeometry(null);
    SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
    SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
    create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
    Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry
areaUnits        String            The desired return unit type. For valid values, see Area Units.
computationType  String            Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    - For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
    - For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
    - For engineering coordinate systems: valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
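The rules for computationType can be read as a small lookup. The Python sketch below encodes them as listed in the parameter description, reading the "(default)" marker on the projected line as applying to SPHERICAL. The names are hypothetical, and raising an error for an invalid combination is this sketch's choice; the guide does not state what the product does in that case.

```python
# Valid computation types per coordinate-system kind, as listed in the parameter table.
VALID_TYPES = {
    "geographic": ("SPHERICAL",),
    "projected": ("CARTESIAN", "SPHERICAL"),
    "engineering": ("CARTESIAN",),
}

# Defaults as marked "(default)" in the parameter table.
DEFAULT_TYPE = {
    "geographic": "SPHERICAL",
    "projected": "SPHERICAL",
    "engineering": "CARTESIAN",
}


def resolve_computation_type(crs_kind, requested=None):
    """Return the computation type to use for a coordinate-system kind (hypothetical helper)."""
    if crs_kind not in VALID_TYPES:
        raise ValueError(f"unknown coordinate system kind: {crs_kind}")
    if requested is None:
        return DEFAULT_TYPE[crs_kind]
    requested = requested.upper()
    if requested not in VALID_TYPES[crs_kind]:
        raise ValueError(f"{requested} is not valid for {crs_kind} coordinate systems")
    return requested


print(resolve_computation_type("geographic"))              # SPHERICAL
print(resolve_computation_type("projected", "cartesian"))  # CARTESIAN
```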
Area Units
Valid values for areaUnits are the following area units:
Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares
Return Values
Return Type  Description
Double       The area of the geometry
Examples
    SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
    SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
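The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated for the CARTESIAN case with the shoelace formula. The Python sketch below is an illustration only: it works in raw coordinate units, does not cover the SPHERICAL computation or unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring [(x, y), ...] whose first and last vertices match."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_area(exterior, [hole]))  # 96.0
```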
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
    create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
    ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
    SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
    create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
    Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry
geometry2        WritableGeometry  The second instance of a WritableGeometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    - For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
    - For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
    - For engineering coordinate systems: valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
    SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
    SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
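The guide does not say how the SPHERICAL distance is computed. For intuition only, the Python sketch below uses the haversine formula, a common way to measure great-circle distance between two longitude/latitude points on a sphere. The Earth radius constant and the function name haversine_m are assumptions of this sketch, and the product's results may differ.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (an assumption of this sketch)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator, roughly 111 km on this sphere.
print(haversine_m(0.0, 0.0, 1.0, 0.0))
```

As with the Distance function, the result is never negative, and identical points are at distance zero.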
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
    create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
    Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    - For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
    - For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
    - For engineering coordinate systems: valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The length of the geometry
Examples
    SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
    SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
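If a result has been computed in one linear unit and is needed in another, it can be converted after the fact instead of re-running the query. The Python sketch below covers the unit codes from the Linear Units table using their standard definitions in meters. The table name METERS_PER_UNIT and the function convert_length are hypothetical and not part of the product.

```python
# Standard definitions of each linear unit code in meters.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above (hypothetical helper)."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "km", "m"))       # 1000.0
print(convert_length(1852.0, "m", "nmi"))   # 1.0
```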
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 37.
computationType | String | Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
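The rule that every ring counts, holes included, can be sketched in a few lines of Python. This is an illustration using Cartesian logic on made-up coordinates, not the SDK's code; the helper names are hypothetical.

```python
import math


def ring_length(ring):
    """Cartesian length of a ring [(x, y), ...]; closes the ring if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def perimeter(polygon):
    """polygon = [exterior_ring, hole_ring, ...]; every ring contributes."""
    return sum(ring_length(ring) for ring in polygon)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(perimeter([exterior, hole]))  # exterior ring plus the hole ring
```

Here the exterior contributes 40 units and the hole another 8, so a polygon with holes has a larger perimeter than its outline alone.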
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The unit type of the offset distance. For valid values, see Linear Units on page 40.
resolution | Number | Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic used to interpret geometry coordinates. The computation type depends on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
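The effect of the resolution parameter can be illustrated with a short Python sketch that buffers a point in a flat (Cartesian) plane. It assumes the polygon vertices lie on the circle itself; whether the SDK inscribes or circumscribes the polygon is not stated in this guide, so treat the numbers as indicative only. All function names are hypothetical.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with
    `resolution` straight segments; returns a closed ring of vertices."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


def shoelace_area(ring):
    """Planar area of a closed ring via the shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


octagon = buffer_point(5, 6, 50, 8)
print(len(octagon) - 1)  # distinct vertices: a resolution of 8 gives an octagon
for n in (8, 12, 64):
    ratio = shoelace_area(buffer_point(5, 6, 50, n)) / (math.pi * 50 ** 2)
    print(n, round(ratio, 4))  # fraction of the true circle's area covered
```

Higher resolutions track the circle more closely at the cost of more vertices per output geometry, which is the time and space trade-off described for the resolution parameter.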
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
WritableGeometry | The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
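As an illustration of what a convex hull is, the following Python sketch applies Andrew's monotone chain algorithm to the vertices of the MULTIPOLYGON used in the last example. This is a textbook algorithm chosen for the sketch; the guide does not say which algorithm the SDK uses.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the duplicated end points


# Distinct vertices of the MULTIPOLYGON in the example above.
points = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
          (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
print(convex_hull(points))
```

The hole vertices and the concave vertex (20, 35) drop out, leaving a single seven-sided outline that encloses both polygons.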
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The source input geometry
CRS | String | The destination coordinate system for the geometry
Return Values
Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_Point(30, 20), 'epsg:3857');
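For the specific case shown in the examples, long/lat (EPSG:4326) to Web Mercator (EPSG:3857), the transformation can be written out with the standard spherical Mercator formulas. The Python sketch below is for intuition only: it assumes the input point is in EPSG:4326 and covers just this one pair of coordinate systems, whereas Transform accepts any supported destination CRS.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # radius used by EPSG:3857 (WGS 84 semi-major axis)


def to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator projection: degrees in, metres out."""
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x, y = to_web_mercator(30, 20)
print(round(x, 2), round(y, 2))
```

Chaining the real UDFs as ST_X(Transform(...)) and ST_Y(Transform(...)) would return the two ordinates separately for use in an XY table.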
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of a point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
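The bounds-filtering use case can be sketched in Python as follows. The polygon, the XY rows, and the helper names are made up for illustration; in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Minimum bounding rectangle of [(x, y), ...] as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, bounds):
    """True when (x, y) falls inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


# Hypothetical filter geometry and XY table rows (id, x, y).
polygon = [(-74.0, 42.5), (-73.5, 42.5), (-73.5, 43.0), (-74.0, 43.0), (-74.0, 42.5)]
xy_table = [(1, -73.750333, 42.736103), (2, -71.0, 42.3), (3, -73.9, 42.9)]

bounds = mbr(polygon)
kept = [row_id for row_id, x, y in xy_table if in_mbr(x, y, bounds)]
print(bounds, kept)
```

An MBR test is only a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself, so an exact spatial test may still be needed afterwards.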
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
Double | The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
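To show how a hash string maps back to a rectangular cell, here is a Python sketch that decodes a geohash into its bounding box. It assumes the well-known public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude); this guide does not spell out the SDK's encoding, so treat the correspondence as an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


print(geohash_bounds("dre"))
```

Each extra character halves the cell several more times, which is why longer IDs mean smaller cells.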
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
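The hash-then-aggregate pattern in the last query can be mimicked in plain Python. The encoder below implements the public geohash scheme, which is assumed (not confirmed by this guide) to match what GeoHashID produces; the sample coordinates are made up.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Hypothetical XY rows: two near Albany, NY and one in London.
coordinates = [(-73.750333, 42.736103), (-73.7562, 42.6526), (-0.1276, 51.5072)]
counts = Counter(geohash_id(x, y, 3) for x, y in coordinates)
print(geohash_id(-73.750333, 42.736103, 3))
print(counts)
```

Because nearby points share an ID prefix, grouping by the ID gives a cheap spatial aggregation without any geometry comparison.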
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique identifier of a hexagon cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique square hash identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). | 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS:
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
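Conceptually, the UDTF emits one output row per stored geometry that contains the search point. The Python sketch below imitates that behaviour for polygons using the classic ray-casting test on made-up records; it is an illustration under those assumptions and says nothing about how the SDK indexes or reads TAB files. All names are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count crossings of a ray going right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon(x, y, polygon):
    """polygon = [exterior_ring, hole_ring, ...], each ring closed."""
    exterior, *holes = polygon
    return point_in_ring(x, y, exterior) and not any(
        point_in_ring(x, y, hole) for hole in holes)


def local_point_in_polygon(x, y, records):
    """Return the attributes of every record whose polygon contains (x, y)."""
    return [attrs for attrs, polygon in records if point_in_polygon(x, y, polygon)]


# Made-up stand-ins for rows of a polygon data source.
records = [
    ({"state": "A", "capital": "Alpha"}, [[(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]]),
    ({"state": "B", "capital": "Beta"}, [[(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)]]),
]
print(local_point_in_polygon(3, 4, records))
print(local_point_in_polygon(30, 4, records))
```

A point that matches nothing yields an empty list here; in Hive, a plain LATERAL VIEW drops such input rows, while LATERAL VIEW OUTER keeps them with null result columns.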
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). | 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. | 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type | Description
geometry | The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point
Examples
Using HDFS:
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
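How maxCandidates and maxDistance shape the result set can be seen in a brute-force Python sketch. It uses planar distances on made-up projected coordinates and hypothetical record names; the SDK's actual search strategy and distance computation are not described in this guide.

```python
import math


def local_search_nearest(x, y, records, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first,
    ignoring records farther away than max_distance (None means no limit)."""
    scored = []
    for name, location in records:
        d = math.dist((x, y), location)
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    scored.sort(key=lambda item: item[0])
    return scored[:max_candidates]


# Made-up points in an arbitrary projected coordinate system (units: km).
records = [
    ("Albany", (0, 0)),
    ("Boston", (230, -40)),
    ("Hartford", (80, -110)),
    ("Concord", (200, 60)),
]

top3 = local_search_nearest(10, 0, records, max_candidates=3)
print([name for _, name in top3])
near = local_search_nearest(10, 0, records, max_candidates=3, max_distance=150)
print([name for _, name in near])
```

With the defaults only the single nearest record is returned, matching the documented default of 1 for maxCandidates; the distance in each pair plays the role of the column named by returnDistanceColumnName.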
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class compbbigdataspatialhexsparkappHexGenDriver--master yarn --deploy-mode cluster --name ltAPPLICATION_NAMEgtdironserverspectrum-bigdata-li-spark2-hexgen-versionjar -output dironhdfsoutput -conf ltCONFIG_PATHgt -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Int The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
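To give a feel for what the geohashPrecision value controls, the following Python sketch implements the standard public geohash encoding and reports the cell size at a given precision. It is an illustration only: the function names are hypothetical, and nothing here is taken from the SDK's internal implementation of the join.

```python
# Standard base-32 geohash encoding. Illustrative only; this is not the SDK's code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a WGS84 longitude/latitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Return (width, height) in degrees of one geohash cell at this precision."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


print(geohash_encode(-5.6, 42.6, 5))  # ezs42
print(cell_size_degrees(7))
```

Each extra character shrinks the cell by a factor of 32 in area, so a higher precision produces many more distinct cells to track, which is consistent with the memory warning above.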
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
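The semantics of a distance join can be sketched in plain Python, independent of Spark. The sketch below is a brute-force illustration under stated assumptions: it uses the haversine formula on a spherical Earth, compares every pair of records instead of using a geohash-based primary join, and all function names and sample coordinates are hypothetical. It is not the SDK's algorithm.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius; spherical approximation (assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with every right row that lies within max_distance_m.

    If distance_column is given, the computed distance is added under that key,
    mirroring the idea of the DistanceColumnName option.
    """
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                row = {**l_row, **r_row}
                if distance_column is not None:
                    row[distance_column] = d
                result.append(row)
    return result


# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"name": "near", "lon": -73.9851, "lat": 40.7527},
    {"name": "far", "lon": -73.9442, "lat": 40.8075},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
for match in matches:
    print(match["id"], match["name"], round(match["outputDistance"]))
```

A pairwise comparison like this scales with the product of the two input sizes, which is why a real distributed join first narrows the candidates with a coarse spatial key such as a geohash.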
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
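When the PGD Builder is driven from a script, the advice about lowering -p for seamless TAB files can be applied by capping the value before building the command line. The following Python sketch only assembles the argument list using the -f and -p flags documented above; the helper names and the cap value are hypothetical, and the sketch does not launch the tool.

```python
import os


def capped_parallelism(cap):
    """Return the CPU count limited to `cap`, and never less than 1."""
    if cap < 1:
        raise ValueError("cap must be at least 1")
    cpus = os.cpu_count() or 1
    return min(cpus, cap)


def pgd_builder_args(tab_path, max_parallel=None):
    """Build the PGDBuilder argument list; -p is added only when requested."""
    args = ["PGDBuilder", "-f", tab_path]
    if max_parallel is not None:
        if max_parallel < 1:
            raise ValueError("max_parallel must be at least 1")
        args += ["-p", str(max_parallel)]
    return args


# Default parallelism for a single TAB, capped parallelism for a seamless TAB.
print(pgd_builder_args("uk.tab"))
print(pgd_builder_args("seamless.tab", capped_parallelism(4)))
```

The resulting list can be handed to a process launcher such as subprocess.run once the actual location of the utility is known.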
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
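As a quick check of what the 775 mode grants, the short Python sketch below decodes an octal mode with the standard library. The helper names are hypothetical and the code only interprets mode bits; it does not touch any directory.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


def group_can_write(octal_mode):
    """True when members of the owning group may create or update files."""
    return bool(octal_mode & stat.S_IWGRP)


print(describe_mode(0o775))    # drwxrwxr-x
print(group_can_write(0o775))  # True
print(group_can_write(0o755))  # False
```

With 775, the owner and every member of the common group can read, write, and traverse the download directory, while other users can only read and traverse it. A mode such as 755 would leave group members unable to update the downloaded data.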
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
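The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression. This Python sketch is an illustration only: the function names are hypothetical, matching is case-sensitive here as an assumption, and the MI SQL engine may treat case or escaping differently.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or multiple characters; _ matches exactly one character.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))  # True
print(like("Park Avenue", "%Park"))   # False
print(like("Parks", "Park_"))         # True
```

Because the whole value must match, a pattern such as %Park only accepts values that end in Park; adding a trailing % would also accept values that merely contain it.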
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' String constant delimiters. See Quote Rules on page 85.
" Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain otherwise illegal characters (characters that normally would not be allowed) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
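When filter expressions are assembled in code, the quote rules can be applied with two small helpers. The Python sketch below is a minimal illustration with hypothetical function names. It doubles single quotes inside string literals as described above; because this guide does not say how a double quote inside an identifier should be handled, the identifier helper rejects that case instead of guessing.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any single quote inside it."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because their escaping is not
    covered by the quote rules in this guide (assumption).
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


business = "O'Hara's"
query = "SELECT * FROM {} WHERE Business = {}".format(
    sql_identifier("Streets"), sql_string_literal(business)
)
print(query)  # SELECT * FROM "Streets" WHERE Business = 'O''Hara''s'
```

Quoting the identifier is optional for a simple name such as Streets, but always quoting keeps the helper safe for names that contain spaces or other special characters.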
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
System Requirements and Dependencies
Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.
This product is verified on the following Hadoop distributions:
• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20
To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.
To use the product, the following must be installed on your system:
For Hive:
• Hive version 1.2.1 or above
For a Hive client:
• Beeline, for example
For Spark and Zeppelin Notebook:
• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser:
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip
3. Extract the Location Intelligence distribution:
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser:
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS:
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type  Description  Name
Constructor Functions on page 15  Construct an instance of WritableGeometry from supported geometry representation formats  FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions on page 53  Grid processing functions  GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions on page 29  Geometry measurement functions  Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions on page 46  Geometry observer functions  ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions on page 20  Serialize an instance of WritableGeometry to supported geometry representation formats  ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions on page 24  Geometry predicate functions  Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions on page 39  Geometry processing functions  Buffer, ConvexHull, Intersection, Transform, Union
Search Functions on page 66  Spatial search functions  LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform:
Cloudera
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
Hortonworks
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register the spatial user-defined functions. The statements below use the temporary keyword after create, so the functions are temporary and this step must be redone for every new Hive session. Omit the temporary keyword if you do not want temporary functions.
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. The job may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter | Type | Description
jsonGeometry | String | The geometry in GeoJSON format.
Return Values
Return Type | Description
WritableGeometry | The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The geometry in WKT format.
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type | Description
WritableGeometry | The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ]).
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type | Description
WritableGeometry | The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter | Type | Description
geometry | String | A KML string. Only the geometry, or the geometry in a placemark, is parsed.
Return Values
Return Type | Description
WritableGeometry | The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
X | String or Number | The X ordinate.
Y | String or Number | The Y ordinate.
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326.
Return Values
Return Type | Description
WritableGeometry | The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
GeoJSON String | The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value.
Return Values
Return Type | Description
Boolean | True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
areaUnits | String | The desired return unit type. For valid values, see Area Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for areaUnits are the following area units:
Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares
Return Values
Return Type | Description
Double | The area of the geometry.
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
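The rule given in the Description (exterior ring area minus interior ring areas) can be illustrated outside Hive. The following is a minimal Python sketch that uses the planar shoelace formula, so it corresponds only loosely to the CARTESIAN computation type. The names `ring_area` and `polygon_area` are hypothetical and are not part of the SDK.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as [(x, y), ...] (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first vertex.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# A 10 x 10 square with a 2 x 2 hole.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))  # 96.0
```

The result is in squared coordinate units; the Area UDF additionally converts to the requested areaUnits.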
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
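To make the difference between the CARTESIAN and SPHERICAL computation types more concrete, the sketch below compares a planar distance with a great-circle (haversine) distance between two points. This is an illustration only: the guide does not describe the formulas the Distance UDF uses internally, and the Earth radius constant and function names here are assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed for this sketch)


def cartesian_distance(p, q):
    """Planar distance between two (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Great-circle distance in meters between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km on the sphere,
# whereas the planar calculation simply reports 1 coordinate unit.
print(cartesian_distance((0, 0), (0, 1)))
print(spherical_distance_m((0, 0), (0, 1)))
```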
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The length of the geometry.
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry.
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
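The statement in the Description that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched as follows. This is a planar (cartesian) illustration with hypothetical function names, not the SDK implementation, and it performs no unit conversion.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as [(x, y), ...]; the ring is closed if needed."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Sum of the lengths of the exterior ring and every interior ring (hole)."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


# A 10 x 10 square with a 2 x 2 hole: 40 for the exterior plus 8 for the hole.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_perimeter(square, [hole]))  # 48.0
```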
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The unit type of the offset distance. For valid values, see Linear Units.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
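The effect of the resolution parameter can be pictured with a small planar sketch: a point buffer is approximated by a regular polygon with `resolution` sides, so a resolution of 8 yields an octagon. The function `buffer_point` is hypothetical and ignores units and spherical computation; the vertex placement chosen by the actual Buffer UDF may differ.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # 8 distinct vertices, that is, 8 straight-line segments
```

Raising the resolution adds vertices, which is why larger values take more time and space to compute.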
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
WritableGeometry | The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
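For readers unfamiliar with convex hulls, the sketch below computes one for a set of planar points using Andrew's monotone chain algorithm. It is offered only to illustrate the definition above; the guide does not state which algorithm the ConvexHull UDF uses, and `convex_hull` is a hypothetical name.

```python
def convex_hull(points):
    """Convex hull of (x, y) points, returned counter-clockwise (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Interior points (2, 2) and (1, 3) are not part of the hull.
print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
# [(0, 0), (4, 0), (4, 4), (0, 4)]
```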
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.
Return Values
Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
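As a rough picture of what a transformation such as epsg:4326 to epsg:3857 does to a single coordinate, the sketch below applies the standard spherical Web Mercator forward formulas. It covers only this one pair of coordinate systems, whereas the Transform UDF accepts arbitrary destination systems; `to_web_mercator` is a hypothetical helper and not an SDK function.

```python
import math

WGS84_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857, in meters


def to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees (EPSG:4326) to x/y in meters (EPSG:3857)."""
    x = WGS84_RADIUS_M * math.radians(lon_deg)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The point used in the last example above: longitude 30, latitude 20.
print(to_web_mercator(30, 20))
```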
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the ordinates of the transformed point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
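The bounds-filtering use case described above can be sketched in a few lines of Python: compute the minimum bounding rectangle of a geometry's coordinates (the values that ST_XMin, ST_YMin, ST_XMax, and ST_YMax would report) and keep only the XY rows that fall inside it. The helper names `mbr` and `filter_by_bounds` are hypothetical and the data is invented for illustration.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, coords):
    """Keep the (x, y) rows that lie within the MBR of the given geometry coordinates."""
    xmin, ymin, xmax, ymax = mbr(coords)
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(0, 0), (10, 0), (5, 8)]
xy_table = [(1, 1), (11, 2), (5, 9), (9, 7)]
print(mbr(triangle))                         # (0, 0, 10, 8)
print(filter_by_bounds(xy_table, triangle))  # [(1, 1), (9, 7)]
```

Note that an MBR filter is coarse: (9, 7) lies inside the rectangle but outside the triangle itself, so a predicate such as Within is still needed for an exact test.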
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
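As a rough illustration of what the four extent functions (ST_XMin, ST_XMax, ST_YMin, ST_YMax) report, the following sketch computes the minimum and maximum X and Y over a geometry's vertices in plain Python. The extents helper and the sample coordinates are hypothetical and are not part of the SDK; the Hive functions operate on WritableGeometry values, not on Python lists.

```python
from typing import Iterable, Optional, Tuple

def extents(coords: Iterable[Tuple[float, float]]) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) vertices.

    Returns None for an empty input, loosely mirroring the Null result
    the Hive functions give for a value that is not a geometry.
    """
    pts = list(coords)
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical triangle in longitude/latitude order.
triangle = [(-74.0, 42.0), (-73.0, 42.5), (-73.5, 43.0)]
print(extents(triangle))
```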
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
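To make the idea of encoding a location into a cell ID concrete, here is a minimal sketch of the publicly documented geohash scheme in Python. It is an illustration only: the geohash_id function is a hypothetical name, and it is an assumption (not confirmed by this guide) that the GeoHashID UDF produces identical strings.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_id(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude into a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0          # accumulates 5 bits per output character
    bit_count = 0
    use_lon = True    # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

print(geohash_id(-73.750333, 42.736103, 3))  # prints "dre"
```

Each extra character halves the cell several more times, which is why longer IDs mean smaller cells, and why a shorter ID is always a prefix of the longer ID for the same point.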
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
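The reverse direction, from a cell ID back to its rectangular boundary, can be sketched the same way. The following Python is a hedged illustration of standard geohash decoding, not the SDK's implementation; geohash_bounds and bounds_to_wkt are hypothetical helper names, and the WKT layout is only one reasonable way to print the rectangle.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_bounds(cell_id: str) -> tuple:
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate: longitude first, then latitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return (lon_lo, lat_lo, lon_hi, lat_hi)

def bounds_to_wkt(bounds: tuple) -> str:
    """Format a bounding rectangle as a closed WKT polygon ring."""
    x0, y0, x1, y1 = bounds
    return f"POLYGON (({x0} {y0}, {x1} {y0}, {x1} {y1}, {x0} {y1}, {x0} {y0}))"

print(bounds_to_wkt(geohash_bounds("dre")))
```

With a precision of 3, the cell that contains the point (-73.750333, 42.736103) spans roughly 1.4 degrees in each direction, which shows how coarse short IDs are.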
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID("-73.750333", "42.736103", 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID("-73.750333", "42.736103", 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table being used to get the points we are searching from. The pipresult.capital and pipresult.state fields are from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
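For readers unfamiliar with point-in-polygon tests, the sketch below shows the classic ray-casting approach in Python. It is a conceptual illustration under stated assumptions, not how the UDTF is implemented: point_in_polygon, polygons_containing, and the sample records are all hypothetical, and the sketch handles only simple rings without holes.

```python
def point_in_polygon(x: float, y: float, ring: list) -> bool:
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside

def polygons_containing(x: float, y: float, polygons: list) -> list:
    """Return the attribute records of every polygon that contains the point."""
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]

# Hypothetical data: two square "states" with attribute records.
polygons = [
    ({"state": "A", "capital": "Alpha"}, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ({"state": "B", "capital": "Beta"}, [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
print(polygons_containing(5, 5, polygons))
```

The returned attribute records play the role that the pipresult columns play in the Hive query: one output row per containing polygon.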
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table being used to get the points we are searching from. The nearestresult.capital and nearestresult.state fields are from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
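The interaction between the maxCandidates and maxDistance options can be pictured with a small Python sketch. This is a simplified model, not the UDTF's algorithm: search_nearest and the sample features are hypothetical, and distances are plain Euclidean distances in an assumed projected coordinate system rather than the geodesic distances and units the product supports.

```python
import heapq
import math

def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    Features farther than max_distance are dropped; None means no limit,
    matching the documented defaults (1 candidate, unlimited distance).
    """
    scored = []
    for name, location in features:
        d = math.dist(point, location)  # Euclidean distance (Python 3.8+)
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    return heapq.nsmallest(max_candidates, scored)

# Hypothetical features at distances 0, 5, 10, and 50 from the origin.
features = [("A", (0, 0)), ("B", (3, 4)), ("C", (6, 8)), ("D", (30, 40))]
print(search_nearest((0, 0), features, max_candidates=3))
```

The distance filter is applied first and the candidate limit second, so a tight maxDistance can return fewer rows than maxCandidates allows.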
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
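Conceptually, a distance join pairs every record in the first dataset with each record in the second dataset that lies within the maximum distance. The Python sketch below shows that logic as a brute-force loop with a haversine distance. It is only a model of the result, under stated assumptions: join_by_distance, haversine_miles, the sample records, and the mean earth radius constant are hypothetical, and the geohash-based primary join that the geohashPrecision parameter controls is deliberately left out.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean earth radius; an assumption for this sketch

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_miles, distance_column=None):
    """Brute-force join: keep every (left, right) pair within max_miles."""
    rows = []
    for l in left:
        for r in right:
            d = haversine_miles(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_miles:
                row = {**l, **r}
                if distance_column:
                    row[distance_column] = d  # optional extra column, like DistanceColumnName
                rows.append(row)
    return rows

customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.752, "lat": 42.737},
    {"poi": "far", "lon": -73.9, "lat": 42.9},
]
print(join_by_distance(customers, pois, 0.5, distance_column="outputDistance"))
```

A nested loop compares every pair, which does not scale; bucketing both sides by a grid key first, as the geohashPrecision parameter suggests the product does, limits the comparisons to nearby candidates.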
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
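When PGD files must be built for several TAB files, the command line described above can be assembled programmatically. The following Python sketch only builds the argument list for the documented -f and -p options. The helper names, the assumption that the executable is reachable as PGDBuilder, and the sample file names are illustrative and are not part of the product.

```python
import os


def pgd_builder_command(tab_file, parallel=None, executable="PGDBuilder"):
    """Build the argument list for one PGDBuilder run.

    Only the documented options are used: -f <file> and the optional
    -p <parallel>. The executable name is an assumption; adjust it to
    match your installation.
    """
    if parallel is not None and parallel < 1:
        raise ValueError("parallel must be at least 1")
    command = [executable, "-f", tab_file]
    if parallel is not None:
        command += ["-p", str(parallel)]
    return command


def capped_parallelism(limit):
    """Return a -p value that never exceeds the CPU count of this machine."""
    cpus = os.cpu_count() or 1
    return max(1, min(limit, cpus))


if __name__ == "__main__":
    # Hypothetical TAB files; the seamless TAB gets a reduced -p value.
    print(pgd_builder_command("uk.tab"))
    print(pgd_builder_command("seamless.tab", parallel=capped_parallelism(4)))
```

The returned list can be passed to subprocess.run on a machine where the utility is installed.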
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
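After the steps above, it can be useful to confirm that the download directory really has the expected group and the 775 mode. The following Python sketch is one way to check this on a Linux node. The function names are hypothetical, and the directory and group passed in are whatever you configured; nothing here is provided by the product.

```python
import grp
import os
import stat


def group_name(gid):
    """Resolve a numeric group id to its name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def permission_problems(path, expected_group, expected_mode=0o775):
    """Return a list of human-readable problems; an empty list means OK."""
    info = os.stat(path)
    problems = []
    actual_group = group_name(info.st_gid)
    if actual_group != expected_group:
        problems.append(f"group is {actual_group}, expected {expected_group}")
    actual_mode = stat.S_IMODE(info.st_mode)  # permission bits only
    if actual_mode != expected_mode:
        problems.append(f"mode is {actual_mode:o}, expected {expected_mode:o}")
    return problems
```

For example, permission_problems("/pb/downloads", "pbdownloads") returns an empty list when the directory matches the settings applied in step 5.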
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.
Like | Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
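The wildcard behavior described for the Like operator can be illustrated outside the product. The following Python sketch translates a Like pattern into a regular expression, where an underscore matches exactly one character and a percent sign matches zero or more characters. It is only a model of the rule stated in the table: the function names are hypothetical, and details such as case sensitivity and escaping of wildcards are assumptions that the table does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")      # exactly one character
        elif ch == "%":
            parts.append(".*")     # zero, one, or many characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Troy", "T%"))    # True
    print(like("Troy", "T_oy"))  # True
    print(like("Troy", "T_"))    # False
```

The last call returns False because a single underscore cannot stand for the three remaining characters.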
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
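When queries are assembled by a program, the quote rules above can be applied with two small helpers. The following Python sketch is an illustration only: the function names are hypothetical, and because this guide does not say how a double quote inside an identifier should be escaped, the identifier helper rejects that case rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not described in this guide,
    so such identifiers are rejected instead of being escaped.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    country = quote_literal("Canada")
    print(f"SELECT * FROM {table} WHERE Country = {country}")
```

Building literals this way keeps a value such as O'Hara from ending the string early, because the embedded quote is emitted as two single quotes.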
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
System Requirements and Dependencies
Spectrum™ Location Intelligence for Big Data is a collection of jar files that can be deployed to your Hadoop system.
This product is verified on the following Hadoop distributions:
• Cloudera 5.12 and 6.x
• Hortonworks 3.x
• EMR 5.10 and 5.20
To use these jar files, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and with developing applications for distributed processing. For more information, refer to the Hortonworks, Cloudera, or EMR documentation.
To use the product, the following must be installed on your system:
for Hive:
• Hive version 1.2.1 or above
for Hive Client:
• Beeline, for example
for Spark and Zeppelin Notebook:
• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
• Installing the SDK
• Hive User-Defined Spatial Functions
• Spark
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser:
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-<version>.zip
3. Extract the Location Intelligence distribution:
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-<version>.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser:
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS:
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type | Description | Name
Constructor Functions | Construct an instance of WritableGeometry from supported geometry representation formats | FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions | Grid processing functions | GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions | Geometry measurement functions | Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions | Geometry observer functions | ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions | Serialize an instance of WritableGeometry to supported geometry representation formats | ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions | Geometry predicate functions | Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions | Geometry processing functions | Buffer, ConvexHull, Intersection, Transform, Union
Search Functions | Spatial search functions | LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
On Cloudera:
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, then move the Hive jar into the specified folder. If the value is not set, then set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
On Hortonworks:
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
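Because the registration statements follow one pattern, they can be generated rather than typed. The following Python sketch produces the statements from a table of function names grouped by sub-package. The dotted package name com.pb.bigdata.spatial.hive is reconstructed from the statements listed above and should be verified against your distribution; the helper name is hypothetical.

```python
# Assumed base package, reconstructed from the registration statements above.
PACKAGE = "com.pb.bigdata.spatial.hive"

# Function names grouped by sub-package, in the order used in this guide.
FUNCTIONS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one create-function statement per spatial UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{PACKAGE}.{group}.{name}';"
        for group, names in FUNCTIONS.items()
        for name in names
    ]


if __name__ == "__main__":
    print("\n".join(registration_statements()))
```

The output can be saved to a file and run from a Hive client at the start of each session when temporary functions are used.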
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter | Type | Description
jsonGeometry | String | The geometry in GeoJSON format.
Return Values
Return Type | Description
WritableGeometry | The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The geometry in WKT format.
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ]).
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
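To see what a WKB value such as the one in the example contains, it can be decoded by hand. The following Python sketch unpacks a two-dimensional WKB point using only the standard library, following the OGC layout of a byte-order flag, a geometry type code, and two doubles. It is an illustration of the format, not the product's parser; the function name is hypothetical, and the hex value is written out in parts as the standard encoding of POINT(10 10).

```python
import struct


def decode_wkb_point(wkb):
    """Decode a 2D WKB point and return its (x, y) coordinates."""
    byte_order = wkb[0]
    if byte_order == 1:
        prefix = "<"   # little-endian (NDR)
    elif byte_order == 0:
        prefix = ">"   # big-endian (XDR)
    else:
        raise ValueError("unknown byte order flag")
    (geometry_type,) = struct.unpack_from(prefix + "I", wkb, 1)
    if geometry_type != 1:
        raise ValueError("not a 2D point")
    x, y = struct.unpack_from(prefix + "dd", wkb, 5)
    return x, y


# Byte order flag, geometry type 1 (Point), then X and Y as doubles.
WKB_HEX = "01" + "01000000" + "0000000000002440" + "0000000000002440"

if __name__ == "__main__":
    print(decode_wkb_point(bytes.fromhex(WKB_HEX)))  # (10.0, 10.0)
```

The leading 01 marks little-endian byte order, which is why the double 10.0 appears as 0000000000002440 rather than 4024000000000000.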
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter | Type | Description
geometry | String | A KML string, where only the geometry, or the geometry in a placemark, will be parsed.
Return Values
Return Type | Description
WritableGeometry | The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y, and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
X | String or Number | The X ordinate.
Y | String or Number | The Y ordinate.
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry of the specified XY coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns text formatted as a GeoJSON representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
GeoJSON String | The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML in the OGC standard KML22 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as compbbigdataspatialhivepredicateIsNullGeometry
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
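The rule stated in the Area description (exterior ring minus interior rings) can be illustrated outside Hive. The following is a minimal Python sketch, assuming planar (CARTESIAN-style) coordinates and closed rings. The function names ring_area and polygon_area are hypothetical and are not part of the SDK, and the sketch performs no unit conversion or spherical computation.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring [(x, y), ...] via the shoelace formula.

    The ring is assumed to be closed (first vertex repeated as the last).
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


# A 10 x 10 square with a 2 x 2 hole: 100 - 4 = 96 square units.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_area(square, [hole]))  # 96.0
```

A point or an open curve has no enclosed ring, which is consistent with the statement that points and curves have zero area.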
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
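To make the CARTESIAN versus SPHERICAL distinction concrete, here is a small Python sketch for point-to-point distance only. It is an illustration under stated assumptions, not the SDK's algorithm: the spherical case uses the haversine formula with an assumed mean Earth radius, and the names cartesian_distance and spherical_distance_m are hypothetical.

```python
import math

# Assumed mean Earth radius in meters; the SDK may use a different model.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(p, q):
    """Straight-line distance between two (x, y) points in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1 = math.radians(p[0]), math.radians(p[1])
    lon2, lat2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance((0, 0), (3, 4)))     # 5.0
print(spherical_distance_m((0, 0), (90, 0)))  # a quarter of the equator, in meters
```

Treating long/lat degrees as planar coordinates gives a result in degrees rather than a ground distance, which may be why the parameter table lists SPHERICAL as the only valid type for geographic coordinate systems.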
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
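The effect of the resolution parameter can be sketched in a few lines of Python. This is a planar illustration only, assuming the buffer ring is built from vertices spaced evenly on a circle of radius offset. The SDK may place vertices differently, and buffer_point is a hypothetical helper name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments.

    Returns a closed ring: `resolution` distinct vertices plus the repeated first vertex.
    """
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # 8 segments, so the buffer of a point is an octagon
```

Doubling the resolution doubles the number of vertices in every buffered arc, which is one way to see why larger values cost more time and space.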
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(tablegeometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
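For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It illustrates the definition above and is not a description of how the SDK computes hulls. The function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull of (x, y) points in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# The interior point (1, 1) is not part of the hull.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```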
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another by returning the ordinates of a point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
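The MBR filtering use case described above can be sketched in plain Python. The example assumes a geometry given as a list of (x, y) vertices and an XY table given as a list of dictionaries. The names mbr and filter_by_mbr are hypothetical and only mirror what the ST_XMin, ST_YMin, ST_XMax, and ST_YMax values would be used for.

```python
def mbr(coords):
    """Minimum bounding rectangle of (x, y) vertices as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep the XY records whose point falls inside the bounds (edges inclusive)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in records if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(1, 1), (5, 2), (3, 6)]
table = [{"id": 1, "x": 2, "y": 3}, {"id": 2, "x": 9, "y": 9}, {"id": 3, "x": 5, "y": 6}]

bounds = mbr(triangle)
print(bounds)                                            # (1, 1, 5, 6)
print([r["id"] for r in filter_by_mbr(table, bounds)])   # [1, 3]
```

An MBR test is only a coarse filter: record 3 lies inside the rectangle but outside the triangle, so an exact predicate such as Within would still be needed afterwards.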
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
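To show how a geohash ID relates longitude, latitude, and precision, here is a minimal Python sketch of the publicly documented geohash encoding (interleaved longitude and latitude bits, base-32 alphabet). It is assumed, not confirmed, that GeoHashID follows this standard scheme, and geohash_id is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


print(geohash_id(10.40744, 57.64911, 5))  # u4pru
```

Because each additional character halves the cell several more times, a longer ID always starts with the shorter ID of the enclosing cell. That prefix property is what makes the IDs convenient to sort and group.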
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
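To make the containment test concrete, the following Python sketch shows a standard ray-casting point-in-polygon check. It is an illustration only and not the algorithm used by LocalPointInPolygon; the function name and the sample square are assumptions, and this simple version does not reliably handle points that lie exactly on an edge.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: return True if (x, y) is inside the ring.

    ring is a list of (x, y) vertices; the last vertex connects back
    to the first. Points exactly on an edge are not handled reliably.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sample data: a 4 x 4 square.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))
print(point_in_polygon(5.0, 2.0, square))
```

Each edge crossing toggles the inside flag, so an odd number of crossings means the point is inside.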
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
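The effect of the maxCandidates and maxDistance options can be sketched in plain Python. This is a brute-force illustration under assumed names (search_nearest, a list of (name, x, y) tuples, planar distances); it does not represent how the UDTF is implemented.

```python
import math


def search_nearest(px, py, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs nearest to (px, py).

    features is a list of (name, x, y) tuples. Distances are planar and in
    the same units as the coordinates. max_distance=None means no limit.
    """
    scored = []
    for name, x, y in features:
        d = math.hypot(x - px, y - py)
        if max_distance is None or d <= max_distance:
            scored.append((name, d))
    # Closest first, then keep only the requested number of candidates.
    scored.sort(key=lambda item: item[1])
    return scored[:max_candidates]


# Hypothetical sample features.
capitals = [("A", 1.0, 0.0), ("B", 0.0, 3.0), ("C", 6.0, 8.0)]
print(search_nearest(0.0, 0.0, capitals))
print(search_nearest(0.0, 0.0, capitals, max_candidates=3, max_distance=5.0))
```

With the defaults, only the single closest feature is returned; raising max_candidates returns more, and max_distance discards anything farther away before the candidates are counted.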
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
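As a conceptual aid, the result of a distance join can be sketched in plain Python: every pair of records whose great-circle distance is within the limit is kept. This brute-force version is an assumption-laden illustration (the names haversine_miles and join_by_distance, the sample records, and the Earth radius constant are all hypothetical); it compares every pair and omits the geohash-based primary join that the geohashPrecision parameter refers to.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair each left record with every right record within max_miles.

    Records are (id, longitude, latitude) tuples. The result rows are
    (left_id, right_id, distance_in_miles).
    """
    result = []
    for l_id, l_lon, l_lat in left:
        for r_id, r_lon, r_lat in right:
            d = haversine_miles(l_lon, l_lat, r_lon, r_lat)
            if d <= max_miles:
                result.append((l_id, r_id, d))
    return result


# Hypothetical sample data.
customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.752, 42.737), ("far", -73.90, 42.80)]
pairs = join_by_distance(customers, pois, 0.5)
print([(l, r) for l, r, _ in pairs])
```

The third element of each result row plays the role of the optional distance column in the join output.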
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to this common group and have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
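To see what the 775 mode grants, the following Python sketch decodes the permission bits using only the standard library. It inspects the octal value itself rather than a real directory, so it makes no assumption about your file system.

```python
import stat

mode = 0o775  # the mode applied to the download directory

# Render the bits for a directory the way `ls -l` would show them.
print(stat.filemode(stat.S_IFDIR | mode))

# Owner and group get read, write, and execute; others cannot write.
owner_has_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU
group_has_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG
others_can_write = bool(mode & stat.S_IWOTH)
print(owner_has_rwx, group_has_rwx, others_can_write)
```

Because the group has write access, any service user in the common group can update downloaded data, while users outside the group can only read and traverse the directory.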
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that contains wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
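The wildcard behavior of the Like operator can be illustrated by translating a pattern into a regular expression. The sketch below is an assumption-based illustration in Python (the function names are hypothetical, and matching is assumed to be case-sensitive); it is not the MI SQL implementation.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # zero, one, or many characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))
print(like("Parks", "Park_"))
print(like("Park", "Park_"))
```

The last call returns False because the underscore requires exactly one character after "Park", whereas a percent sign in the same position would also accept none.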
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse them correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
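When filter expressions are built programmatically, the quote rules can be applied with small helper functions. The Python sketch below is a minimal illustration with hypothetical helper names; how to escape a double quote inside an identifier is not covered here, so quote_identifier simply wraps the name.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (no escaping is attempted)."""
    return '"' + name + '"'


sql = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(sql)
```

Doubling the embedded quotes keeps the literal intact, so the value is read as a single string rather than ending at the first apostrophe.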
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
2 - Spatial
This section describes the Spark jobs and Hive user-defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to apply big data processing systems to spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
Installing the SDK 8
Hive User-Defined Spatial Functions 9
Spark 72
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser.
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-version.zip
3. Extract the Location Intelligence distribution.
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser.
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS.
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs from SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type Description Name
Constructor Functions on page 15. Construct an instance of WritableGeometry from supported geometry representation formats. FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions on page 53. Grid processing functions. GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions on page 29. Geometry measurement functions. Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions on page 46. Geometry observer functions. ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions on page 20. Serialize an instance of WritableGeometry to supported geometry representation formats. ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions on page 24. Geometry predicate functions. Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions on page 39. Geometry processing functions. Buffer, ConvexHull, Intersection, Transform, Union
Search Functions on page 66. Spatial search functions. LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software, as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
   Cloudera:
   Copy the Hive jar for Location Intelligence to the HiveServer node:
     /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
   In Cloudera Manager, navigate to the Hive Configuration page and search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
     /pb/li/software/hive/lib/
   Hortonworks:
   On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
     sudo mkdir /usr/hdp/current/hive-server2/auxlib/
   Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
     sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
     beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register the spatial user-defined functions. The statements below use the temporary keyword after create, which registers temporary functions (this step must then be repeated for every new Hive session). Omit the keyword to register permanent functions.
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
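Because the registration statements all follow one pattern, they can be generated rather than typed. The following Python sketch is illustrative only and is not part of the SDK. It assumes the function names from the listing above and that the classes live under the dotted package com.pb.bigdata.spatial.hive; the helper name registration_statements is hypothetical.

```python
# Sketch: generate the Hive registration statements for the spatial UDFs.
# Assumption: classes live under com.pb.bigdata.spatial.hive.<group>.<Name>.
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"

UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}


def registration_statements(temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{BASE_PACKAGE}.{group}.{name}';"
        for group, names in UDFS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into Beeline or saved to a file.
    print("\n".join(registration_statements()))
```

The output could be saved to a file and run from a Hive client at the start of each session when temporary functions are used.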
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter | Type | Description
jsonGeometry | String | The geometry in GeoJSON format
Return Values
Return Type | Description
WritableGeometry | The geometry from GeoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
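When the GeoJSON text is produced by a client program, building it with a JSON library avoids quoting mistakes. The Python sketch below is illustrative and not part of the SDK; the helper names geojson_point and from_geojson_query are hypothetical, and the generated statement assumes FromGeoJSON has been registered as shown above.

```python
import json


def geojson_point(x, y):
    """Serialize a GeoJSON Point with the given X and Y ordinates."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def from_geojson_query(geojson):
    """Wrap a GeoJSON string in a FromGeoJSON call (single-quoted Hive literal)."""
    if "'" in geojson:
        # Keep the sketch simple: refuse input that would need escaping.
        raise ValueError("GeoJSON containing single quotes is not handled here")
    return f"SELECT FromGeoJSON('{geojson}');"


if __name__ == "__main__":
    print(from_geojson_query(geojson_point(100.0, 0.0)))
```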
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry [, SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The geometry in WKT format
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ])
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
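To see what the hexadecimal string in the example encodes, the standard WKB layout for a point (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles) can be decoded by hand. The Python sketch below is illustrative only; decode_wkb_point is a hypothetical helper that handles plain 2D points and nothing else.

```python
import struct


def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded 2D WKB Point into an (x, y) tuple."""
    raw = bytes.fromhex(hex_wkb)
    # First byte: 1 = little-endian (NDR), 0 = big-endian (XDR).
    byte_order = "<" if raw[0] == 1 else ">"
    # Next four bytes: geometry type code (1 = Point).
    (geom_type,) = struct.unpack_from(byte_order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError("only WKB Point (type 1) is handled in this sketch")
    # Remaining sixteen bytes: X and Y as IEEE 754 doubles.
    x, y = struct.unpack_from(byte_order + "dd", raw, 5)
    return x, y


if __name__ == "__main__":
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

For the string used in the example, this decodes to the point with X = 10 and Y = 10.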
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter | Type | Description
geometry | String | A KML string, where only the geometry or the geometry in a placemark will be parsed
Return Values
Return Type | Description
WritableGeometry | The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter | Type | Description
X | String or Number | The X ordinate
Y | String or Number | The Y ordinate
CRS | String | Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type | Description
WritableGeometry | The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns text formatted as a GeoJSON representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry
Return Values
Return Type | Description
String | The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry
Return Values
Return Type | Description
String | The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing a Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry
Return Values
Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry
Return Values
Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
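The most common use of Within, as in the insurance example, is testing whether a point lies inside a polygon. The Python sketch below shows the general idea with the classic ray-casting test on planar coordinates. It is an illustration only: the guide does not describe the algorithm the UDF uses or how it treats points on the boundary, and point_in_ring is a hypothetical helper.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) strictly inside the polygon ring?

    ring is a list of (x, y) vertices; it may or may not repeat the first vertex.
    Points exactly on the boundary are not handled consistently by this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(point_in_ring(5, 5, square), point_in_ring(15, 5, square))
```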
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value
Return Values
Return Type | Description
Boolean | True if the geometry is null or empty; otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of the given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
areaUnits | String | The desired return unit type. For valid values, see Area Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units:
Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares
Return Values
Return Type | Description
Double | The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
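The rule stated in the description (exterior ring area minus interior ring areas) can be illustrated for the CARTESIAN case with the shoelace formula. The Python sketch below is not the SDK's implementation; ring_area and polygon_area are hypothetical helpers that work on planar coordinates and return the area in squared coordinate units.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) vertices."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1  # shoelace cross term for this edge
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of a polygon: exterior ring area minus the areas of its holes."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(exterior, [hole]))
```

With the 10 by 10 square and the 2 by 2 hole shown, the result is 100 minus 4, that is 96 square units.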
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
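The difference between the two computation types is easiest to see on a pair of points. The Python sketch below contrasts a planar (Cartesian) distance with a great-circle distance computed by the haversine formula. It is only an illustration: the guide does not state which spherical algorithm or Earth radius the UDF uses, so the haversine formula and the mean radius constant are assumptions, and both function names are hypothetical.

```python
import math

# Assumption for this sketch: mean Earth radius in meters.
EARTH_RADIUS_M = 6371008.8


def cartesian_distance(p1, p2):
    """Planar distance between two (x, y) points, in coordinate units."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


def spherical_distance_m(p1, p2):
    """Great-circle (haversine) distance in meters between (lon, lat) points in degrees."""
    lon1, lat1 = math.radians(p1[0]), math.radians(p1[1])
    lon2, lat2 = math.radians(p2[0]), math.radians(p2[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    a, b = (0.0, 0.0), (0.0, 1.0)
    print(cartesian_distance(a, b))    # distance in degrees, not meaningful on the ground
    print(spherical_distance_m(a, b))  # roughly 111 km for one degree of latitude
```

For long/lat data, the Cartesian result is in degrees and has no physical meaning, which is why SPHERICAL is the only valid type for geographic coordinate systems.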
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
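The statement that holes add to the perimeter (rather than subtract, as they do for area) can be checked with a small planar calculation. The Python sketch below assumes CARTESIAN logic and is not the SDK's implementation; ring_length and polygon_perimeter are hypothetical helpers. It requires Python 3.8 or later for math.dist.

```python
import math


def ring_length(ring):
    """Length of a ring given as (x, y) vertices, closing it if needed."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # close the ring so the last edge is counted
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter of a polygon: exterior ring length plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_perimeter(exterior, [hole]))
```

For the 10 by 10 square with a 2 by 2 hole, the perimeter is 40 plus 8, that is 48 units.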
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer
offset | Number | The distance from the input geometry
linearUnits | String | The linear unit of the offset distance. For valid values, see Linear Units.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8 the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
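The effect of the resolution parameter on a point buffer can be pictured with a planar sketch: the circle of radius offset is approximated by a regular polygon with resolution sides. The Python code below is an illustration under CARTESIAN assumptions, not the SDK's algorithm. In particular, whether the product places the vertices on the circle (as here) or around it is not stated in this guide, and buffer_point is a hypothetical helper.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments.

    Returns a closed ring: the first vertex is repeated at the end.
    """
    if resolution < 3:
        raise ValueError("at least 3 segments are needed to form a polygon")
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


if __name__ == "__main__":
    octagon = buffer_point(0.0, 0.0, 50.0, 8)
    print(len(octagon) - 1, "segments")
```

A resolution of 8 yields the octagon mentioned in the parameter description; a resolution of 12, as used in the examples, yields a dodecagon.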
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry
Return Values
Return Type | Description
WritableGeometry | The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
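For readers unfamiliar with the concept, the Python sketch below computes a convex hull of a set of planar points using Andrew's monotone chain algorithm. It illustrates the definition only; the guide does not say which algorithm the UDF uses, and convex_hull is a hypothetical helper.

```python
def convex_hull(points):
    """Convex hull of 2D points (monotone chain), counter-clockwise, no repeated vertex."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z component of (a - o) x (b - o); positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
```

Interior points such as (2, 2) and (1, 3) are dropped, leaving only the four corners of the square.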
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry
geometry2 | WritableGeometry | The second instance of a WritableGeometry
Return Values
Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
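For readers who want to see what a transformation between the two coordinate systems in the first example involves, the following Python sketch applies the standard spherical Web Mercator formulas to convert epsg:4326 longitude/latitude to epsg:3857 and back. The function names are hypothetical, and this is not a description of how the Transform UDF works internally; it only illustrates the arithmetic for this one pair of coordinate systems.

```python
import math

# Radius of the sphere used by EPSG:3857, in meters.
R = 6378137.0


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse of wgs84_to_web_mercator."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = wgs84_to_web_mercator(30.0, 20.0)
print(x, y)
print(web_mercator_to_wgs84(x, y))
```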
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
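The bounds-filtering use case can be pictured with a small Python sketch. It computes the minimum bounding rectangle of a list of vertices and keeps only the XY rows that fall inside it. The helper names mbr and filter_by_bounds are hypothetical and exist only for this illustration; in Hive the four values would come from the ST_XMin, ST_YMin, ST_XMax, and ST_YMax UDFs.

```python
def mbr(vertices):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the (x, y) rows that lie inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(40, 40), (20, 45), (45, 30)]
bounds = mbr(triangle)
rows = [(30, 35), (10, 10), (45, 45), (50, 31)]
print(bounds)
print(filter_by_bounds(rows, bounds))
```

Note that an MBR filter is only a coarse test: a row inside the rectangle is not necessarily inside the geometry itself.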
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
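The relationship between a point, its hash ID, and the cell boundary can be sketched in Python using the public geohash algorithm (alternating longitude and latitude bisection, five bits per base-32 character). This assumes that the "well-known" ID is the standard geohash encoding; the function names geohash_id and geohash_boundary are hypothetical stand-ins for illustration and are not the UDFs themselves.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    bits, use_lon = [], True  # the first bit refines longitude
    while len(bits) < precision * 5:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash to its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        n = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (n >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_boundary(cell))
```

Because every additional character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point, which is what makes the IDs convenient for sorting and prefix searches.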
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS:
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
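As a conceptual illustration of a point-in-polygon search, the following Python sketch tests a point against a small in-memory list of polygons using the classic ray-casting rule and returns the attributes of every polygon that contains it. The data, the field names, and the function names (point_in_ring, find_containing) are invented for this example, and the sketch does not read TAB or shapefiles or reflect how the UDTF is implemented.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(point, records):
    """Return the attributes of every record whose ring contains the point."""
    x, y = point
    return [attrs for ring, attrs in records if point_in_ring(x, y, ring)]


# Hypothetical data: two square "states" with a capital attribute.
records = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"state": "A", "capital": "Alpha"}),
    ([(10, 0), (20, 0), (20, 10), (10, 10)], {"state": "B", "capital": "Beta"}),
]
print(find_containing((3, 4), records))
print(find_containing((25, 5), records))
```

A point that matches no polygon yields an empty list, which corresponds to a row producing no output in a LATERAL VIEW without OUTER.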
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "Park"'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS:
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
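The effect of the maxCandidates and maxDistance options can be pictured with a brute-force Python sketch: compute the great-circle (haversine) distance from the search point to every candidate, drop those beyond the limit, sort, and keep the closest few. The candidate coordinates are approximate and the function names are hypothetical; the sketch makes no claim about the distance model or index structures the UDTF actually uses.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = [(haversine_m(lon, lat, c_lon, c_lat), name)
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        scored = [s for s in scored if s[0] <= max_distance_m]
    scored.sort()
    return scored[:max_candidates]


# Hypothetical candidates: (name, longitude, latitude), approximate values.
capitals = [
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]
search_point = (-73.750333, 42.736103)
print(search_nearest(search_point, capitals, max_candidates=3))
```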
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
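The idea of a coarse primary join followed by an exact distance check can be sketched in plain Python. This sketch buckets the second data set by latitude band instead of by geohash, so it is a simplified stand-in for the geohashPrecision step and should not be read as the SDK's algorithm. All names (join_by_distance, the r_ prefix, distance_km) are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_km):
    """Pair every left row with each right row within max_km of it."""
    # Band height in degrees of latitude, chosen slightly taller than max_km
    # so that any match must sit in the same band or an adjacent one.
    band = max_km / 111.0
    buckets = defaultdict(list)
    for row in right:
        buckets[math.floor(row["lat"] / band)].append(row)

    joined = []
    for l_row in left:
        key = math.floor(l_row["lat"] / band)
        for k in (key - 1, key, key + 1):
            for r_row in buckets.get(k, []):
                d = haversine_km(l_row["lon"], l_row["lat"], r_row["lon"], r_row["lat"])
                if d <= max_km:
                    out = dict(l_row)
                    out.update({"r_" + name: value for name, value in r_row.items()})
                    out["distance_km"] = d
                    joined.append(out)
    return joined


customers = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.752, "lat": 42.738},
    {"id": "far", "lon": -73.7562, "lat": 42.6526},
]
result = join_by_distance(customers, pois, max_km=0.8)  # roughly half a mile
print(result)
```

The extra distance_km field plays the role that the DistanceColumnName option plays in the options example.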
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
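Because the command is long, it can help to assemble it from named parts before running it. The following is a minimal sketch that only prints the command; the variable names and the build_submit_cmd function are illustrative, and the jar and driver paths are the placeholder values used in this topic.

```shell
#!/usr/bin/env bash
# Placeholder values taken from this topic; adjust them for your cluster.
LI_JAR="/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar"
DRIVER_JAR="/path/to/custom/driver.jar"
DRIVER_CLASS="com.example.spark.app.MyDriver"

# Print the spark-submit command line; driver parameters are passed as arguments.
build_submit_cmd() {
  # An array keeps every option and value as a separate word.
  local cmd=(spark-submit
    --class "$DRIVER_CLASS"
    --master yarn
    --deploy-mode cluster
    --jars "$LI_JAR"
    "$DRIVER_JAR" "$@")
  printf '%s\n' "${cmd[*]}"
}

build_submit_cmd driverParameter1 driverParameter2
```

Printing first makes it easy to check that each option is separated by a space before the job is actually submitted.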
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
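Since the builder handles one TAB file per invocation, indexing a whole directory means calling it once per file. The sketch below shows one way to do that on a Linux node. It assumes PGDBuilder is on the PATH and that TAB files use a lowercase .tab extension; the build_pgd_for_dir function and the DRY_RUN variable are illustrative and not part of the product. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Print (or run) one PGDBuilder call per TAB file in a directory.
# Assumptions: PGDBuilder is on the PATH; DRY_RUN=1 (the default) only prints.
DRY_RUN="${DRY_RUN:-1}"

build_pgd_for_dir() {
  local dir=$1 parallel=${2:-}
  local tab
  for tab in "$dir"/*.tab; do
    [ -e "$tab" ] || continue              # skip the pattern itself when nothing matches
    local cmd=(PGDBuilder -f "$tab")
    if [ -n "$parallel" ]; then
      cmd+=(-p "$parallel")                # optional cap on concurrent processes
    fi
    if [ "$DRY_RUN" = "1" ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Usage: pass a directory and, optionally, a value for -p.
build_pgd_for_dir "${1:-.}" "${2:-}"
```

Remember that a PGD file has to be regenerated whenever its TAB data changes, so a loop like this is also a convenient way to rebuild indexes after a data refresh.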
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
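The group and permission commands above can be collected into one script. This is a minimal sketch, not a supported installer: the run helper, the DRY_RUN variable, and the USERS list are illustrative, and /pb/downloads is assumed to be the configured download directory. With DRY_RUN left at its default the script only prints the commands. The service and session restarts in steps 3 and 4 still have to be done separately.

```shell
#!/usr/bin/env bash
GROUP="pbdownloads"
DOWNLOAD_DIR="/pb/downloads"          # assumed value of the download location property
USERS=(hive yarn zeppelin hue pbuser) # add your other job users here

# Print the command when DRY_RUN=1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_download_group() {
  run sudo groupadd "$GROUP"
  local user
  for user in "${USERS[@]}"; do
    # Skip accounts that do not exist on this node (service not installed).
    if id "$user" >/dev/null 2>&1; then
      run sudo usermod -a -G "$GROUP" "$user"
    fi
  done
  run sudo chgrp "$GROUP" "$DOWNLOAD_DIR"
  run sudo chmod 775 "$DOWNLOAD_DIR"
}

setup_download_group
```

Checking each account with id before calling usermod matches the guidance that services you do not have installed may be skipped.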
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
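The wildcard rules for Like can be tried outside the database. The sketch below translates a Like pattern into a shell glob (% becomes *, _ becomes ?) and tests a value against it. The like_match function is a stand-in written for this illustration, not the MI SQL implementation, and it compares case-sensitively, which may not match how the product treats case.

```shell
#!/usr/bin/env bash
# like_match VALUE PATTERN: succeed when VALUE matches the Like-style PATTERN.
like_match() {
  local value=$1 pattern=$2 glob="" ch i
  for (( i = 0; i < ${#pattern}; i++ )); do
    ch=${pattern:i:1}
    case $ch in
      %)   glob+='*' ;;      # zero, one, or multiple characters
      _)   glob+='?' ;;      # exactly one character
      '*') glob+='[*]' ;;    # keep shell glob characters literal
      '?') glob+='[?]' ;;
      '[') glob+='[[]' ;;
      '\') glob+='\\' ;;
      *)   glob+=$ch ;;
    esac
  done
  # The unquoted right-hand side is treated as a glob pattern by [[ ]].
  [[ $value == $glob ]]
}

like_match "Hudson River" "Hud%" && echo "Hud% matches Hudson River"
like_match "cat" "c_t" && echo "c_t matches cat"
like_match "cart" "c_t" || echo "c_t does not match cart"
```

Combining the two symbols works the same way: a pattern such as _a% requires exactly one character before the a and allows anything after it.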
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
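When a query is assembled in a script, the doubling rule is easy to forget. The helpers below are a small sketch of the quoting rules described here; sql_literal and sql_identifier are names invented for this example, and sql_identifier does not handle identifiers that themselves contain a double quote.

```shell
#!/usr/bin/env bash
# Wrap a value in single quotes, doubling any single quote inside it.
sql_literal() {
  local value=$1 q="'"
  local doubled=${value//"$q"/$q$q}
  printf "'%s'" "$doubled"
}

# Wrap an identifier in double quotes (embedded double quotes are not handled).
sql_identifier() {
  printf '"%s"' "$1"
}

printf 'SELECT * FROM %s WHERE Business = %s\n' \
  "$(sql_identifier "Streets")" "$(sql_literal "O'haras")"
```

Quoting Streets is harmless but not required, because identifiers only need quotes when the parser cannot read them unquoted.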
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Spatial
Installing the SDK
To use spatial functions for Spectrum™ Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called pbuser
• install everything into /pb
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to pbuser.
sudo mkdir /pb
sudo chown pbuser:pbuser /pb
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/pb/temp/spectrum-bigdata-locationintelligence-version.zip
3. Extract the Location Intelligence distribution.
mkdir /pb/li
mkdir /pb/li/software
unzip /pb/temp/spectrum-bigdata-locationintelligence-version.zip -d /pb/li/software
4. Create an install directory on HDFS and give ownership to pbuser.
sudo -u hdfs hadoop fs -mkdir -p hdfs:///pb/li
sudo -u hdfs hadoop fs -chown -R pbuser:pbuser hdfs:///pb
5. Upload the distribution into HDFS.
hadoop fs -copyFromLocal /pb/li/software hdfs:///pb/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in the spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type | Description | Name
Constructor Functions | Construct an instance of WritableGeometry from supported geometry representation formats | FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions | Grid processing functions | GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions | Geometry measurement functions | Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions | Geometry observer functions | ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions | Serialize an instance of WritableGeometry to supported geometry representation formats | ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions | Geometry predicate functions | Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions | Geometry processing functions | Buffer, ConvexHull, Intersection, Transform, Union
Search Functions | Spatial search functions | LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
Cloudera:
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib
Hortonworks:
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
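Because the registration statements all follow one pattern, they can be generated rather than typed. The sketch below prints the statements from a list of sub-packages and function names. It assumes the class names follow the form com.pb.bigdata.spatial.hive.<package>.<Function>, as read from the registration list in step 4; the emit_registrations function and the UDFS array are illustrative names.

```shell
#!/usr/bin/env bash
# Emit one "create function" statement per UDF from a "package:Name Name ..." list.
BASE="com.pb.bigdata.spatial.hive"   # assumed base package for the spatial UDFs
UDFS=(
  "construct:FromWKT FromWKB FromKML FromGeoJSON ST_Point"
  "persistence:ToWKT ToWKB ToKML ToGeoJSON"
  "predicate:Disjoint Overlaps Within Intersects IsNullGeometry"
  "measurement:Area ClosestPoints Distance Length Perimeter"
  "processing:ConvexHull Intersection Buffer Union Transform"
  "observer:ST_X ST_XMax ST_XMin ST_Y ST_YMax ST_YMin"
  "grid:GeoHashBoundary GeoHashID HexagonBoundary HexagonID SquareHashBoundary SquareHashID"
  "search:LocalSearchNearest LocalPointInPolygon"
)

emit_registrations() {
  local keyword=${1:-}               # pass "temporary" for session-scoped functions
  local entry pkg name
  for entry in "${UDFS[@]}"; do
    pkg=${entry%%:*}                 # text before the colon is the sub-package
    for name in ${entry#*:}; do      # unquoted on purpose: split the names on spaces
      printf "create %sfunction %s as '%s.%s.%s';\n" \
        "${keyword:+$keyword }" "$name" "$BASE" "$pkg" "$name"
    done
  done
}

emit_registrations temporary
```

Redirecting the output to a file, for example register.hql, lets you run it with the -f option of Beeline at the start of each session when temporary functions are used.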
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in GeoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from GeoJSON format
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ])
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
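The hexadecimal string in the example follows the standard WKB layout: one byte-order byte, a four-byte geometry type, then the ordinates as 8-byte doubles. The sketch below decodes only the header of such a string, which is enough to check what kind of geometry a value holds before passing it to FromWKB. The wkb_header function is a helper written for this illustration and covers only the seven basic 2D geometry types.

```shell
#!/usr/bin/env bash
# Decode the byte order and geometry type from the start of a WKB hex string.
wkb_header() {
  local hex=$1
  local order=${hex:0:2} type_hex=${hex:2:8}
  local endian
  case $order in
    01) endian="little-endian"
        # Reverse the four type bytes so they read as a big-endian number.
        type_hex=${type_hex:6:2}${type_hex:4:2}${type_hex:2:2}${type_hex:0:2} ;;
    00) endian="big-endian" ;;
    *)  echo "unknown byte order: $order" >&2; return 1 ;;
  esac
  local code=$(( 16#$type_hex ))
  local names=([1]=Point [2]=LineString [3]=Polygon [4]=MultiPoint
               [5]=MultiLineString [6]=MultiPolygon [7]=GeometryCollection)
  printf '%s %s\n' "$endian" "${names[$code]:-type $code}"
}

wkb_header 010100000000000000000024400000000000002440
```

For the example value, the two 8-byte groups after the header are both 0000000000002440, which is the little-endian IEEE 754 encoding of 10.0, so the geometry is the point (10 10).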
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry or geometry in placemark will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified XY coordinates. If any of the argument values are invalid, an empty geometry will be returned in the output.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function tests whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
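The relationship between the Disjoint, Intersects, and Within predicates, and the rule that a null input yields Null, can be illustrated outside Hive. The sketch below is not the product's implementation: it assumes a hypothetical stand-in geometry, an axis-aligned box given as (xmin, ymin, xmax, ymax), and uses Python's None in place of Null.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a geometry: an axis-aligned box (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def intersects(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the boxes share at least one point; None if either input is None."""
    if a is None or b is None:
        return None
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """Logical opposite of intersects, with the same None propagation."""
    hit = intersects(a, b)
    return None if hit is None else not hit


def within(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if box a lies entirely inside box b; None if either input is None."""
    if a is None or b is None:
        return None
    return b[0] <= a[0] and a[2] <= b[2] and b[1] <= a[1] and a[3] <= b[3]
```

Note that the argument order matters for within: the first argument is the candidate inner geometry, matching the Within(geometry1, geometry2) description above.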
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units:
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
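The rule stated in the Area description, exterior ring area minus interior ring areas, can be sketched for the planar (CARTESIAN) case with the shoelace formula. This is an illustration only, not the product's algorithm; the function names and the list-of-tuples ring representation are assumptions, and the result is in squared coordinate units with no unit conversion.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def ring_area(ring: Sequence[Point]) -> float:
    """Unsigned planar area of a ring using the shoelace formula.

    Works whether or not the ring repeats its first point at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Sequence[Point],
                 interiors: Iterable[Sequence[Point]] = ()) -> float:
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)
```

For a 10 by 10 square with a 2 by 2 hole, polygon_area returns 96.0.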
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
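To show why the computationType matters, the sketch below contrasts a planar distance with a great-circle distance between two points. It is only an illustration: the haversine formula and the mean Earth radius constant are assumptions of this sketch, and the product's SPHERICAL logic may use a different model.

```python
import math

# Mean Earth radius in kilometers; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def cartesian_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Straight-line distance in the units of the input coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_km(lon1: float, lat1: float,
                          lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

One degree of longitude along the equator comes out at roughly 111.2 km with the spherical function, whereas the Cartesian function applied to the same long/lat values returns 1.0, a number in degrees that is not a usable ground distance.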
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
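As a rough illustration of what a length measurement with a linearUnits argument involves, the sketch below sums the segment lengths of a polyline whose coordinates are assumed to be in meters, then converts the total to one of the unit codes from the Linear Units table. The function names are hypothetical; the conversion factors are the standard definitions of each unit, not values taken from the product.

```python
import math
from typing import Sequence, Tuple

# Meters per unit, keyed by the unit codes listed under Linear Units.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def polyline_length_m(points: Sequence[Tuple[float, float]]) -> float:
    """Planar length of a polyline whose coordinates are already in meters."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def convert_length(meters: float, unit: str) -> float:
    """Convert a length in meters to the requested unit code."""
    return meters / METERS_PER_UNIT[unit]
```

An unknown unit code raises KeyError here; how the Hive functions react to an invalid unit is not covered by this sketch.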
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The linear unit of the offset. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
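The effect of the resolution parameter can be pictured with a planar sketch that approximates the circle around a point using a fixed number of straight segments. This is an assumption-laden illustration, not the Buffer implementation: the function name is hypothetical, vertices are placed on the circle itself, and no units or spherical logic are applied.

```python
import math
from typing import List, Tuple


def point_buffer(x: float, y: float, offset: float,
                 resolution: int) -> List[Tuple[float, float]]:
    """Closed ring of `resolution` straight segments approximating a circle.

    The ring repeats its first vertex at the end, so it holds resolution + 1 points.
    """
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])
    return ring
```

With a resolution of 8 the ring has eight segments, the octagon mentioned in the parameter description; a resolution of 12, as in the examples above, gives a twelve-sided ring that is closer to a circle at the cost of more vertices.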
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
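For readers unfamiliar with convex hulls, the following sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It is a generic textbook technique chosen for illustration; the source does not state which algorithm the ConvexHull function uses.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Convex hull in counter-clockwise order, starting at the lowest-x point.

    Collinear points on the hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Feeding it the four corners of a square plus an interior point returns only the four corners, since the interior point is not needed to enclose the set.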
Intersection
Description
The Intersection function computes the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. It returns the geometry consisting of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
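To give a feel for what a transformation from epsg:4326 to epsg:3857 does to a single coordinate pair, the sketch below applies the standard spherical Web Mercator forward formulas. It is a standalone illustration with a hypothetical function name; it covers only this one pair of coordinate systems and says nothing about how the Transform function is implemented.

```python
import math
from typing import Tuple

# Sphere radius used by Web Mercator (the WGS 84 semi-major axis), in meters.
WGS84_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project a long/lat pair in degrees to Web Mercator x/y in meters.

    Valid for latitudes strictly between -90 and 90 degrees.
    """
    x = WGS84_RADIUS_M * math.radians(lon)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

The x value grows linearly with longitude, while y stretches increasingly toward the poles, which is why areas and distances measured directly in this projection are distorted at high latitudes.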
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
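The idea of filtering XY records by the bounds of a geometry can be sketched as follows. The code assumes a geometry is available as a plain list of (x, y) vertices and that records are (x, y) tuples; both representations and the function names are hypothetical and serve only to show what the four min/max values are used for.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(vertices: Sequence[Point]) -> Bounds:
    """Minimum bounding rectangle of a non-empty list of vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows: Sequence[Point], bounds: Bounds) -> List[Point]:
    """Keep only the rows whose x and y fall inside the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]
```

A bounds check like this is cheap because it compares plain numbers, so it is commonly used as a first pass before a more expensive exact test such as Within.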
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
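The encode, decode, and aggregate pattern in the examples above can be sketched with the publicly documented geohash scheme: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. This is a minimal sketch under the assumption that the product's IDs follow that public scheme, which the source does not confirm; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x: float, y: float, precision: int) -> str:
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        rng[1 - bit] = mid  # bit 1 raises the lower bound, bit 0 lowers the upper bound
        value = value * 2 + bit
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id: str) -> Tuple[float, float, float, float]:
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            bit = (idx >> shift) & 1
            rng[1 - bit] = mid
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def count_by_cell(points: Iterable[Tuple[float, float]], precision: int) -> Counter:
    """Number of points per geohash cell, similar to the GROUP BY example."""
    return Counter(geohash_id(x, y, precision) for x, y in points)
```

Because nearby points share a common ID prefix, sorting or grouping by the ID clusters records spatially, which is what makes the hash useful for indexing and aggregation.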
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable
SELECT SquareHashBoundary(03332)
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. It returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries (contained in the specified TAB or shapefile) that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
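The options argument of the search UDTFs is a Hive map(...) of alternating string keys and values, which is easy to mistype by hand. The following minimal Python sketch shows one way a client script might render such an expression from a dictionary. The helper names hive_quote and hive_map are hypothetical and are not part of the product; the option names are the ones listed in the Options table, and the values are illustrative.

```python
def hive_quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes for HiveQL."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def hive_map(options: dict) -> str:
    """Render a dict as a HiveQL map('k1', 'v1', 'k2', 'v2', ...) expression.

    Hypothetical helper: every key and value is converted to a string because
    the UDTF options are documented as <String, String> pairs.
    """
    parts = []
    for key, value in options.items():
        parts.append(hive_quote(str(key)))
        parts.append(hive_quote(str(value)))
    return "map(" + ", ".join(parts) + ")"


# Illustrative option values only.
options = {
    "maxCandidates": 3,
    "maxDistance": 25,
    "distanceUnit": "mi",
    "returnDistanceColumnName": "Miles",
}
print(hive_map(options))
```

The rendered string can then be pasted or interpolated as the third argument of LocalSearchNearest or LocalPointInPolygon.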
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the level number, the higher the level in the hierarchy and the larger the hexagon. These hexagons form a fixed network, with each hexagon having a specific, unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.
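To give a feel for what the geohashPrecision value controls, the following Python sketch implements the standard public geohash encoding and computes the cell size in degrees at a given precision. This is an illustration under an assumption: the guide does not say whether the product's internal geohash matches the public algorithm, and the function names geohash and cell_size_degrees are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat: float, lon: float, precision: int) -> str:
    """Encode a WGS 84 point with the standard public geohash algorithm."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True     # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(precision: int) -> tuple:
    """Return (width, height) of a geohash cell in degrees of longitude and latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


# Each extra character shrinks the cell, so a higher precision yields many more distinct cells.
for p in (5, 7, 9):
    print(p, geohash(42.736103, -73.750333, p), cell_size_degrees(p))
```

A longer hash always begins with the shorter hash of the same point, which is why a higher precision produces more, smaller cells and may require more memory during the join.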
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
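As a conceptual illustration of what a distance join produces, the following Python sketch pairs every record in one list with every record in another that lies within a maximum distance, and adds the distance as an extra field. It is not the product's implementation: it uses a brute-force loop and the haversine great-circle formula on a spherical earth, and the function names and sample records are hypothetical. The column names mirror the examples above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean earth radius; spherical approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Brute-force join: keep every (left, right) pair within max_miles."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_miles:
                # Merge both records and add the distance, like the DistanceColumnName option.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


# Hypothetical sample data.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.750333, "lat": 42.741103},
    {"name": "far", "lon": -73.750333, "lat": 42.756103},
]
for row in join_by_distance(customers, pois, 0.5):
    print(row["id"], row["name"], round(row["outputDistance"], 3))
```

A brute-force loop compares every pair, which does not scale to large inputs; the geohashPrecision parameter of joinByDistance exists because the real method performs a primary join on geohash first.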
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
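When PGD files need to be regenerated for many TAB files, it can help to build the command lines in a script. The following Python sketch assembles the argument list from the Usage line above and shows one way to cap the -p value at the number of CPUs. The helper names are hypothetical, and how the PGDBuilder launcher is invoked in your installation (script name, working directory) is an assumption to check against the utilities directory.

```python
import os


def pgd_builder_args(tab_file, parallel=None):
    """Build the argument list for: PGDBuilder -f <file> [-p <parallel>]."""
    args = ["PGDBuilder", "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


def capped_parallelism(limit):
    """Never ask for more concurrent processes than CPUs, and never fewer than 1."""
    return max(1, min(limit, os.cpu_count() or 1))


# Hypothetical file names; print the commands rather than running them.
for tab in ["uk.tab", "seamless.tab"]:
    print(" ".join(pgd_builder_args(tab, capped_parallelism(4))))
```

Printing the commands first gives a chance to review them before running them with a process launcher of your choice.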
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update the permissions to 775 for the download directory specified in the pb.download.location property.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
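To see what the 775 mode grants, the following short Python sketch decodes it with the standard library's stat module. The function name describe_mode is hypothetical; the mode value is the one used in the steps above.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string for a directory with the given mode."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_mode(mode))
print("group can write:", bool(mode & stat.S_IWGRP))
print("others can write:", bool(mode & stat.S_IWOTH))
```

With 775, the owner and the common group have read, write, and execute access, while all other users have read and execute access only, so membership in the common group is what lets a service update the downloaded data.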
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
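The wildcard rules for Like can be sketched in a few lines of Python by translating a pattern into a regular expression. This is only an illustration of the semantics described in the table: the function names are hypothetical, and case-sensitive matching is an assumption, because the guide does not state how MI SQL treats case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # all other characters match literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern (case-sensitive by assumption)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # the pattern must cover the whole value
print(like("Parking", "%Park"))
print(like("Park", "_ark"))
```

The pattern has to match the entire value, so a leading or trailing % is needed to find a word anywhere inside a longer attribute.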
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
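When filter expressions are built in a script, the doubling rule for single quotes is easy to apply automatically. The following Python sketch shows a hypothetical helper, quote_literal, that wraps a value in single quotes and doubles any embedded single quote, as described above. The table and column names are taken from the example.

```python
def quote_literal(value):
    """Quote a string literal: wrap in single quotes and double any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


business = "O'hara's"
query = "SELECT * FROM Streets WHERE Business = " + quote_literal(business)
print(query)
```

Quoting values through one helper keeps names that contain apostrophes from ending the literal early.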
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Spatial
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs from SQL-like syntax, so there is no need to write code. Spectrum™ Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user-defined functions for geometry operations and for working with grids in spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to the Hive UDFs described in this document.
Type | Description | Name
Constructor Functions | Construct an instance of WritableGeometry from supported geometry representation formats | FromGeoJSON, FromKML, FromWKB, FromWKT, ST_Point
Grid Functions | Grid processing functions | GeoHashBoundary, GeoHashID, HexagonBoundary, HexagonID, SquareHashBoundary, SquareHashID
Measurement Functions | Geometry measurement functions | Area, ClosestPoints, Distance, Length, Perimeter
Observer Functions | Geometry observer functions | ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions | Serialize an instance of WritableGeometry to supported geometry representation formats | ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions | Geometry predicate functions | Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions | Geometry processing functions | Buffer, ConvexHull, Intersection, Transform, Union
Search Functions | Spatial search functions | LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
Cloudera:
Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive jar into the specified folder. If the value is not set, set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
Hortonworks:
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register the spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would then need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
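The registration statements in step 4 follow a regular pattern, so they can be generated rather than typed. The following is a minimal sketch in Python, assuming the class names follow the pattern com.pb.bigdata.spatial.hive.<package>.<function>; the helper name registration_statements is hypothetical and is not part of the product.

```python
# Hypothetical helper (not part of the product): builds the Hive
# registration statements for the spatial UDFs listed in step 4.
UDF_PACKAGES = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}

# Assumed base package for all spatial Hive UDF classes.
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"


def registration_statements(temporary=True):
    """Return one CREATE FUNCTION statement per spatial UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{BASE_PACKAGE}.{package}.{name}';"
        for package, names in UDF_PACKAGES.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into Beeline.
    print("\n".join(registration_statements()))
```

Passing temporary=False produces permanent registrations, which do not need to be repeated for every new Hive session.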
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF performs a null check on an instance of WritableGeometry. It returns true if the geometry is NULL or empty; otherwise it returns false. The result type is boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter | Type | Description
jsonGeometry | String | The geometry in GeoJSON format.
Return Values
Return Type | Description
WritableGeometry | The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
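When the GeoJSON text is produced by a program, a JSON library takes care of quoting and number formatting. The sketch below uses Python's standard json module to build the string literal for a FromGeoJSON call; the variable names are illustrative only, and it assumes the GeoJSON text contains no single quotes.

```python
import json

# A GeoJSON Point expressed as a plain Python dictionary.
point = {"type": "Point", "coordinates": [100.0, 0.0]}

# Serialize to GeoJSON text.
geojson_text = json.dumps(point)

# Single quotes delimit the Hive string literal, so the double quotes
# inside the GeoJSON text need no escaping. This assumes the text itself
# contains no single quotes.
query = f"SELECT FromGeoJSON('{geojson_text}');"
print(query)
```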
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The geometry in WKT format.
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
geometry | String | The WKB of the geometry in byte array format (byte[ ]).
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type | Description
WritableGeometry | The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
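To see what the hexadecimal string in the example encodes, the WKB bytes can be unpacked by hand. The following sketch uses only Python's standard struct module and handles only a two-dimensional WKB Point; the function name decode_wkb_point is hypothetical and unrelated to the product.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a hex-encoded 2D WKB Point into an (x, y) tuple."""
    raw = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if raw[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geometry_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geometry_type != 1:
        raise ValueError("this sketch only handles WKB type 1 (Point)")
    # Bytes 5-20 hold the X and Y ordinates as two 8-byte doubles.
    x, y = struct.unpack(byte_order + "dd", raw[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
# prints (10.0, 10.0)
```

The example value is therefore the WKB form of POINT (10 10).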
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter | Type | Description
geometry | String | A KML string. Only the geometry, or the geometry in a placemark, is parsed.
Return Values
Return Type | Description
WritableGeometry | The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter | Type | Description
X | String or Number | The X ordinate.
Y | String or Number | The Y ordinate.
CRS | String | Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type | Description
WritableGeometry | The geometry of the specified X, Y coordinates. If any of the argument values are invalid, an empty geometry is returned.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
GeoJSON String | The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
Byte[ ] | The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns the geometry in the specified WritableGeometry instance as text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd).
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The instance of a WritableGeometry.
Return Values
Return Type | Description
String | The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Boolean | True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
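For the common case of a point tested against a polygon, as in the book-of-business example, the idea behind Within can be illustrated with the classic ray-casting test. This is a conceptual sketch in planar (Cartesian) coordinates only; it is not the product's implementation, the function name point_in_ring is hypothetical, and the handling of points exactly on the boundary is left undefined.

```python
def point_in_ring(x, y, ring):
    """Return True if (x, y) lies inside the ring, using ray casting.

    ring is a list of (x, y) vertices; it may or may not repeat the first
    vertex at the end. A horizontal ray is cast from the point towards
    positive X, and edge crossings are counted: an odd count means inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's Y value can be crossed.
        if (y1 > y) != (y2 > y):
            # X position where the edge crosses the horizontal line at y.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_ring(2, 2, square))  # True
print(point_in_ring(5, 2, square))  # False
```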
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter | Type | Description
inputGeometry | WritableGeometry | The input geometry to be checked for a null or empty value.
Return Values
Return Type | Description
Boolean | True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
areaUnits | String | The desired return unit type. For valid values, see Area Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units:
Value | Description
sq mi | square miles
sq km | square kilometers
sq in | square inches
sq ft | square feet
sq yd | square yards
sq mm | square millimeters
sq cm | square centimeters
sq m | square meters
sq survey ft | square US Survey feet
sq nmi | square nautical miles
acre | acres
ha | hectares
Return Values
Return Type | Description
Double | The area of the geometry.
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
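The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be sketched for the planar (CARTESIAN) case with the shoelace formula. This is an illustration only, not the product's implementation; the function names are hypothetical, and the result is in squared coordinate units with no unit conversion.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(exterior, [hole]))  # 96.0
```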
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
Array<WritableGeometry> | The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
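The difference between the two computation types can be illustrated for a pair of points. The sketch below contrasts a great-circle (haversine) distance with a straight-line planar distance. It is only an illustration: the sphere model the product uses is not documented here, the mean Earth radius of 6,371,008.8 m is an assumption, and both function names are hypothetical.

```python
import math

# Assumed mean Earth radius in meters; the product's sphere model may differ.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the projected coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude along the equator, in meters on the assumed sphere.
print(spherical_distance_m(0.0, 0.0, 1.0, 0.0))
# A 3-4-5 triangle in projected coordinates.
print(cartesian_distance(0.0, 0.0, 3.0, 4.0))  # 5.0
```

Spherical logic suits long/lat coordinates because one degree of longitude covers less ground away from the equator, which a planar calculation on raw degrees would ignore.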
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The length of the geometry.
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
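As an illustration of how the linearUnits values relate to one another, the following sketch sums the segment lengths of a polyline in planar coordinates and converts the result using standard conversion factors. It assumes the input coordinates are in meters; the function name polyline_length and the METERS_PER_UNIT table are hypothetical and are not part of the product.

```python
import math

# Standard conversion factors: meters per one unit of each linearUnits value.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def polyline_length(points, linear_units="m"):
    """Planar length of a polyline whose coordinates are in meters."""
    # Sum the straight-line length of each consecutive pair of vertices.
    meters = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return meters / METERS_PER_UNIT[linear_units]


line = [(0.0, 0.0), (3000.0, 4000.0), (3000.0, 5000.0)]
print(polyline_length(line, "km"))  # 6.0
```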
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered to be thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry.
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as compbbigdataspatialhiveprocessingBuffer
Syntax
Buffer(WritableGeometry geometry Number offset String linearUnit Number resolution [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 40
resolution Number Specifies how many straight-line segments are used in approximating a circle For example with a resolution of 8 the buffer of a point will be an octagon Buffers with larger resolution values take more time and space to compute
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 39
Spatial
Parameter Type Description
computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on
bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)
bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)
bull For engineering coordinate systems Valid type = CARTESIAN (default)
CARTESIAN The geometry coordinates are interpreted using cartesian logic
SPHERICAL The geometry coordinates are interpreted using spherical logic
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
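To make the resolution parameter concrete, the following Python sketch approximates a planar buffer around a single point with a regular polygon. It is an illustration under stated assumptions only: point_buffer is a hypothetical helper, the math is plain Cartesian, and the vertex placement chosen by the Buffer UDF itself (especially with SPHERICAL logic) may differ.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` segments.

    Hypothetical helper for illustration; planar math only, not the Buffer UDF.
    Returns a closed ring: the first vertex is repeated at the end.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


# A resolution of 8 yields an octagon: 8 straight segments, 9 ring vertices.
octagon = point_buffer(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)
```

Raising the resolution brings the polygon closer to a true circle at the cost of more vertices, which is the time and space trade-off described for the resolution parameter.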
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
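For readers who want to see what "smallest convex geometry" means in practice, here is a minimal Python sketch of a convex hull over a set of planar points, using Andrew's monotone chain algorithm. The function name convex_hull is hypothetical, and the algorithm is a textbook technique chosen for illustration; the guide does not state which algorithm the ConvexHull UDF uses.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order.

    Illustrative sketch (Andrew's monotone chain); not the product's implementation.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# The interior point (1, 1) is dropped; only the square's corners remain.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)]
```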
Intersection
Description
The Intersection function returns the geometry (for example a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
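As a rough illustration of what a transformation between the two coordinate systems used in these examples involves, the Python sketch below applies the standard spherical Web Mercator formulas to convert an epsg:4326 longitude/latitude pair into epsg:3857 meters. The function name lonlat_to_web_mercator is hypothetical, and the sketch covers only this one pair of systems; the Transform UDF handles arbitrary coordinate systems and its internals are not described in this guide.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 meters (illustrative sketch only)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


# The point used in the example above: longitude 30, latitude 20.
x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```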
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
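The idea of filtering XY records by the bounds of a geometry can be sketched in a few lines of Python. The helper names mbr and in_mbr and the sample coordinates are hypothetical; the sketch only shows the arithmetic behind the four minimum and maximum values, not the UDFs themselves.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True when the point (x, y) falls inside or on the bounding rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


# Vertices of a sample triangle and a few XY records to filter.
triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
box = mbr(triangle)
records = [(30, 35), (10, 10), (45, 45)]
print(box)                                        # (20, 30, 45, 45)
print([r for r in records if in_mbr(*r, box)])    # [(30, 35), (45, 45)]
```

A bounds check like this is cheap, so it is commonly used as a first-pass filter before a more exact geometric test.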
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
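The encode and decode relationship between a cell ID and a cell boundary can be illustrated with the publicly documented geohash scheme. The Python sketch below is an assumption-laden illustration: geohash_id and geohash_boundary are hypothetical names, the code follows the standard public geohash algorithm, and it is not taken from the product, whose hexagon and square hash IDs use different encodings.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a standard geohash string."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash string to its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2.0
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell)                    # dre
print(geohash_boundary(cell))  # (-74.53125, 42.1875, -73.125, 43.59375)
```

Because each extra character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point, which is what makes the IDs convenient for sorting, searching, and grouping.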
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a hexagon cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a square hash cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
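The containment test behind a point-in-polygon search can be sketched in Python with the classic ray-casting rule. This is an illustrative sketch only: point_in_ring, polygons_containing, and the sample features are hypothetical names and data, holes and points exactly on a boundary are not handled, and nothing here reflects how the UDTF reads TAB files or uses PGD indexes.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # the edge straddles the horizontal line through y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [f["attrs"] for f in features if point_in_ring(x, y, f["ring"])]


# Hypothetical features standing in for rows of a polygon data source.
features = [
    {"attrs": {"name": "A"}, "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"attrs": {"name": "B"}, "ring": [(3, 3), (6, 3), (6, 6), (3, 6)]},
]
print(polygons_containing((2, 2), features))      # [{'name': 'A'}]
print(polygons_containing((3.5, 3.5), features))  # [{'name': 'A'}, {'name': 'B'}]
```

Like the table function, the sketch returns one result per matching geometry, so a point that falls inside overlapping polygons produces more than one row.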
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
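How the maxCandidates and maxDistance options shape a result can be shown with a brute-force Python sketch. It rests on several assumptions: search_nearest and haversine_m are hypothetical helpers, candidates are simple named points rather than arbitrary geometries, distances use the haversine formula on a spherical earth in meters, and the real UDTF's search strategy and distance computation are not described here.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two long/lat points (spherical earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, clon, clat in candidates:
        d = haversine_m(lon, lat, clon, clat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]


# Hypothetical candidate points: (name, longitude, latitude).
candidates = [("A", 0.0, 0.0), ("B", 0.0, 1.0), ("C", 0.0, 2.0)]
print([name for _, name in search_nearest((0.0, 0.0), candidates, max_candidates=2)])
# ['A', 'B']
```

With the defaults, only the single nearest candidate is returned and no distance limit applies, mirroring the default values listed for maxCandidates and maxDistance.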
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
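To make the geohashPrecision parameter more concrete, the following Python sketch encodes a coordinate with the public geohash algorithm, where the precision is the number of base-32 characters in the hash and each extra character narrows the cell. This is an illustration only: the function name geohash_encode is hypothetical, and the sketch is not the SDK's internal implementation.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, 3)
fine = geohash_encode(57.64911, 10.40744, 7)
```

A longer hash always starts with the shorter hash for the same point, which is why a higher precision produces more, smaller cells and may require more memory during the join.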
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
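The semantics of a distance join can be sketched in plain Python as a brute-force comparison of every pair of points. This is only an illustration of what the result contains: it assumes a great-circle (haversine) distance and an approximate Earth radius, the helper names are hypothetical, and it does not reflect how the SDK performs the join (which uses a geohash-based primary join, as described in the parameters).

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair every left row with each right row that lies within max_miles of it."""
    result = []
    for l in left:
        for r in right:
            d = haversine_miles(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_miles:
                # Merge both rows and record the distance, like the DistanceColumnName option.
                result.append({**l, **r, "outputDistance": d})
    return result


customers = [{"id": 1, "longitude": -73.75, "latitude": 42.73}]
pois = [
    {"name": "near", "lon": -73.745, "lat": 42.73},
    {"name": "far", "lon": -73.65, "lat": 42.73},
]
joined = join_by_distance(customers, pois, 0.5)
```

The nested loop compares every pair, which is exactly the cost that a geohash-based primary join is meant to avoid on large datasets.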
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.sparkapp.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
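As a quick sanity check of the three-line Cloudera snippet used in the steps above, the following Python sketch parses it with the standard library configparser module and reads back the two settings. It only handles this flat snippet; a full hue.ini uses nested sections that configparser does not understand, and the function name parse_desktop_settings is hypothetical.

```python
import configparser

# The three lines entered in the Hue safety valve (Cloudera path in the steps above).
SNIPPET = """\
[desktop]
app_blacklist=
use_default_configuration=true
"""


def parse_desktop_settings(text):
    """Return the key/value pairs of the [desktop] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["desktop"])


settings = parse_desktop_settings(SNIPPET)
```

An empty app_blacklist value parses as an empty string, which matches the intent of clearing the blacklist so the Scala and Java editors are not hidden.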
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
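If the PGD Builder is driven from a script, the command line shown in the usage above can be assembled programmatically. The following Python sketch builds the argument list and picks a conservative -p value for a seamless TAB. The helper names, and the idea of leaving one CPU free, are assumptions for illustration and are not part of the product.

```python
import os


def pgd_builder_args(tab_file, parallel=None):
    """Build the PGDBuilder argument list: -f is required, -p is optional."""
    args = ["PGDBuilder", "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


def capped_parallelism(sub_tab_count, reserve=1):
    """Pick a -p value no larger than the sub-TAB count, leaving `reserve` CPUs free."""
    cpus = os.cpu_count() or 1
    return max(1, min(sub_tab_count, cpus - reserve))


single = pgd_builder_args("uk.tab")
seamless = pgd_builder_args("seamless.tab", parallel=4)
```

The resulting list could be passed to subprocess.run on a machine where the PGDBuilder executable is on the path.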
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to the common operating system group and have Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
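To see what the 775 mode grants, the following Python sketch expands an octal mode into the familiar permission string using the standard library stat module. The helper names are hypothetical; the point is that 775 gives the owner and the group read, write, and execute, while other users get read and execute only.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode for a directory, as `ls -l` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """Return True when the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


shared = describe_mode(0o775)
default = describe_mode(0o755)
```

The group write bit is what lets every service user in the common group update the downloaded data; with a mode such as 755, only the owner could do so.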
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
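The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression, where the underscore matches exactly one character and the percent sign matches zero or more. This is an illustration of the matching rule only: the function names are hypothetical, and the sketch assumes case-sensitive matching, which the text above does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


starts_with_can = like("Canada", "Can%")
one_char_gap = like("Canada", "C_nada")
combined = like("Canada", "_a%a")
```

Because the whole value must match, a pattern such as C_da does not match Canada: the single underscore cannot stand in for three characters.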
Arithmetic Operators
Operator Definition
+ Addition; also concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In cases where a single quote is within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
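When query text is assembled in code, the quote rules above can be applied with two small helpers. The following Python sketch wraps a string literal in single quotes, doubling any embedded single quote, and wraps an identifier in double quotes. The function names are hypothetical, and because the text above does not say how to escape a double quote inside an identifier, the sketch rejects such names rather than guessing.

```python
def quote_literal(value):
    """Wrap a string literal in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are not handled."""
    if '"' in name:
        raise ValueError("identifiers containing double quotes are not supported here")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
business = quote_literal("O'haras")
query = "SELECT * FROM " + table + " WHERE Business = " + business
```

Building literals this way keeps an apostrophe in the data from ending the string early in the generated statement.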
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Type Description Name
Observer Functions on page 46 Geometry observer functions ST_X, ST_Y, ST_XMax, ST_XMin, ST_YMax, ST_YMin
Persistence Functions on page 20 Serialize an instance of WritableGeometry to supported geometry representation formats ToGeoJSON, ToKML, ToWKB, ToWKT
Predicate Functions on page 24 Geometry predicate functions Disjoint, Intersects, Overlaps, Within, IsNullGeometry
Processing Functions on page 39 Geometry processing functions Buffer, ConvexHull, Intersection, Transform, Union
Search Functions on page 66 Spatial search functions LocalPointInPolygon, LocalSearchNearest
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
On this platform Do this
Cloudera Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, then move the Hive jar into the specified folder. If the value is not set, then set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
Hortonworks On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-<version>.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Use the temporary keyword after create, as shown here, if you want a temporary function (this step would need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in GeoJSON format.
Return Values
Return Type Description
WritableGeometry The geometry from GeoJSON format.
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ]).
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
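To show what the hexadecimal string in the example above encodes, the following Python sketch decodes a WKB point with the standard library struct module: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles for X and Y. The function name decode_wkb_point is hypothetical, and the sketch handles only plain 2D points, not the full WKB format.

```python
import struct


def decode_wkb_point(hex_text):
    """Decode a hex-encoded 2D WKB point and return its (x, y) coordinates."""
    data = bytes.fromhex(hex_text)
    # First byte: 1 means little-endian, 0 means big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geom_type != 1:
        raise ValueError("only WKB geometry type 1 (Point) is handled in this sketch")
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


point = decode_wkb_point("010100000000000000000024400000000000002440")
```

The sample string therefore corresponds to the WKT geometry POINT (10 10), stored in little-endian byte order.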
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry, or the geometry in a placemark, will be parsed.
Return Values
Return Type Description
WritableGeometry The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate.
Y String or Number The Y ordinate.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns text formatted as a GeoJSON representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
String The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns a text formatted in KML in the OGC standard KML22 namespace (httpschemasopengisnetkml220ogckml22xsd) as parsed from the specified WritableGeometry instance
Function Registration
create function ToKML as compbbigdataspatialhivepersistenceToKML
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
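To make the Within semantics concrete for the common point-in-boundary case, here is a minimal Python sketch using the ray-casting test on planar coordinates. It is an illustration only, not the UDF's implementation: the helper names `point_in_ring` and `within` are hypothetical, holes and points exactly on an edge are not handled, and `None` stands in for a Hive Null.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within(point, ring):
    """Mirror the documented null rule: a null input yields a null result."""
    if point is None or ring is None:
        return None
    return point_in_ring(point[0], point[1], ring)


# Hypothetical closed ring (first vertex repeated at the end).
boundary = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print(within((5, 5), boundary))
print(within((15, 5), boundary))
print(within(None, boundary))
```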
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
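The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in a few lines of Python. This is a planar (CARTESIAN-style) illustration using the shoelace formula, not the UDF's actual code; the function names and sample coordinates are hypothetical, and no unit conversion or spherical computation is attempted.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring (first vertex repeated last), via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# Hypothetical 10 x 10 square with a 2 x 2 hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_area(outer, [hole]))  # 96.0
```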
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
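As a rough illustration of what a SPHERICAL distance between two long/lat points involves, the Python sketch below uses the haversine formula. This is an assumption for illustration: the guide does not state which spherical algorithm the Distance UDF uses, and the Earth radius constant, the function names, and the small unit table are hypothetical choices (the unit keys follow the Linear Units values).

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

# Meters per unit, for a few of the documented linear unit values.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance(lon1, lat1, lon2, lat2, linear_units="m"):
    """Distance converted to the requested unit; never negative."""
    return haversine_m(lon1, lat1, lon2, lat2) / METERS_PER_UNIT[linear_units]


print(distance(0.0, 0.0, 90.0, 0.0, "km"))  # a quarter of the way around the equator
```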
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
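The statement that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched as follows. This is a planar Python illustration with hypothetical names and sample data, not the UDF's implementation; it assumes closed rings and performs no unit conversion.

```python
import math


def ring_length(ring):
    """Planar length of a closed ring: the sum of its segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Holes add to the perimeter, unlike area where they subtract."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


# Hypothetical 10 x 10 square with a 2 x 2 hole.
outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_perimeter(outer, [hole]))  # 48.0
```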
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
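The effect of the resolution parameter can be seen in a small planar sketch: buffering a point with a resolution of 8 yields an octagon. The Python below is an illustration only. It assumes the polygon's vertices lie on the circle of radius `offset` (the guide does not say whether the product inscribes or circumscribes the polygon), and `buffer_point` is a hypothetical name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` straight segments."""
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # 8 distinct vertices, so 8 segments
```

A higher resolution gives a closer approximation of the circle at the cost of more vertices, which matches the note that larger resolution values take more time and space.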
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
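For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. This is a well-known textbook technique chosen for illustration; the guide does not say which algorithm the ConvexHull UDF uses, and the function name and sample points are hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Interior points (2, 2) and (1, 3) are dropped from the hull.
print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
```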
Intersection
Description
The Intersection function returns the geometry (a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common, that is, the geometry consisting of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
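To give a feel for what a transformation from epsg:4326 to epsg:3857 does to a single coordinate pair, here is a Python sketch of the standard spherical (Web) Mercator forward formulas. It is offered as an illustration under that assumption, not as the Transform UDF's implementation, which supports many coordinate systems; the function name is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # sphere radius used by the Web Mercator projection


def wgs84_to_web_mercator(lon, lat):
    """Project long/lat in degrees to Web Mercator x/y in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(wgs84_to_web_mercator(30.0, 20.0))
```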
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
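The idea of filtering XY records by the bounds of a geometry can be sketched in plain Python: compute the minimum and maximum X and Y of the geometry's coordinates, then keep only the records that fall inside that rectangle. The helper names and sample data are hypothetical, and the sketch works on raw coordinate lists rather than WritableGeometry objects.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep the (x, y) records that fall inside the bounding rectangle, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]


region = [(1, 2), (4, 6), (3, 1)]
bounds = mbr(region)
print(bounds)  # (1, 1, 4, 6)
print(filter_by_mbr([(0, 0), (2, 3), (4, 6), (5, 5)], bounds))  # [(2, 3), (4, 6)]
```

An MBR filter is only a coarse test: a record inside the rectangle is not necessarily inside the geometry itself, so a predicate such as Within would still be needed for an exact answer.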
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
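To show how a geohash string grows more precise with length, the Python sketch below implements the publicly documented geohash encoding: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. It is an assumption that the GeoHashID UDF follows this public scheme exactly; the function name and sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a long/lat point as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(10.40744, 57.64911, 3))  # u4p

# Aggregating points per cell, similar in spirit to the GROUP BY example above.
points = [(10.40744, 57.64911), (10.41, 57.65), (-73.750333, 42.736103)]
print(Counter(geohash_encode(x, y, 3) for x, y in points))
```

Because each extra character only refines the previous cell, a shorter geohash is always a prefix of a longer one for the same point, which is what makes the IDs sortable and searchable.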
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that contains the point.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the product always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
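As an illustration of the EMR edit in step 2 above, the following is a minimal sketch that makes the same change with sed rather than a text editor. It assumes the line appears in hue.ini either uncommented or prefixed by one or more # characters, which may not hold for every Hue version. The function name enable_default_config and the sample file are hypothetical. The sketch writes the result to standard output and leaves the input file unchanged, so the output can be reviewed before the real file is replaced.

```shell
#!/bin/sh
# enable_default_config FILE
# Print FILE with use_default_configuration uncommented and set to true.
# Assumption: the line may be indented and prefixed by one or more '#'.
enable_default_config() {
    sed -e 's/^[[:space:]]*#*[[:space:]]*use_default_configuration=false/use_default_configuration=true/' "$1"
}

# Demonstration on a throwaway sample file (not the real hue.ini).
sample=$(mktemp)
printf '[desktop]\n  ## use_default_configuration=false\n' > "$sample"
enable_default_config "$sample"
rm -f "$sample"
```

Lines that do not match the pattern pass through unchanged, so the rest of the file is preserved in the output.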
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
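When several TAB files need an index, the usage shown above can be repeated once per file. The following is a minimal sketch of that idea, not part of the product: it only prints the commands, so nothing is built until the output has been reviewed. The function name print_pgd_commands is hypothetical, and the sketch assumes file names end in lowercase .tab, contain no spaces, and that PGDBuilder is invoked exactly as in the Usage line.

```shell
#!/bin/sh
# print_pgd_commands DIRECTORY PARALLEL
# Print, without running, one PGDBuilder command per .tab file in DIRECTORY.
# Assumptions: file names end in lowercase .tab and contain no spaces.
print_pgd_commands() {
    for tab in "$1"/*.tab; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$tab" ] || continue
        printf 'PGDBuilder -f %s -p %s\n' "$tab" "$2"
    done
}

# Demonstration with two empty placeholder files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/uk.tab" "$demo/seamless.tab"
print_pgd_commands "$demo" 4
rm -r "$demo"
```

Printing rather than executing keeps the sketch safe to run anywhere; the -p value is passed through unchanged so a lower number can be chosen for seamless TABs.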
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
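To see what mode 775 grants, the following minimal sketch applies it to a scratch directory and reads the result back. The scratch directory stands in for the real download location so that no sudo is required, and the helper name mode_string is hypothetical. The same helper could be pointed at the actual download directory to confirm the result of the last step.

```shell
#!/bin/sh
# mode_string DIRECTORY
# Print the ten-character type and permission string that ls reports.
mode_string() {
    ls -ld "$1" | cut -c1-10
}

# Demonstration on a scratch directory, so no sudo is needed.
scratch=$(mktemp -d)
chmod 775 "$scratch"
mode_string "$scratch"   # prints drwxrwxr-x
rmdir "$scratch"
```

In the printed string, the owner and the group each have rwx (read, write, and execute), while all other users have r-x (read and execute only).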
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
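The behavior of the two Like wildcards can be tried out with ordinary shell pattern matching. The following is a minimal sketch, not the MI SQL implementation: it translates % to * and _ to ? and lets the shell do the matching. The function name like_match is hypothetical, matching here is case-sensitive (which may differ from MI SQL), and glob characters already present in the pattern are not escaped.

```shell
#!/bin/sh
# like_match VALUE PATTERN
# Succeed when VALUE matches PATTERN, where % and _ are the Like wildcards.
# Simplifications: matching is case-sensitive, and glob characters such as
# * ? [ in PATTERN are not escaped.
like_match() {
    glob=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
    case "$1" in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}

like_match "Central Park" "%Park" && echo "match"
like_match "Park" "Park_" || echo "no match"
```

The first call succeeds because % absorbs "Central "; the second fails because _ requires exactly one more character after "Park".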
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain characters that normally would not be allowed are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
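When filter values come from a script rather than being typed by hand, the doubling rule can be applied mechanically. The following is a minimal sketch under that assumption; the function name misql_literal is hypothetical and is not part of the product. It wraps a value in single quotes and doubles every single quote inside it.

```shell
#!/bin/sh
# misql_literal VALUE
# Print VALUE as a single-quoted string literal, doubling embedded quotes.
misql_literal() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

misql_literal "O'hara's"   # prints 'O''hara''s'
misql_literal "Canada"     # prints 'Canada'
```

Values without a single quote pass through unchanged apart from the surrounding quotation marks.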
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Setup
This topic assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
On this platform: Cloudera
Do this: Copy the Hive jar for Location Intelligence to the HiveServer node:
/pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set, then move the Hive jar into the specified folder. If the value is not set, then set it to the parent folder of the Hive jar:
/pb/li/software/hive/lib/
On this platform: Hortonworks
Do this: On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /pb/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step:
beeline -u jdbc:hive2://localhost:10000/default -n pbuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session). The statements below use the temporary form.
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
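Because the registration statements in step 4 follow one pattern, they can be generated rather than typed. The following Python sketch is an illustration only: it assumes the class names use the dotted package form com.pb.bigdata.spatial.hive.<group>.<Function>, and the names UDFS, BASE_PACKAGE, and registration_statements are hypothetical helpers, not part of the SDK.

```python
# Package suffix -> function names, taken from the registration listing in step 4.
UDFS = {
    "construct": ["FromWKT", "FromWKB", "FromKML", "FromGeoJSON", "ST_Point"],
    "persistence": ["ToWKT", "ToWKB", "ToKML", "ToGeoJSON"],
    "predicate": ["Disjoint", "Overlaps", "Within", "Intersects", "IsNullGeometry"],
    "measurement": ["Area", "ClosestPoints", "Distance", "Length", "Perimeter"],
    "processing": ["ConvexHull", "Intersection", "Buffer", "Union", "Transform"],
    "observer": ["ST_X", "ST_XMax", "ST_XMin", "ST_Y", "ST_YMax", "ST_YMin"],
    "grid": ["GeoHashBoundary", "GeoHashID", "HexagonBoundary", "HexagonID",
             "SquareHashBoundary", "SquareHashID"],
    "search": ["LocalSearchNearest", "LocalPointInPolygon"],
}

# Assumed base package (dots restored from the class names in the listing).
BASE_PACKAGE = "com.pb.bigdata.spatial.hive"


def registration_statements(temporary=True):
    """Return one 'create [temporary] function' statement per UDF."""
    keyword = "create temporary function" if temporary else "create function"
    return [
        f"{keyword} {name} as '{BASE_PACKAGE}.{package}.{name}';"
        for package, names in UDFS.items()
        for name in names
    ]


if __name__ == "__main__":
    # Print a script that could be pasted into a Hive client such as Beeline.
    print("\n".join(registration_statements(temporary=True)))
```

Passing temporary=False yields the permanent form, which does not need to be repeated in every Hive session.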
Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in GeoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from GeoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
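When the GeoJSON text is produced by a client program rather than read from a table, it may help to build it with a JSON library so the braces and quotes stay well formed. The Python sketch below is an illustration under that assumption; geojson_point and from_geojson_query are hypothetical helper names, and the generated statement still has to be run in a Hive session where FromGeoJSON is registered.

```python
import json


def geojson_point(x: float, y: float) -> str:
    """Serialize a GeoJSON Point geometry with the given coordinates."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def from_geojson_query(geojson: str) -> str:
    """Embed GeoJSON text in a FromGeoJSON call as a single-quoted literal."""
    escaped = geojson.replace("'", "''")  # double any embedded single quote
    return f"SELECT FromGeoJSON('{escaped}');"


print(from_geojson_query(geojson_point(100.0, 0.0)))
```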
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ])
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
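To see what the hexadecimal string in the example encodes, the following Python sketch decodes a WKB Point by hand using the standard WKB layout (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles). It is an illustration of the format only, not the product's parser; decode_wkb_point is a hypothetical helper that handles plain 2D points and nothing else.

```python
import struct


def decode_wkb_point(hex_wkb: str):
    """Decode a 2D WKB Point given as a hex string and return (x, y)."""
    data = bytes.fromhex(hex_wkb)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    byte_order = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geom_type,) = struct.unpack(byte_order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only handles WKB Point geometries")
    # Bytes 5-20 hold the X and Y ordinates as IEEE 754 doubles.
    x, y = struct.unpack(byte_order + "dd", data[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The example string decodes to the point with X = 10 and Y = 10.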
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry, or the geometry in a placemark, will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns text formatted as a GeoJSON representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
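The argument order of the predicates is easy to mix up, especially for Within, where the first geometry is the one being contained. The Python sketch below illustrates the relationships on axis-aligned rectangles only, given as (xmin, ymin, xmax, ymax) tuples. This is a deliberate simplification for illustration; the UDFs operate on arbitrary geometries, and the function names here are hypothetical.

```python
def disjoint(a, b):
    """True if rectangles a and b share no point at all."""
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return not disjoint(a, b)


def within(a, b):
    """True if rectangle a lies entirely inside rectangle b."""
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]


big = (0, 0, 2, 2)
shifted = (1, 1, 3, 3)
far = (5, 5, 6, 6)
small = (0.5, 0.5, 1, 1)

print(disjoint(big, far))        # the two rectangles share no point
print(intersects(big, shifted))  # the two rectangles share a region
print(within(small, big), within(big, small))  # argument order matters
```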
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
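The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated with the shoelace formula. The Python sketch below covers the CARTESIAN case only and assumes planar coordinates; it is not the product's implementation, the spherical computation is more involved, and ring_area and polygon_area are hypothetical names.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) tuples."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(polygon_area(square, [hole]))  # 4 x 4 square with a 1 x 1 hole removed
```

The result is in the square of whatever unit the coordinates use; the areaUnits parameter of the Area function takes care of that conversion in the real UDF.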
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type.
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
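For readers who need to compare results requested in different linear units, the relationship between the unit codes in the table can be sketched as a lookup of meters per unit. The conversion factors below are the standard international definitions (with the US survey foot as 1200/3937 m); the names METERS_PER_UNIT and convert are hypothetical and are not part of the product.

```python
# Meters per unit, keyed by the unit codes listed in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def convert(value, from_unit, to_unit):
    """Convert a length from one unit code to another via meters."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert(1, "mi", "ft"))
print(convert(1, "nmi", "km"))
```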
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
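The difference between the SPHERICAL and CARTESIAN computation types can be illustrated for two points. The Python sketch below uses the haversine formula with a mean Earth radius for the spherical case and the Pythagorean distance for the cartesian case. Both are stand-ins chosen for illustration: the guide does not state which formulas or Earth model the product uses, and the function names are hypothetical.

```python
import math

# Mean Earth radius in meters; an assumption, the product may use another model.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in coordinate units between two planar points."""
    return math.hypot(x2 - x1, y2 - y1)


print(spherical_distance_m(0, 0, 0, 1))  # one degree of latitude, in meters
print(cartesian_distance(0, 0, 3, 4))    # a 3-4-5 triangle in planar units
```

Applying the cartesian formula to long/lat degrees would give a result in degrees, which is one reason the valid computation types depend on the coordinate system.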
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
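Unlike area, where holes are subtracted, the perimeter adds the lengths of all rings, holes included. The Python sketch below shows this for the CARTESIAN case with planar coordinates. It is an illustration under that assumption, not the product's implementation, and ring_length and polygon_perimeter are hypothetical names.

```python
import math


def ring_length(ring):
    """Planar length of a ring given as a list of (x, y) tuples."""
    # Close the ring if the last vertex does not repeat the first.
    closed = ring if ring[0] == ring[-1] else ring + [ring[0]]
    return sum(math.dist(p, q) for p, q in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and every interior ring."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(polygon_perimeter(square, [hole]))  # outer boundary plus the hole boundary
```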
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
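The effect of the resolution parameter is easiest to see for the buffer of a single point. The Python sketch below places the given number of vertices on a circle around the point in planar coordinates, so a resolution of 8 yields an octagon. This is an illustration of the idea only: how the product places its vertices, and how it handles spherical buffers, is not described in this guide, and buffer_point is a hypothetical name.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius offset around (x, y) with a closed ring."""
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(5, 6, 50, 8)
print(len(octagon) - 1)  # number of straight-line segments in the ring
```

Doubling the resolution doubles the number of vertices, which is why larger values cost more time and space.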
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT("MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))", 'epsg:4267'));
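To illustrate what a convex hull is, here is a minimal Python sketch of Andrew's monotone chain algorithm for a set of 2D points. It is a conceptual illustration only; how the ConvexHull UDF is implemented internally is not stated in this guide, and the function name `convex_hull` is an assumption.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counterclockwise order.

    Uses Andrew's monotone chain algorithm (illustrative only; not the
    SDK's implementation).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Cross product of vectors OA and OB; positive means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # Drop points that would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Interior points, such as the centre of a square, are excluded from the result because they do not lie on the hull.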
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
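As a rough illustration of what a transformation from epsg:4326 to epsg:3857 involves, the following Python sketch applies the standard spherical (Web) Mercator formulas to a longitude/latitude pair. It is not the SDK's transformation engine; the function name `wgs84_to_web_mercator` is an assumption made for this sketch.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS 84 semi-major axis).
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to EPSG:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y
```

The formulas are only defined for latitudes strictly between -90 and 90 degrees; Web Mercator maps conventionally clip at roughly 85.05 degrees north and south.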
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another, because they return the ordinates of a point geometry as plain numeric values.
Another common need is to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs return the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
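The idea of filtering XY records by the bounds of a geometry can be sketched in plain Python. The helpers below (`mbr` and `filter_by_bounds`, both hypothetical names) compute the minimum bounding rectangle of a list of vertices and keep only the rows whose x and y values fall inside it; they illustrate the concept and do not call the UDFs.

```python
def mbr(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep rows (dicts with 'x' and 'y' keys) that fall inside the bounds."""
    x_min, y_min, x_max, y_max = bounds
    return [
        row for row in rows
        if x_min <= row["x"] <= x_max and y_min <= row["y"] <= y_max
    ]
```

In Hive, the same comparison would use the values returned by ST_XMin, ST_XMax, ST_YMin, and ST_YMax in a WHERE clause against the table's x and y columns.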
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and for interoperability with other systems.
Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
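To make the idea of hashing a location into a grid cell ID concrete, here is a minimal Python sketch of the publicly documented geohash encoding (interleaved longitude/latitude bisection, base-32 output). It is assumed, but not confirmed by this guide, that the GeoHashID UDF follows the same well-known scheme; the function name `geohash_id` is hypothetical.

```python
# Base-32 alphabet used by the standard geohash encoding.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = bits * 2 + 1
                lon_lo = mid
            else:
                bits = bits * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because each additional character subdivides the previous cell, a shorter hash is always a prefix of a longer hash for the same point. This is what makes the IDs sortable and useful for grouping.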
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
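The core test behind a point-in-polygon search can be sketched with the classic ray-casting algorithm. The Python below is an illustration of the concept only: the function names, the `(attributes, ring)` data layout, and the brute-force loop are assumptions of this sketch, and it ignores holes, boundary cases, and the spatial indexing a real search would use.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is inside the polygon ring (list of (x, y)).

    Ray casting: count how many edges a horizontal ray from the point
    crosses; an odd count means the point is inside.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(x, y, polygons):
    """Return the attributes of every polygon that contains the point.

    `polygons` is a list of (attributes, ring) pairs (hypothetical layout).
    """
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]
```

As with the UDTF, the result is a list because a point may fall inside more than one geometry in the data source.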
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like "Park"'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads',
'queryFilter', 'Name Like "Park"')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points to search with. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude and level, the API always generates a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
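To give a feel for what the geohashPrecision parameter controls, the following Python sketch implements the standard public geohash encoding and reports the cell size at a given precision. It is only an illustration: the function names are hypothetical, and the guide does not describe how the SDK builds or uses its geohash index internally.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        index = 0
        for bit in bits[i:i + 5]:
            index = index * 2 + bit
        chars.append(BASE32[index])
    return "".join(chars)

def cell_size_degrees(precision):
    """Return (longitude span, latitude span) in degrees of one geohash cell."""
    total_bits = precision * 5
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits

if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 3))
    print(cell_size_degrees(7))
```

Each extra character divides a cell into 32 smaller cells, which is consistent with the note that higher precision values may require more memory.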
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
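The semantics of a distance join can be sketched outside Spark in a few lines of Python. This is a brute-force illustration only, not the SDK's algorithm: it assumes a spherical Earth with a mean radius of 3958.8 miles and uses the haversine formula, and the function names and record keys are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius for this sketch

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_miles):
    """Pair each left record with every right record within max_miles of it."""
    result = []
    for left_rec in left:
        for right_rec in right:
            d = haversine_miles(left_rec["longitude"], left_rec["latitude"],
                                right_rec["lon"], right_rec["lat"])
            if d <= max_miles:
                # Merge both records and add the computed distance column.
                result.append({**left_rec, **right_rec, "outputDistance": d})
    return result

if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
    pois = [{"name": "near", "lon": -73.7503, "lat": 42.7400},
            {"name": "far", "lon": -73.0, "lat": 42.0}]
    for row in join_by_distance(customers, pois, 0.5):
        print(row["id"], row["name"], round(row["outputDistance"], 3))
```

A brute-force comparison like this examines every pair of records, which is why a grid-based primary join such as the geohash step is useful on large datasets.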
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will be downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
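As a quick check of what mode 775 grants, the following Python sketch uses the standard stat module to expand an octal mode into the familiar ls-style string. The helper name is hypothetical and the snippet does not touch the file system.

```python
import stat

def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with the given mode."""
    return stat.filemode(stat.S_IFDIR | octal_mode)

def group_can_write(octal_mode):
    """True when the group write bit is set, which shared downloads rely on."""
    return bool(octal_mode & stat.S_IWGRP)

if __name__ == "__main__":
    print(describe_mode(0o775))    # owner and group: rwx, others: r-x
    print(group_can_write(0o775))
    print(group_can_write(0o755))
```

With 775, every member of the common group can create and replace files in the download directory, while other users can only read and traverse it.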
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
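The wildcard behavior described for the Like operator can be illustrated by translating a pattern into a regular expression. The Python sketch below is an approximation for illustration only: the function names are hypothetical, and it assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into an equivalent regular expression string."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return "".join(parts)

def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None

if __name__ == "__main__":
    print(like("Canada", "Can%"))    # prefix match
    print(like("Canada", "C_nada"))  # single-character wildcard
    print(like("Canada", "Can_"))    # too short to match
```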
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
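When queries are assembled in client code, the quoting rules above can be applied with two small helpers. This Python sketch is illustrative only: the helper names are hypothetical, and it does not handle identifiers that themselves contain a double quote, since this guide does not describe that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("identifiers containing a double quote are not handled here")
    return '"' + name + '"'

if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    literal = quote_literal("O'hara's")
    print("SELECT * FROM " + table + " WHERE Business = " + literal)
```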
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note:
• If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in geoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from geoJSON format
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
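If the GeoJSON input is produced by client code rather than typed by hand, a JSON library avoids quoting mistakes. The Python sketch below builds a GeoJSON Point string of the kind passed to FromGeoJSON; the helper name is hypothetical and is not part of the SDK.

```python
import json

def point_geojson(x, y):
    """Serialize an X/Y pair as a GeoJSON Point string."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})

if __name__ == "__main__":
    geojson = point_geojson(100.0, 0.0)
    # The resulting string can be used as the argument to FromGeoJSON.
    print("SELECT FromGeoJSON('" + geojson + "');")
```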
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ])
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
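To see what the hexadecimal string in this example encodes, the following Python sketch decodes a WKB Point by hand according to the OGC Well-Known Binary layout (one byte-order byte, a 4-byte geometry type, then two 8-byte doubles). The function name is hypothetical, and the sketch handles only plain 2D points.

```python
import struct

def decode_wkb_point(hex_wkb):
    """Decode a hex-encoded 2D WKB Point and return its (x, y) ordinates."""
    raw = bytes.fromhex(hex_wkb)
    byte_order = "<" if raw[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geometry_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geometry_type != 1:
        raise ValueError("not a WKB Point")
    x, y = struct.unpack(byte_order + "2d", raw[5:21])
    return x, y

if __name__ == "__main__":
    print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The string used in the example above decodes to the point with X = 10 and Y = 10.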
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry or geometry in placemark will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified XY coordinates If any of the argument values are invalid then an empty geometry will be returned in the output
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
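The exterior-minus-interior rule can be illustrated with a minimal Python sketch. It assumes planar (Cartesian) coordinates and closed rings whose first and last vertices coincide; the function names are hypothetical and this is not the product's implementation, which also handles units and spherical computation.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


# A 10 x 10 square with a 2 x 2 hole, in arbitrary coordinate units.
exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
area = polygon_area(exterior, [hole])
```

Here the exterior ring contributes 100 square units and the hole removes 4.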
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units below.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value          Description
sq mi          square miles
sq km          square kilometers
sq in          square inches
sq ft          square feet
sq yd          square yards
sq mm          square millimeters
sq cm          square centimeters
sq m           square meters
sq survey ft   square US Survey feet
sq nmi         square nautical miles
acre           acres
ha             hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value          Description
mi             miles
km             kilometers
in             inches
ft             feet
yd             yards
mm             millimeters
cm             centimeters
m              meters
survey ft      US Survey feet
nmi            nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
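To show why the computation type matters, the following Python sketch compares a planar (Cartesian) distance with a great-circle distance between two points. The haversine formula and the mean Earth radius used here are assumptions chosen for illustration; this guide does not state which formula or ellipsoid the SPHERICAL computation type uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


planar = cartesian_distance(0, 0, 3, 4)               # in coordinate units
one_degree = spherical_distance_m(0.0, 0.0, 1.0, 0.0)  # one degree along the equator
```

Treating long/lat degrees as planar coordinates would report a distance of 1 for the second pair, which is why geographic coordinate systems accept only the spherical interpretation.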
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value          Description
mi             miles
km             kilometers
in             inches
ft             feet
yd             yards
mm             millimeters
cm             centimeters
m              meters
survey ft      US Survey feet
nmi            nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
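The rule that holes add to the perimeter can be sketched in a few lines of Python. The sketch assumes planar coordinates and closed rings (first vertex repeated at the end); the function names are hypothetical and do not represent the product's code.

```python
import math


def ring_length(ring):
    """Sum of the segment lengths of a closed ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter as the total length of the exterior ring and every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


# A 10 x 10 square with a 2 x 2 hole, in arbitrary coordinate units.
exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
perimeter = polygon_perimeter(exterior, [hole])
```

Unlike area, where a hole is subtracted, the hole's boundary length (8 units here) is added to the exterior's 40 units.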
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value          Description
mi             miles
km             kilometers
in             inches
ft             feet
yd             yards
mm             millimeters
cm             centimeters
m              meters
survey ft      US Survey feet
nmi            nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value          Description
mi             miles
km             kilometers
in             inches
ft             feet
yd             yards
mm             millimeters
cm             centimeters
m              meters
survey ft      US Survey feet
nmi            nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
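The effect of the resolution parameter can be pictured with a small Python sketch that buffers a single point in planar coordinates. It assumes the vertices are placed on the circle of radius offset; whether the product inscribes or circumscribes the polygon is not stated in this guide, and buffer_point is a hypothetical helper, not part of the SDK.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` segments."""
    step = 2 * math.pi / resolution
    ring = [
        (x + offset * math.cos(i * step), y + offset * math.sin(i * step))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)    # 8 segments: an octagon
smoother = buffer_point(5.0, 6.0, 50.0, 12)  # 12 segments: closer to a circle
```

Each additional segment adds a vertex to the output, which is why larger resolution values cost more time and space.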
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
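As an illustration of what a convex hull is, the following Python sketch computes the hull of a set of planar points with Andrew's monotone chain algorithm. This is a standard textbook technique chosen for the example; the guide does not say which algorithm the ConvexHull UDF uses.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
hull = convex_hull(points)
```

The interior points (2, 2) and (1, 3) are dropped, leaving only the four corners.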
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
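To make the idea of a coordinate system transformation concrete, the Python sketch below converts a longitude/latitude pair in EPSG:4326 to EPSG:3857 using the standard spherical Mercator formulas. It covers only this one pair of coordinate systems and is not how the Transform UDF is implemented; lonlat_to_web_mercator is a hypothetical name.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert EPSG:4326 degrees to EPSG:3857 meters (spherical Mercator)."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


origin = lonlat_to_web_mercator(0.0, 0.0)
east_edge = lonlat_to_web_mercator(180.0, 0.0)
```

The same geometry therefore has very different ordinates in the two systems: degrees in the source and meters in the destination.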
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
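The bounds-based filter described above can be sketched in Python. The sketch derives the MBR from a list of vertex coordinates and keeps the XY records that fall inside it; the record layout and the function names are hypothetical and stand in for an XY table and the ST_XMin, ST_XMax, ST_YMin, and ST_YMax values.

```python
def mbr(coords):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep the XY records whose point lies within the rectangle (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in records if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(1.0, 1.0), (5.0, 2.0), (3.0, 6.0), (1.0, 1.0)]
records = [
    {"id": 1, "x": 2.0, "y": 2.0},
    {"id": 2, "x": 6.0, "y": 2.0},
    {"id": 3, "x": 4.5, "y": 5.5},
]
bounds = mbr(triangle)
kept = filter_by_mbr(records, bounds)
```

An MBR test is only a coarse filter: record 3 lies inside the rectangle but outside the triangle itself, so a predicate such as Within would still be needed for an exact answer.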
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
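How an ID maps back to a rectangular cell can be sketched in Python with the publicly documented geohash scheme (base-32 characters, with bits alternating between longitude and latitude). The sketch assumes the product's geohash IDs follow that standard scheme, which this guide does not spell out; geohash_bounds is a hypothetical helper and returns plain numbers rather than a WritableGeometry.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


cell = geohash_bounds("u")       # a one-character hash covers a large cell
nested = geohash_bounds("u4pru")  # each extra character shrinks the cell
```

Every additional character halves the cell several more times, so a longer ID always describes a cell nested inside the cell of its prefix.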
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
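The relationship between a point, a precision, and the resulting ID can be sketched in Python using the publicly documented geohash encoding. It is assumed here that the "well-known" ID refers to that standard scheme; geohash_encode is a hypothetical helper, not the UDF's implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a long/lat point as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


coarse = geohash_encode(10.40744, 57.64911, 3)
fine = geohash_encode(10.40744, 57.64911, 5)
```

Because a longer ID extends a shorter one for the same point, IDs sort and group by shared prefix, which is what makes them convenient keys for ORDER BY and GROUP BY.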
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon'
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format
Options
Option Description Example
shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest'
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format
Options
Option Description Example
maxCandidates: The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance: The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName: The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter: Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values (one set from each dataframe) that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
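To make the semantics of a distance join concrete, the following is a minimal, single-machine sketch in Python. It is not the product's implementation: it uses a brute-force comparison instead of a geohash-based primary join, and it assumes great-circle (haversine) distance on a sphere, which may differ from the distance calculation the SDK performs. All names and the record layout are hypothetical. Each pair of records within the maximum distance is returned together with the calculated distance, similar in spirit to setting DistanceColumnName.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair every left record with every right record within max_distance_m.

    left and right are lists of dicts with 'lon' and 'lat' keys (hypothetical layout).
    """
    joined = []
    for a in left:
        for b in right:
            d = haversine_m(a["lon"], a["lat"], b["lon"], b["lat"])
            if d <= max_distance_m:
                joined.append({"left": a, "right": b, "distance_m": d})
    return joined


customers = [{"id": "c1", "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.752000, "lat": 42.737000},
    {"id": "far", "lon": -73.900000, "lat": 42.800000},
]

# Half a mile around each customer, as in the example above.
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for m in matches:
    print(m["left"]["id"], m["right"]["id"], round(m["distance_m"], 1))
```

A brute-force join compares every pair of records, which does not scale; bucketing records by a grid key first (the role of geohashPrecision in the primary join) limits the exact distance test to nearby candidates.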
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
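As a quick check of what mode 775 means, the following minimal Python sketch applies it to a temporary directory and prints the result. The temporary directory stands in for the download directory and is purely illustrative; the helper name is hypothetical. Mode 775 grants read, write, and execute to the owner and the group, and read and execute to all other users.

```python
import os
import stat
import tempfile


def describe_mode(path):
    """Return the symbolic and octal permission strings for a path."""
    mode = os.stat(path).st_mode
    return stat.filemode(mode), oct(stat.S_IMODE(mode))


# A throwaway directory stands in for the real download directory.
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    symbolic, octal = describe_mode(download_dir)

print(symbolic, octal)
```

The same helper can be pointed at the real download directory to confirm that the chmod step took effect.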
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that may contain wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
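The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the matching rules only, not the MI SQL engine; the function names are hypothetical, and the sketch assumes case-sensitive matching against the whole value.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # wildcard prefix
print(like("cat", "c_t"))              # single-character wildcard
print(like("Parks", "%Park"))          # trailing 's' is not covered by the pattern
```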
Arithmetic Operators
Operator Definition
+ Addition; also concatenation operator. NOTE: String concatenation also uses &
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as "/") are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
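When filter expressions are assembled in code, the quoting rules above can be applied with small helpers. The following Python sketch is illustrative only; the helper names are hypothetical, and the identifier helper simply wraps the name without handling an embedded double quote, since the rules above do not cover that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


# Build a filter for a business name that contains an apostrophe.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("Ohara's")
)
print(query)
```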
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user-defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON, and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t
To get an instance of WritableGeometry from a WKB string:
SELECT FromWKB(t.geometry, 'epsg:4267') FROM hivetable t
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON, and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty; otherwise it returns false. The result type is a boolean.
SELECT IsNullGeometry(null)
SELECT IsNullGeometry(FromWKT('POINT(10 20)'))
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src
For more information, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON'
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in GeoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from GeoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}')
SELECT FromGeoJSON(t.geometry) FROM hivetable t
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ])
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
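To see what the hexadecimal string in the example above encodes, the following Python sketch decodes it by hand. It is an independent illustration of the standard WKB point layout (a byte-order flag, a 32-bit geometry type, then two 64-bit doubles) and is not the product's parser; the helper name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a 2D WKB Point (hypothetical helper, for illustration only)."""
    data = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if data[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type code; 1 means Point.
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != 1:
        raise ValueError("not a WKB Point")
    # Bytes 5-20 hold X and Y as IEEE 754 doubles.
    x, y = struct.unpack(order + "dd", data[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The string decodes to X = 10 and Y = 10, so the FromWKB example constructs the same point as the WKT text POINT (10 10).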
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string. Only a geometry, or a geometry inside a Placemark, is parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X,Y coordinates. If any argument value is invalid, an empty geometry is returned
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT (10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
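The rule that a polygon's area is the area of its exterior ring minus the areas of its interior rings can be sketched in Python for the planar (CARTESIAN) case. This is an illustration under stated assumptions, not the product's implementation: it uses the shoelace formula on closed rings, and the function names ring_area and polygon_area are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring (first vertex repeated last)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1  # shoelace cross term for one edge
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Exterior ring area minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_area(exterior, [hole]))
```

A 10 by 10 square with a 2 by 2 hole has an area of 96 square units. A SPHERICAL computation would instead account for the curvature of the earth, which this sketch does not attempt.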
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
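The difference between the CARTESIAN and SPHERICAL computation types can be sketched in Python for the simplest case of two points. The guide does not state which spherical formula the product uses; the haversine great-circle formula and the mean earth radius below are assumptions chosen for illustration, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (assumption of this sketch)


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator, computed both ways.
print(cartesian_distance(0.0, 0.0, 1.0, 0.0))    # in degrees, not meters
print(spherical_distance_m(0.0, 0.0, 1.0, 0.0))  # in meters
```

The Cartesian result is expressed in coordinate units (here degrees), which is why a spherical computation is the default for geographic coordinate systems.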
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
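The statement that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched in Python for the planar (CARTESIAN) case. The names ring_length and polygon_perimeter are hypothetical, and the sketch does not reproduce the product's spherical logic.

```python
import math


def ring_length(ring):
    """Sum of the edge lengths of a closed ring (first vertex repeated last)."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Exterior ring length plus the lengths of all interior rings (holes)."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
print(polygon_perimeter(exterior, [hole]))
```

Note the contrast with Area: a hole reduces the area of a polygon but increases its perimeter, so the 10 by 10 square with a 2 by 2 hole has a perimeter of 48 units.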
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8 the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
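The effect of the resolution parameter on a point buffer can be sketched in Python. This planar illustration places the vertices on the circle itself, which is an assumption (the guide does not say whether the product's polygon is inscribed or circumscribed), and the function name point_buffer is hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + radius * math.cos(angle), y + radius * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = point_buffer(0.0, 0.0, 1.0, 8)
print(len(octagon) - 1)  # number of segments in the ring
```

With a resolution of 8 the ring has eight segments, matching the octagon described in the parameter table; doubling the resolution doubles the number of vertices that must be computed and stored.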
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
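The definition of a convex hull as the smallest convex geometry containing all input points can be made concrete with a short Python sketch. It uses Andrew's monotone chain algorithm on a plain list of planar points; this is a well-known textbook technique chosen for illustration, not a description of how the product computes hulls.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is discarded because it lies inside the square formed by the other four points.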
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_Point(30, 20), 'epsg:3857');
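To give a feel for what a coordinate system transformation does to the point in the second example, the following Python sketch applies the standard spherical Mercator forward formulas that define EPSG:3857. It covers only this one source and destination pair and is not the product's transformation engine; the function name lonlat_to_web_mercator is hypothetical.

```python
import math

SEMI_MAJOR_AXIS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator: degrees of long/lat to meters of X/Y."""
    x = SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(lonlat_to_web_mercator(30.0, 20.0))
```

Under these formulas the point at longitude 30, latitude 20 maps to roughly X = 3339584.72 and Y = 2273030.93 meters.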
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
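The idea of filtering XY records by the bounds of a geometry can be sketched in Python. The four values returned by the hypothetical mbr helper correspond conceptually to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax report; the sketch works on plain coordinate lists and does not call the UDFs.

```python
def mbr(coords):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """True if the XY record falls inside or on the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


polygon = [(0, 0), (4, 1), (3, 5), (-1, 2), (0, 0)]
bounds = mbr(polygon)
records = [(1, 1), (5, 5), (-2, 0)]
print([r for r in records if in_bounds(r[0], r[1], bounds)])
```

A bounds test of this kind needs only numeric comparisons on the X and Y columns, so it is a cheap first filter before an exact predicate such as Within is applied.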
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that when displayed in Popular Mercator the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
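As a rough illustration of how an ID maps back to a rectangular cell, the sketch below decodes a standard geohash string into its bounding box and formats it as WKT. It assumes the public geohash scheme rather than the SDK's internal implementation; geohash_bounds and bounds_to_wkt are hypothetical helper names, and the actual UDF returns a WritableGeometry rather than a WKT string.

```python
# Illustrative only: standard geohash decoding, not the SDK's implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


def bounds_to_wkt(bounds):
    """Format the rectangle as a closed WKT polygon ring."""
    min_lon, min_lat, max_lon, max_lat = bounds
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


print(bounds_to_wkt(geohash_bounds("ezs42")))
```

Each additional character narrows the longitude and latitude intervals further, so the rectangle for a longer ID always lies inside the rectangle for any of its prefixes.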
GeoHashID
Description
The GeoHashID function returns the unique, well-known string ID of the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The geohash ID of the grid cell, at the specified precision, that contains the point.
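The relationship between precision and cell size can be worked out directly for the standard geohash scheme, where each character contributes 5 bits that alternate between longitude and latitude. The sketch below is a minimal calculation under that assumption (the guide does not state the SDK's exact cell dimensions); geohash_cell_size is a hypothetical helper.

```python
import math


def geohash_cell_size(precision):
    """Return (width_deg, height_deg) of a standard geohash cell."""
    total_bits = 5 * precision
    lon_bits = math.ceil(total_bits / 2)  # longitude receives the first bit
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


for p in (1, 3, 5, 10):
    width, height = geohash_cell_size(p)
    print(f"precision {p}: {width:.8f} x {height:.8f} degrees")
```

Under this scheme a precision of 3 gives cells of about 1.4 degrees on each side, while a precision of 10 gives cells far smaller than a city block, so the precision should be chosen to match the scale of the aggregation.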
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID("-73.750333", "42.736103", 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid. Given the unique ID of a cell, it returns the boundary of that cell. Given a point and a precision, it returns the boundary of the cell that contains the point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns the unique, well-known string ID of the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid. Given the unique ID of a cell, it returns the boundary of that cell. Given a point and a precision, it returns the boundary of the cell that contains the point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID("-73.750333", "42.736103", 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, there is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data, based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like "Park"'
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like "Park"')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
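The effect of the maxCandidates and maxDistance options can be pictured with a small brute-force search. The sketch below is conceptual only: it handles point candidates with a haversine distance in meters, whereas the UDTF searches arbitrary geometries in a TAB file or shapefile and its internal method is not described in this guide. The function names and the approximate city coordinates are hypothetical sample values.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first."""
    lon, lat = point
    scored = [(name, haversine_m(lon, lat, c_lon, c_lat))
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        # Mirrors maxDistance: discard anything beyond the search radius.
        scored = [item for item in scored if item[1] <= max_distance_m]
    scored.sort(key=lambda item: item[1])
    # Mirrors maxCandidates: keep only the closest N results.
    return scored[:max_candidates]


# Hypothetical sample data: (name, longitude, latitude), approximate values.
capitals = [("Albany", -73.7562, 42.6526),
            ("Boston", -71.0589, 42.3601),
            ("Hartford", -72.6734, 41.7658)]

for name, dist in search_nearest((-73.750333, 42.736103), capitals, max_candidates=2):
    print(name, round(dist))
```

With the defaults (one candidate, no distance limit) the search returns a single nearest record per input point, which matches the documented default behavior of maxCandidates and maxDistance.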
Spark
Spark Jobs
To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
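To make the 775 setting concrete, the following Python sketch decodes the permission bits and checks that a directory's group has full access. It assumes a POSIX file system and uses a temporary directory as a stand-in for the real download location; the function names are illustrative only.

```python
import os
import stat
import tempfile


def describe_mode(mode):
    """Return the rwx string for permission bits, for example 0o775 -> 'rwxrwxr-x'."""
    return stat.filemode(stat.S_IFDIR | mode)[1:]


def group_has_full_access(path):
    """True when the path's group has read, write, and execute permission."""
    bits = stat.S_IMODE(os.stat(path).st_mode)
    return bits & stat.S_IRWXG == stat.S_IRWXG


# Stand-in for the download directory; the real path comes from your configuration.
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)
    print(describe_mode(0o775), group_has_full_access(download_dir))
```

Mode 775 gives the owner and the group read, write, and execute access, and gives all other users read and execute access only.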
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
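The wildcard behaviour of Like and the inclusive behaviour of Between can be illustrated outside the SQL engine. The Python sketch below translates a Like pattern into a regular expression: underscore matches exactly one character, and the percent sign matches zero or more. It assumes case-sensitive matching and no escape character, neither of which is specified above.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")       # exactly one character
        elif ch == "%":
            parts.append(".*")      # zero, one, or many characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Between is inclusive at both ends of the range."""
    return low <= value <= high


print(like("Troy", "T%"), like("Troy", "T_oy"), like("Troy", "T_y"))
print(between(10, 1, 10))
```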
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
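When queries are assembled by a program, the quoting rules above reduce to two small helpers. This is a minimal Python sketch; the function names quote_literal and quote_identifier are hypothetical, and identifiers that themselves contain a double quote are rejected because the text does not say how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Assumption: escaping of embedded double quotes is not covered here.
        raise ValueError("embedded double quotes are not handled in this sketch")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's"))
print(query)
```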
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
- Table of Contents
- Welcome
  - What is Spectrum™ Location Intelligence for Big Data
  - Spectrum™ Location Intelligence for Big Data Architecture
  - System Requirements and Dependencies
- Spatial
  - Installing the SDK
  - Hive User-Defined Spatial Functions
    - Setup
    - WritableGeometry
    - Geometry Functions
      - Constructor Functions
        - FromGeoJSON
        - FromWKT
        - FromWKB
        - FromKML
        - ST_Point
      - Persistence Functions
        - ToGeoJSON
        - ToWKT
        - ToWKB
        - ToKML
      - Predicate Functions
        - Disjoint
        - Intersects
        - Overlaps
        - Within
        - IsNullGeometry
      - Measurement Functions
        - Area
        - ClosestPoints
        - Distance
        - Length
        - Perimeter
      - Processing Functions
        - Buffer
        - ConvexHull
        - Intersection
        - Transform
        - Union
      - Observer Functions
        - ST_X
        - ST_XMax
        - ST_XMin
        - ST_Y
        - ST_YMax
        - ST_YMin
      - Grid Functions
        - GeoHashBoundary
        - GeoHashID
        - HexagonBoundary
        - HexagonID
        - SquareHashBoundary
        - SquareHashID
    - Search Functions
      - LocalPointInPolygon
      - LocalSearchNearest
  - Spark
    - Spark Jobs
      - Hexagon Generator
        - Hexagons
    - Spark API
      - JoinByDistance
    - Location Intelligence Jar for Spatial Operations
      - Installing and setting up the Location Intelligence Jar for a Spark Job
      - Integrating the Location Intelligence Jar with Zeppelin
      - Integrating the Location Intelligence Jar with Hue
- Appendix
  - PGD Builder
    - Building an Index with the PGD Builder
  - Download Permissions
  - Operators and Syntax Delimiters
    - Quote Rules
Spatial
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in geoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from geoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
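The GeoJSON text passed to FromGeoJSON can be produced with any JSON serializer. The following Python sketch builds a Point geometry and embeds it in a query string; the helper names are hypothetical, and the sketch simply refuses text containing a single quote rather than guessing at an escaping rule.

```python
import json


def point_geojson(x, y):
    """Serialize a GeoJSON Point; coordinates are ordered [x, y]."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


def from_geojson_query(geojson_text):
    """Embed GeoJSON text in a FromGeoJSON call as a single-quoted literal."""
    if "'" in geojson_text:
        # Assumption: quote escaping is out of scope for this illustration.
        raise ValueError("single quotes in the GeoJSON text are not handled")
    return "SELECT FromGeoJSON('{}')".format(geojson_text)


print(from_geojson_query(point_geojson(100.0, 0.0)))
```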
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ]).
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
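The hexadecimal string in the example above is a standard WKB encoding and can be decoded by hand. This Python sketch reads the byte-order flag, the geometry type, and the two ordinates; it handles only 2D points, and the function name is illustrative.

```python
import struct


def decode_wkb_point(hex_text):
    """Decode a 2D WKB Point given as a hex string and return (x, y)."""
    data = bytes.fromhex(hex_text)
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    order = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack(order + "I", data[1:5])
    if geometry_type != 1:
        raise ValueError("this sketch only handles WKB type 1 (Point)")
    # Two 8-byte IEEE 754 doubles follow: X, then Y.
    x, y = struct.unpack(order + "dd", data[5:21])
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

For this input the decoded point is (10.0, 10.0).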
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry or geometry in placemark will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate.
Y String or Number The Y ordinate.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified XY coordinates If any of the argument values are invalid then an empty geometry will be returned in the output
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
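For a point location and a polygon boundary, as in the insurance example above, a containment test can be illustrated in planar coordinates. The Python sketch below uses the ray-casting technique. It is not the UDF's implementation: it ignores holes and coordinate systems, and points lying exactly on the boundary are not treated consistently.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test for a point against a closed ring.

    `ring` is a list of (x, y) tuples whose first and last entries are equal.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


risk_zone = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print(point_in_ring(5, 5, risk_zone), point_in_ring(15, 5, risk_zone))
```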
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
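The rule that a polygon's area is its exterior ring minus its interior rings can be shown with planar coordinates. The Python sketch below uses the shoelace formula, which corresponds to Cartesian logic only; how the product computes spherical area is not described here, and the function names are illustrative.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_area(outer, [hole]))
```

The result is in squared coordinate units; converting to a named unit such as sq km would be a separate step.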
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
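To give a feel for what a spherical distance between two long/lat points involves, the Python sketch below applies the haversine formula. This is an assumed illustration, not the product's algorithm: the mean Earth radius constant and the function name are choices made here, and the product's SPHERICAL computation may differ in detail.

```python
import math

# Assumption: mean Earth radius in meters, chosen for this illustration.
EARTH_RADIUS_M = 6371008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


meters = haversine_m(-73.750333, 42.736103, -73.0, 42.0)
print(meters, meters / 1000.0)  # meters, then kilometers
```

As with the Distance function, the result is never negative and is zero for identical points.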
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
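The effect of the resolution parameter on a point buffer can be pictured in planar coordinates: the circle is approximated by a closed ring with one vertex per segment. The Python sketch below is an illustration under that assumption and does not reproduce how the product places vertices or handles spherical buffers.

```python
import math


def circle_ring(cx, cy, radius, resolution):
    """Approximate a circle with `resolution` straight segments as a closed ring."""
    if resolution < 3:
        raise ValueError("at least 3 segments are needed to enclose an area")
    ring = [
        (cx + radius * math.cos(2 * math.pi * k / resolution),
         cy + radius * math.sin(2 * math.pi * k / resolution))
        for k in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = circle_ring(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1, "segments")
```

A higher resolution follows the circle more closely at the cost of more vertices, which matches the note above about time and space.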
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry The convex hull is the smallest convex geometry that contains all the points in the input geometry
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
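The definition above (the smallest convex geometry containing every input point) can be demonstrated on a plain list of planar points. The Python sketch below uses Andrew's monotone chain algorithm, chosen here for brevity; the text does not say which algorithm the UDF uses.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```

The interior point (2, 2) is dropped because it does not lie on the boundary of the hull.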
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
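Because the processing functions accept and return a WritableGeometry, they can presumably be nested. The following sketch, which assumes the functions are registered and reuses the table names from the examples above, merges two geometries and then takes the convex hull of the merged shape.

```sql
-- Sketch only: nests Union inside ConvexHull, then renders the hull as WKT.
-- hivetable1 and hivetable2 are the placeholder tables used in the examples above.
SELECT ToWKT(
  ConvexHull(
    Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))
  )
) AS hull_wkt
FROM hivetable1 t1, hivetable2 t2;
```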
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be passed to it directly to be transformed from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
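The XY-table transformation described above can be sketched by wrapping each coordinate pair in a point, transforming it, and reading the ordinates back out. The table xy_points and its columns id, x, and y are hypothetical, and the sketch assumes ST_Point, Transform, ST_X, and ST_Y are registered.

```sql
-- Sketch: reproject a hypothetical XY table from WGS 84 (epsg:4326)
-- to Popular Mercator (epsg:3857) without storing a geometry column.
SELECT id,
       ST_X(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS x_3857,
       ST_Y(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS y_3857
FROM xy_points;
```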
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
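The four bounds functions can be combined to filter an XY table by the MBR of a geometry. In this sketch the tables xy_points (id, x, y) and regions (name, wkt) are hypothetical, and both are assumed to use epsg:4326. Note that this is a coarse rectangle test: a point inside the MBR is not necessarily inside the geometry itself.

```sql
-- Sketch: keep only the XY records that fall inside the bounding
-- rectangle of each region geometry (hypothetical tables and columns).
SELECT p.id, r.name
FROM xy_points p CROSS JOIN regions r
WHERE p.x >= ST_XMin(FromWKT(r.wkt, 'epsg:4326'))
  AND p.x <= ST_XMax(FromWKT(r.wkt, 'epsg:4326'))
  AND p.y >= ST_YMin(FromWKT(r.wkt, 'epsg:4326'))
  AND p.y <= ST_YMax(FromWKT(r.wkt, 'epsg:4326'));
```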
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and for interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
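To compare the three grid types side by side, the ID functions can be called on the same coordinates. The sketch below assumes the functions are registered and reuses the coordinates table (with x and y columns) and the precision value 10 that appear in the examples later in this section.

```sql
-- Sketch: compute one cell ID per grid type for every record.
-- x is longitude and y is latitude, as in the parameter tables below.
SELECT x, y,
       GeoHashID(x, y, 10)    AS geohash_id,
       HexagonID(x, y, 10)    AS hexagon_id,
       SquareHashID(x, y, 10) AS squarehash_id
FROM coordinates;
```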
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
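The two call forms can be checked against each other by rendering their results as WKT. This sketch assumes GeoHashBoundary, GeoHashID, and ToWKT are registered; it also assumes, without confirmation from this guide, that passing the ID returned by GeoHashID yields the same cell as passing the point and precision directly.

```sql
-- Sketch: boundary of the cell containing the point, requested directly.
SELECT ToWKT(GeoHashBoundary(-73.750333, 42.736103, 3)) AS cell_wkt;

-- Sketch: the same request routed through the cell ID first.
SELECT ToWKT(GeoHashBoundary(GeoHashID(-73.750333, 42.736103, 3))) AS cell_wkt_from_id;
```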
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, those geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, those geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
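Because the options map is optional in the syntax, a search against a file that already exists locally on every node should not need the remote and download options. The following sketch assumes that reading of the syntax; the path /data/STATECAP.TAB is hypothetical, and the pip_points table and result fields are taken from the HDFS example.

```sql
-- Sketch: point-in-polygon search against a local TAB file.
-- Assumes /data/STATECAP.TAB (hypothetical path) is present on the
-- master node and on every data node, so no options map is passed.
SELECT p.id, pipresult.capital, pipresult.state
FROM pip_points p
LATERAL VIEW LocalPointInPolygon(
  FromWKT(p.geometry, p.crs),
  '/data/STATECAP.TAB'
) pipresult;
```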
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
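The distance-related options can be combined to bound the search and report how far away each match is. This sketch uses only option names from the table above; the local path /data/STATECAP.TAB is hypothetical, and it is an assumption that the distance is exposed in the lateral view as a column with the name given in returnDistanceColumnName.

```sql
-- Sketch: up to 3 nearest records within 25 miles of each search point.
-- Assumes the distance comes back as a column named Miles, as requested
-- through returnDistanceColumnName; the file path is hypothetical.
SELECT s.id, nearestresult.capital, nearestresult.Miles
FROM search_points s
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(s.geometry, s.crs),
  '/data/STATECAP.TAB',
  map('maxCandidates', '3',
      'maxDistance', '25',
      'distanceUnit', 'mi',
      'returnDistanceColumnName', 'Miles')
) nearestresult;
```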
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence for Big Data and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API generates each hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, with Read, Write, and Execute permissions for the owner and group (mode 775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
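As a quick check of what mode 775 grants, the following Python sketch uses only the standard library to expand an octal mode into the familiar permission string. The helper names describe_mode and group_can_write are hypothetical.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode for a directory as an ls-style string."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if the group write bit is set, which shared downloads rely on."""
    return bool(mode & stat.S_IWGRP)


print(describe_mode(0o775))    # owner rwx, group rwx, others r-x
print(group_can_write(0o775))
print(group_can_write(0o755))
```

With 775, the owner and every member of the common group can read, write, and traverse the download directory, while other users can only read and traverse it.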
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
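The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the matching rules only, not how MI SQL is implemented. The function names are hypothetical, and the sketch assumes case-sensitive matching, which the table above does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"))
print(like("cat", "c_t"))
print(like("cart", "c_t"))
```

The two wildcards can be combined, for example "_a%" for any value whose second character is "a".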
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing characters that would not normally be allowed (such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
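When queries are assembled programmatically, these quoting rules are easy to apply with small helpers. The Python sketch below is illustrative only; quote_literal and quote_identifier are hypothetical names. Because this guide does not say how a double quote inside an identifier should be escaped, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping rule for embedded double quotes is not documented here.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```

Quoting the identifier is optional for a simple name like Streets, but it is harmless and keeps the helper uniform.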
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes, 3001 Summer Street, Stamford CT 06926-0700, USA
www.pitneybowes.com
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• ST_Point
FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
FromGeoJSON(String jsonGeometry)
Parameters
Parameter Type Description
jsonGeometry String The geometry in GeoJSON format
Return Values
Return Type Description
WritableGeometry The geometry from GeoJSON format
Examples
SELECT FromGeoJSON('{"type": "Point", "coordinates": [100.0, 0.0]}');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
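If the GeoJSON strings are produced by an upstream script, generating them with a JSON library avoids hand-written quoting mistakes. The Python sketch below builds a GeoJSON Point string of the kind FromGeoJSON accepts. The helper name point_geojson is hypothetical, and the sketch assumes the usual GeoJSON coordinate order of x (longitude) first, then y (latitude).

```python
import json


def point_geojson(x, y):
    """Serialize a GeoJSON Point; coordinates are [x, y], that is [longitude, latitude]."""
    return json.dumps({"type": "Point", "coordinates": [x, y]})


text = point_geojson(100.0, 0.0)
print(text)
```

The resulting string can be stored in a Hive string column and passed to the function as a column reference.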
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
FromWKT(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry, [SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[])
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
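To see what the hexadecimal string in the example encodes, the following Python sketch decodes a WKB point with the standard library. It follows the public WKB layout (1 byte order flag, a 4-byte geometry type, then two 8-byte doubles) and handles points only. The function name decode_wkb_point is hypothetical.

```python
import struct


def decode_wkb_point(hex_string):
    """Decode a hex-encoded WKB point and return its (x, y) ordinates."""
    data = bytes.fromhex(hex_string)
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geom_type != 1:
        raise ValueError("not a WKB point")
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y


print(decode_wkb_point("010100000000000000000024400000000000002440"))
```

The example value decodes to the point with X = 10 and Y = 10.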
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string in which only the geometry, or the geometry in a placemark, will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, an empty geometry is returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns a GeoJSON text representation of the geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[] The WKB representation of a geometry, expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
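The "exterior ring minus interior rings" rule can be sketched in a few lines of Python for planar coordinates. This only illustrates the arithmetic using the shoelace formula. It is not the SDK's implementation, it does not cover spherical computation or unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(polygon_area(square, [hole]))
```

A 4 x 4 square with a 1 x 1 hole therefore has an area of 15 square units.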
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
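When results in one linear unit need to be compared with values in another, a small conversion table is often enough. The Python sketch below maps the unit codes from the table above to their standard definitions in meters. The dictionary and function names are hypothetical, and the sketch is independent of the SDK.

```python
# Meters per unit, keyed by the unit codes listed in the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the supported linear unit codes."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "mi", "km"))
print(convert_length(1.0, "nmi", "m"))
```

An unknown unit code raises a KeyError, which makes typos in unit strings visible early.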
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
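To make the resolution parameter concrete, here is a minimal Python sketch that buffers a single point in Cartesian coordinates. It is an illustration only: the function name buffer_point is hypothetical, the vertices are placed on the circle itself (an inscribed polygon), and the product may place vertices or handle spherical computation differently.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments.

    Cartesian sketch only; returns a closed ring (first vertex repeated at the end).
    """
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


# A resolution of 8 yields an octagon: 8 distinct vertices plus the closing one.
octagon = buffer_point(5.0, 6.0, 50.0, 8)
```

Raising the resolution adds vertices linearly, which is why larger values cost more time and space.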
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
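For readers unfamiliar with the concept, the following Python sketch computes a convex hull of a set of planar points using Andrew's monotone chain algorithm. This is a generic textbook technique chosen for illustration; the source does not state which algorithm the ConvexHull UDF uses.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# The interior point (1, 1) is not part of the hull.
hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
```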
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
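The bounds-filtering use case described above can be sketched in plain Python. The helper names mbr and filter_by_mbr are hypothetical and stand in for what the ST_XMin, ST_YMin, ST_XMax, and ST_YMax UDFs would provide inside a Hive WHERE clause; the sketch assumes a geometry is given as a list of (x, y) vertices.

```python
def mbr(points):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows that fall inside the bounding rectangle (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(triangle)
inside = filter_by_mbr([(30, 35), (10, 10), (45, 45)], bounds)
```

Filtering by the MBR is a coarse test: a row can fall inside the rectangle while still lying outside the geometry itself.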
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
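The encode/decode idea behind hashing can be sketched with the publicly documented geohash scheme (interleaved longitude and latitude bits, base-32 alphabet). This is an assumption-laden illustration: the function names geohash_id and geohash_boundary are hypothetical Python stand-ins, and the source does not confirm that the GeoHashID and GeoHashBoundary UDFs produce identical strings.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 long/lat as a public-scheme geohash of `precision` characters."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi
```

Because each extra character halves the cell several more times, a longer ID always names a smaller cell nested inside the cell of its prefix. That nesting is what makes the IDs sortable and convenient for grouping.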
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup the operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions in the Appendix.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder in the Appendix.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates the maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance the maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit the distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName the name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup the operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions in the Appendix. Example: 'downloadGroup', 'pbdownloads'
queryFilter search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters in the Appendix.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 71
Spatial
Spark
Spark Jobs
Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data to process large sets of data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
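When several TAB files need an index, the single-file usage above can be wrapped in a loop. The following is a minimal sketch, not part of the product: the helper name pgd_commands and its directory argument are hypothetical. It only prints PGDBuilder command lines (using the -f and -p options described above) so that they can be reviewed before anything is run.

```shell
# Sketch: print one PGDBuilder command per TAB file found in a directory.
# pgd_commands is a hypothetical helper name, not part of the distribution.
# Usage: pgd_commands <directory> [<parallel>]
pgd_commands() {
  dir="$1"
  parallel="${2:-}"
  for f in "$dir"/*.tab "$dir"/*.TAB; do
    # Skip glob patterns that matched nothing.
    [ -e "$f" ] || continue
    if [ -n "$parallel" ]; then
      printf 'PGDBuilder -f "%s" -p %s\n' "$f" "$parallel"
    else
      printf 'PGDBuilder -f "%s"\n' "$f"
    fi
  done
}
```

For example, pgd_commands /data/tabs 4 (where /data/tabs is an assumed directory) would list one command per TAB file, each limited to 4 concurrent processes.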
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (mode 775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
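After completing the steps above, it can be useful to confirm that the download directory ended up with the intended group and mode. The following is a minimal sketch under stated assumptions: check_download_dir is a hypothetical helper name, and stat -c is the GNU coreutils form, so the script assumes a Linux host.

```shell
# Sketch: verify the group and permission mode of a download directory.
# check_download_dir is a hypothetical helper; stat -c requires GNU coreutils.
# Usage: check_download_dir <directory> <group> [<mode>]
check_download_dir() {
  dir="$1"
  want_group="$2"
  want_mode="${3:-775}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  have_group=$(stat -c '%G' "$dir")   # group name that owns the directory
  have_mode=$(stat -c '%a' "$dir")    # permissions in octal, for example 775
  if [ "$have_group" = "$want_group" ] && [ "$have_mode" = "$want_mode" ]; then
    echo "ok: $dir ($have_group, $have_mode)"
  else
    echo "mismatch: $dir has group $have_group and mode $have_mode"
    return 1
  fi
}
```

A typical call would be check_download_dir /pb/downloads pbdownloads, which returns a non-zero status if either the group or the mode differs from what the steps above set.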
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value matches a pattern that can include wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
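The wildcard behavior described for the Like operator can be illustrated outside of MI SQL. The following is a minimal sketch, not the product's implementation: like_match is a hypothetical helper that maps % to the shell pattern * and _ to ?, then matches with a case statement. It is case-sensitive and does not handle patterns that already contain shell pattern characters such as *, ?, or [; whether MI SQL behaves identically in those respects is not stated in this guide.

```shell
# Sketch: emulate the Like wildcards with shell pattern matching.
# like_match is a hypothetical helper, not an MI SQL function.
# Usage: like_match <value> <like-pattern>   (exit status 0 means "matches")
like_match() {
  value="$1"
  # % (zero, one, or many characters) becomes *; _ (exactly one character) becomes ?
  glob=$(printf '%s' "$2" | sed -e 's/%/*/g' -e 's/_/?/g')
  case "$value" in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, like_match "Central Park" "%Park" succeeds, while like_match "cart" "c_t" fails because the underscore stands for exactly one character.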
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain characters that would not normally be allowed are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
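When a filter expression is assembled in a script, the doubling rule above can be applied mechanically. The following is a minimal sketch: sql_literal is a hypothetical helper name, and it covers only the single-quote rule described in this section, not identifier quoting.

```shell
# Sketch: turn a raw value into a single-quoted string literal,
# doubling any single quote inside it as described in Quote Rules.
# sql_literal is a hypothetical helper name.
sql_literal() {
  escaped=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "'%s'" "$escaped"
}

# Example: build the WHERE clause used above.
echo "SELECT * FROM Streets WHERE Business = $(sql_literal "O'hara's")"
```

The echo line prints the same statement as the example above, with the literal written as 'O''hara''s'.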
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry[, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The geometry in WKT format.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKT format.
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry, 'epsg:4267') FROM hivetable t;
SELECT FromWKT('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry[, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ]).
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string in which only the geometry, or the geometry in a placemark, will be parsed.
Return Values
Return Type Description
WritableGeometry The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y[, SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate.
Y String or Number The Y ordinate.
CRS String Optional. The coordinate system for the geometry. Default = EPSG:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X,Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
String The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted in KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
String The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry.
geometry2 WritableGeometry The second instance of a WritableGeometry.
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry.
geometry2 WritableGeometry The second instance of a WritableGeometry.
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry.
geometry2 WritableGeometry The second instance of a WritableGeometry.
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry.
geometry2 WritableGeometry The second instance of a WritableGeometry.
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
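The "exterior ring minus interior rings" rule can be illustrated with a small planar sketch. This is not the product's implementation: it assumes Cartesian coordinates and uses the shoelace formula, and the names ring_area and polygon_area are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring (list of (x, y) tuples) via the shoelace formula."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(square, [hole]))  # 96.0
```

A 10 x 10 square with a 2 x 2 hole yields 100 - 4 = 96 square units.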
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
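To make the difference between the two computation types concrete, the following sketch compares a planar (Cartesian) distance with a great-circle (spherical) distance between two points. It is only an illustration: the haversine formula and the mean Earth radius used here are assumptions of this sketch, and the product may use a different spherical model.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumption of this sketch)


def cartesian_distance(p, q):
    """Straight-line distance between two (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Haversine great-circle distance in meters; p and q are (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance((0, 0), (3, 4)))    # planar result, in coordinate units
print(spherical_distance_m((0, 0), (1, 0)))  # one degree of longitude at the equator, in meters
```

Applying Cartesian logic to long/lat degrees would give a result in degrees rather than in a linear unit, which is why geographic coordinate systems accept only SPHERICAL.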
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are considered as thin polygons.
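The ring-summing rule can be sketched in a few lines. This is a planar illustration only, assuming Cartesian coordinates; ring_length and polygon_perimeter are hypothetical names, not part of the SDK.

```python
import math


def ring_length(ring):
    """Length of a closed ring given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of all rings: the exterior ring plus every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_perimeter(outer, [inner]))  # 48.0
```

Unlike area, where holes are subtracted, a hole adds to the perimeter: 40 for the outer square plus 8 for the inner one.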
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The unit type of the offset distance. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
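The effect of the resolution parameter can be sketched for the simplest case, a point buffer. The code below is a planar illustration under the assumption that the circle is approximated by a regular polygon with one vertex per segment; buffer_point is a hypothetical helper, not the SDK's algorithm.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    step = 2 * math.pi / resolution
    ring = [(x + offset * math.cos(i * step), y + offset * math.sin(i * step))
            for i in range(resolution)]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = buffer_point(0.0, 0.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight-line segments
```

With a resolution of 8 the ring has eight segments, matching the octagon described in the parameter table; a higher resolution gives a closer approximation of the circle at the cost of more vertices.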
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
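As an illustration of what "smallest convex geometry" means, here is a minimal planar sketch using Andrew's monotone chain algorithm. The choice of algorithm is an assumption made for this example; the guide does not state how the UDF computes the hull.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it does not lie on the hull.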
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
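For intuition about what a coordinate system transformation does to a single point, the following sketch converts WGS 84 longitude/latitude (epsg:4326) to spherical Web Mercator (epsg:3857) using the standard forward formula. It covers only this one pair of systems and is not the SDK's implementation; wgs84_to_web_mercator is a hypothetical name.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by Web Mercator


def wgs84_to_web_mercator(lon, lat):
    """Convert (lon, lat) in degrees to (x, y) in meters; valid for |lat| < 90."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(wgs84_to_web_mercator(30.0, 20.0))
```

The Transform UDF performs this kind of conversion on every coordinate of the input geometry, for whichever source and destination systems are specified.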
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by returning the X and Y ordinates of a point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
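The bounds-based filtering described above can be sketched as follows. The helper names mbr and filter_by_mbr and the sample data are hypothetical; in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep only the (x, y) rows that fall inside the rectangle, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
xy_table = [(1.0, 1.0), (5.0, 1.0), (2.0, 2.9), (2.0, 3.5)]
bounds = mbr(triangle)
print(bounds)
print(filter_by_mbr(xy_table, bounds))
```

An MBR test is only a coarse filter: a point can fall inside the rectangle yet outside the geometry itself, so an exact predicate such as Within is still needed for precise results.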
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
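To show how a point, a precision, and a cell ID relate, here is a sketch of the widely published geohash scheme: longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. It assumes the UDFs follow this public scheme, which this guide does not confirm; geohash_id and geohash_boundary are hypothetical Python names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a point as a geohash string of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id):
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_boundary(cell))
```

Because each extra character subdivides the previous cell, a longer ID always starts with the shorter ID of its enclosing cell, which is what makes the IDs sortable and usable as grouping keys.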
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is a hexagon
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude latitude (in WGS 84) and a precision The precision determines how large the grid cells are (higher precision means smaller grid cells) It returns the string ID of the grid cell at the specified precision that contains the point Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
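The core test behind a point-in-polygon search can be sketched with the classic ray-casting rule: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as inside. This is an illustration only; the UDTF searches an indexed TAB file or shapefile, and point_in_ring and polygons_containing are hypothetical names.

```python
def point_in_ring(x, y, ring):
    """True if (x, y) is strictly inside the ring, using the ray-casting rule."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons):
    """Names of all polygons (name -> ring) whose ring contains the point."""
    return [name for name, ring in polygons.items()
            if point_in_ring(point[0], point[1], ring)]


zones = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(5, 5), (7, 5), (7, 7), (5, 7)],
    "C": [(1, 1), (6, 1), (6, 6), (1, 6)],
}
print(polygons_containing((1.5, 1.5), zones))
```

Like the UDTF, the helper returns every matching geometry rather than only the first, so overlapping polygons each produce a result. Points exactly on an edge are not handled consistently by this simple rule.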
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 66
Spatial
Parameter Type Description
options Map Optional Options that allow you to set return criteria in ltString Stringgt format
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB file or shapefile) that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries (contained in the specified TAB file or shapefile) that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
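The way the maxCandidates and maxDistance options shape a nearest search can be sketched outside Hive. The following Python example is only an illustration of those option semantics, not the UDTF's implementation: the function names, the sample coordinates, and the use of a spherical (haversine) distance are all assumptions made for this sketch.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere (mean earth radius)."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first.

    Mirrors the documented defaults: one result, no distance limit.
    """
    lon, lat = point
    scored = []
    for name, (clon, clat) in candidates.items():
        d = haversine_m(lon, lat, clon, clat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]

# Hypothetical sample data: approximate (longitude, latitude) of three cities.
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Boston": (-71.0589, 42.3601),
    "Hartford": (-72.6734, 41.7658),
}
search_point = (-73.750333, 42.736103)

nearest_two = search_nearest(search_point, capitals, max_candidates=2)
within_50km = search_nearest(search_point, capitals, max_candidates=3, max_distance_m=50000)
```

With these sample points, raising max_candidates returns more rows per search point, while max_distance_m can reduce the result to fewer rows than max_candidates allows.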
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the level number, the higher the level in the hierarchy and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
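To make the join semantics concrete without a Spark cluster, here is a minimal Python sketch of a distance join. It is a brute-force comparison of every pair of rows, so it does not reproduce the geohash-based primary join that the SDK performs; the function names, column names, and sample rows are hypothetical, and the haversine formula stands in for whatever distance calculation the SDK uses.

```python
import math

MILE_M = 1609.344  # meters per international mile

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere (mean earth radius)."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair each left row with every right row within max_distance_m.

    If distance_column is given, the computed distance (in meters) is
    added to each joined row under that name.
    """
    rows = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                row = {**l, **r}
                if distance_column:
                    row[distance_column] = d
                rows.append(row)
    return rows

# Hypothetical sample rows.
customers = [{"customer": "A", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "cafe", "lon": -73.7510, "lat": 42.7370},    # a short walk away
    {"poi": "museum", "lon": -73.7562, "lat": 42.6526},  # several kilometers away
]

result = join_by_distance(customers, pois, 0.5 * MILE_M, distance_column="outputDistance")
```

Only the nearby point of interest survives the half-mile filter, and the optional distance column plays the role that DistanceColumnName plays in the Spark example.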
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common operating system group and have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
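As a quick check of what mode 775 grants, the octal value can be decoded with Python's standard stat module. This is only an illustrative aid for reading the permission bits; it does not touch the file system, and the variable names are arbitrary.

```python
import stat

mode = 0o775  # the permission bits applied by "chmod 775"

# Render the bits the way "ls -l" would show them for a directory.
description = stat.filemode(stat.S_IFDIR | mode)

# Individual checks: owner and group may write, everyone else may not.
owner_can_write = bool(mode & stat.S_IWUSR)
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
others_can_read = bool(mode & stat.S_IROTH)
```

Here description is "drwxrwxr-x": the owner and every member of the common group can read, write, and traverse the download directory, while other users can only read and traverse it.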
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that can include wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
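The wildcard rules for Like can be illustrated by translating a pattern into a regular expression. The Python sketch below is not the MI SQL implementation; the helper names are hypothetical, and it assumes a case-sensitive match over the whole value, which the guide does not specify.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    "%" matches zero, one, or multiple characters; "_" matches exactly
    one character; every other character matches itself literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

ends_with_park = like("Central Park", "%Park")
starts_with_park = like("Parkway", "%Park")
single_char = like("Park", "P_rk")
```

In this sketch "Central Park" matches "%Park" but "Parkway" does not, because the pattern must cover the entire value, and "P_rk" matches "Park" because the underscore stands for exactly one character.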
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse them correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
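When filter expressions are assembled in code, the quoting rules above can be applied with small helpers. The following Python sketch uses hypothetical function names and is not part of the product. Because the guide does not describe how to escape a double quote inside an identifier, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping an embedded double quote is not covered by the guide,
    so this helper refuses such names instead of guessing.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'haras")
)
```

The resulting query text is SELECT * FROM "Streets" WHERE Business = 'O''haras', which follows the doubled single quote rule described above.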
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
FromWKB(String geometry [, SpatialInfo CRS])
Parameters
Parameter Type Description
geometry String The WKB of the geometry in byte array format (byte[ ]).
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry from WKB format.
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
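To see what the hexadecimal string in the example encodes, the bytes can be unpacked by hand. The Python sketch below follows the standard WKB layout for a 2D point (a byte-order flag, a 4-byte geometry type, then two 8-byte doubles); it is only an aid for reading the example and handles no other geometry type.

```python
import struct

wkb = bytes.fromhex("010100000000000000000024400000000000002440")

# Byte 0: 1 means little-endian (NDR), 0 means big-endian (XDR).
endian = "<" if wkb[0] == 1 else ">"

# Bytes 1-4: geometry type code (1 is Point in the WKB standard).
(geom_type,) = struct.unpack(endian + "I", wkb[1:5])

# Bytes 5-20: X and Y ordinates as IEEE 754 doubles.
x, y = struct.unpack(endian + "dd", wkb[5:21])

wkt = "POINT ({:g} {:g})".format(x, y)
```

Decoded this way, the example geometry is the point with X = 10 and Y = 10.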
FromKML
Description
The FromKML function returns a WritableGeometry instance from text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string in which only the geometry, or the geometry in a placemark, will be parsed.
Return Values
Return Type Description
WritableGeometry The geometry from KML format.
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y [, SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate.
Y String or Number The Y ordinate.
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified X/Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns a GeoJSON-formatted text representation of the geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry.
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as compbbigdataspatialhivepersistenceToWKT
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array containing the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units:
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
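The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated outside Hive. The following Python sketch is only an illustration of that rule using planar (CARTESIAN-style) arithmetic; the function names ring_area and polygon_area are hypothetical and are not part of the SDK, and the product's own computation may differ.

```python
def ring_area(ring):
    """Unsigned planar area of a ring of (x, y) tuples (shoelace formula)."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_area(outer, [hole]))  # 100 - 4 = 96.0
```

The result is in squared coordinate units; converting to a unit such as sq mi is a separate step that this sketch does not attempt.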
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
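To make the difference between the CARTESIAN and SPHERICAL computation types more concrete, the following Python sketch compares a planar distance with a great-circle (haversine) distance between two points. It is a conceptual illustration only: the function names, the haversine formula, and the mean Earth radius constant are assumptions chosen for this sketch and do not describe how the Distance UDF is implemented.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def cartesian_distance(p, q):
    """Straight-line distance between (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def spherical_distance_m(p, q):
    """Great-circle distance in meters between (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


print(cartesian_distance((0, 0), (3, 4)))     # planar units
print(spherical_distance_m((0, 0), (90, 0)))  # a quarter of the equator
```

Planar arithmetic on long/lat degrees yields a number in degrees, not meters, which is one reason geographic coordinate systems only accept the SPHERICAL type.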
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
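Unlike area, where holes are subtracted, the perimeter of a polygon adds the lengths of its holes to the length of its exterior ring. The Python sketch below shows that rule with planar arithmetic; ring_length and polygon_perimeter are hypothetical names used for illustration and are not SDK functions.

```python
import math


def ring_length(ring):
    """Planar length of a closed ring given as a list of (x, y) tuples."""
    # Pair each vertex with the next one, wrapping around to the first.
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:] + ring[:1]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of all rings, both exterior and holes."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
print(polygon_perimeter(outer, [hole]))  # 40 + 8
```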
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
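The effect of the resolution parameter can be pictured with a planar sketch: a point buffer becomes a regular polygon with one straight segment per unit of resolution. The Python below is an illustration under that assumption only. The point_buffer function is hypothetical, it places the vertices on the circle (an inscribed polygon), and it ignores units and spherical logic, so it should not be read as the Buffer UDF's actual algorithm.

```python
import math


def point_buffer(cx, cy, radius, resolution):
    """Vertices of a regular polygon approximating a circle around (cx, cy)."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / resolution),
         cy + radius * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]


octagon = point_buffer(0.0, 0.0, 50.0, 8)      # resolution 8 gives an octagon
dodecagon = point_buffer(0.0, 0.0, 50.0, 12)   # resolution 12, as in the examples above
print(len(octagon), len(dodecagon))
```

A higher resolution yields a closer approximation of the circle at the cost of more vertices per output geometry.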
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(tablegeometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
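For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. The algorithm choice and the convex_hull function are assumptions made for illustration; the guide does not state which algorithm the ConvexHull UDF uses.

```python
def convex_hull(points):
    """Convex hull of (x, y) tuples, counter-clockwise, via monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# The interior point (2, 2) is dropped because it lies inside the square.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)])
print(hull)
```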
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
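As background for the epsg:4326 to epsg:3857 examples above, the Python sketch below applies the standard spherical Web Mercator formulas to a single long/lat coordinate. It is a conceptual illustration, not the Transform UDF's implementation: lonlat_to_web_mercator is a hypothetical helper, and a production transformation would use a full coordinate system library.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to EPSG:3857 meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Same input coordinate as the ST_POINT(30, 20) example above.
print(lonlat_to_web_mercator(30, 20))
```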
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
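The idea of filtering XY records by the bounds of a geometry can be sketched in a few lines of Python. The example below computes a minimum bounding rectangle from a list of coordinates and keeps only the records that fall inside it. The mbr and within_mbr helpers and the record layout are hypothetical and serve only to illustrate what the four bound values are used for.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) tuples."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def within_mbr(records, bounds):
    """Keep the records whose x and y fall inside the bounds (inclusive)."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in records
            if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


boundary = [(0, 0), (5, 1), (6, 4), (2, 6), (-1, 3)]
records = [{"id": 1, "x": 1, "y": 1}, {"id": 2, "x": 7, "y": 2}]
bounds = mbr(boundary)
print(bounds, within_mbr(records, bounds))
```

A bounds test like this is a coarse filter: a record inside the MBR is not necessarily inside the geometry itself, so a predicate such as Within is still needed for an exact answer.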
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
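To show why a longer ID means a smaller cell, the following Python sketch encodes a point with the widely published geohash scheme: longitude and latitude ranges are halved alternately, and every five resulting bits become one base-32 character. This is an illustration under the assumption that GeoHashID follows the standard geohash encoding, which this guide does not confirm; the geohash_id function is hypothetical and not part of the SDK.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Standard geohash of a long/lat point; precision is the ID length."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the range
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the range
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_id(-5.603, 42.605, 3)
fine = geohash_id(-5.603, 42.605, 5)
print(coarse, fine)
```

Because each extra character only subdivides the previous cell, a shorter ID is always a prefix of a longer ID for the same point. That prefix property is what makes such IDs sortable and convenient for grouping, as in the GROUP BY example above.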
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
[Figure: sample output of the Hexagon Generator]
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
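To make the semantics of a distance join concrete, here is a small, self-contained Python sketch that pairs every point in one list with the points of a second list that fall within a maximum distance, and reports the distance for each pair. It is not the product's implementation: it uses a naive nested loop and the haversine great-circle formula on a spherical earth, whereas the description above says the API uses a geohash for the primary join. All names here (haversine_m, join_by_distance, the sample points) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius; spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Return (left_id, right_id, distance_m) for every pair within range.

    left and right are lists of (id, lon, lat) tuples.
    """
    pairs = []
    for left_id, lon1, lat1 in left:
        for right_id, lon2, lat2 in right:
            d = haversine_m(lon1, lat1, lon2, lat2)
            if d <= max_distance_m:
                pairs.append((left_id, right_id, d))
    return pairs


customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.7450, 42.7361), ("far", -73.6000, 42.7361)]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for left_id, right_id, d in result:
    print(left_id, right_id, round(d, 1))
```

The third element of each tuple plays the role of the optional distance column. A nested loop compares every pair, which is why a real distance join on large data first narrows candidates with a coarse spatial key.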
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
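As a quick illustration of what the octal mode 775 grants, the following Python sketch expands it into the familiar rwx notation using the standard library's stat module. It only decodes the permission bits; it does not touch any directory, and the helper name describe_mode is hypothetical.

```python
import stat


def describe_mode(mode, is_dir=True):
    """Render permission bits the way `ls -l` does, e.g. drwxrwxr-x."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


download_dir_mode = 0o775
print(describe_mode(download_dir_mode))

# Individual bits: group members may write, all other users may not.
group_can_write = bool(download_dir_mode & stat.S_IWGRP)
others_can_write = bool(download_dir_mode & stat.S_IWOTH)
print(group_can_write, others_can_write)
```

With 775, the owner and every member of the common group can read, write, and traverse the download directory, while all other users can only read and traverse it.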
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
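The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the two wildcards; the helper names are hypothetical, and it assumes case-sensitive matching, which the text above does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or multiple characters; _ matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # pattern allows any prefix
print(like("Park Avenue", "%Park"))    # value does not end in Park
print(like("Park", "P_rk"))            # underscore stands for one character
```

The whole value must match, so a pattern without a leading or trailing percent sign behaves like an anchored comparison.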
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
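When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only: the function names are hypothetical, and because the text does not say how a double quote inside an identifier should be handled, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the escaping rule for them
    is not covered by the quoting rules described here.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
print(query)
print(quote_literal("O'haras"))
```

Doubling the embedded quote keeps the literal intact when the statement is parsed, so a value containing an apostrophe cannot end the string early.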
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Spatial
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language)
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
FromKML(String geometry)
Parameters
Parameter Type Description
geometry String A KML string where only the geometry or geometry in placemark will be parsed
Return Values
Return Type Description
WritableGeometry The geometry from KML format
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y and an optional CRS
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter Type Description
X String or Number The X ordinate
Y String or Number The Y ordinate
CRS String Optional. The coordinate system for the geometry. Default = epsg:4326
Return Values
Return Type Description
WritableGeometry The geometry of the specified XY coordinates If any of the argument values are invalid then an empty geometry will be returned in the output
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
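To give a sense of what such a byte array contains, the sketch below encodes a single point using the standard WKB layout (one byte-order byte, a 4-byte geometry type, then X and Y as 8-byte doubles). It is an illustration of the format only; the helper names are hypothetical and it does not reproduce how the UDF itself is implemented.

```python
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 = little-endian byte order, 1 = geometry type Point
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Decode a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled in this sketch")
    return x, y


if __name__ == "__main__":
    wkb = point_to_wkb(10.0, 20.0)
    print(wkb.hex())
    print(wkb_to_point(wkb))
```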
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common otherwise False
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries otherwise False
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the geometry1 overlaps the geometry2 otherwise False
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
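For the common case of testing whether a point lies within a polygon, the idea can be sketched with the classic ray-casting test on planar coordinates. This is a conceptual illustration with a hypothetical helper name; it ignores holes and boundary cases and says nothing about how the Within UDF is implemented.

```python
def point_within_ring(px, py, ring):
    """Return True if (px, py) lies inside the ring [(x, y), ...] (ray casting)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(point_within_ring(5, 5, square))
    print(point_within_ring(15, 5, square))
```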
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the geometry2 entirely contains geometry1 otherwise False
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of given Geometry in the desired unit The unit must be specified as a parameter while calling the function The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings Points and curves have zero area
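The rule that a polygon's area is its exterior ring minus its interior rings can be illustrated on planar coordinates with the shoelace formula. This is a sketch of the Cartesian case only, with hypothetical helper names; spherical computation and unit conversion are not covered.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list [(x, y), ...] (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
    print(polygon_area(outer, [hole]))  # 100 - 4 = 96.0
```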
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
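The difference between the two computation types can be seen by measuring the same pair of points both ways. The sketch below is an assumption-laden illustration: it uses the haversine formula on a sphere with a mean Earth radius of 6,371,008.8 m for the spherical case and a plain Euclidean distance for the Cartesian case. The product's actual spherical algorithm is not documented here and may differ.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius, in meters


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinates themselves."""
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # One degree of latitude along the prime meridian.
    print(spherical_distance_m(0, 0, 0, 1))
    # The same coordinates treated as planar values: the result is in degrees.
    print(cartesian_distance(0, 0, 0, 1))
```

Treating long/lat values as planar coordinates yields a number in degrees, which is why geographic coordinate systems accept only the spherical type.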
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
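As an aid to reading results returned in different units, the sketch below converts between the unit codes listed above. The conversion factors are the standard definitions of each unit (for example, a US Survey foot is 1200/3937 m); they are supplied here as an illustration and are not taken from the product.

```python
# Meters per unit, keyed by the unit codes from the table above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the linear unit codes."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


if __name__ == "__main__":
    print(convert_length(50, "km", "mi"))
    print(convert_length(1, "nmi", "m"))
```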
Return Values
Return Type Description
Double The distance between the two geometries Geometries that intersect are at distance zero from each other Distance is always non-negative
Examples
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type The Perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes) The curves are considered as thin polygons
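The statement that holes contribute to the perimeter can be checked on planar coordinates. This is a minimal Cartesian sketch with hypothetical helper names; it does not model spherical computation or unit conversion.

```python
import math


def ring_length(ring):
    """Planar length of a ring [(x, y), ...], closing it if it is not already closed."""
    closed = ring if ring[0] == ring[-1] else ring + ring[:1]
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Sum of the lengths of the exterior ring and every interior ring (hole)."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
    print(polygon_perimeter(outer, [hole]))  # 40 + 8 = 48.0
```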
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle For example with a resolution of 8 the buffer of a point will be an octagon Buffers with larger resolution values take more time and space to compute
computationType String Optional. Indicates the logic used to interpret the geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
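The effect of the resolution parameter can be sketched on planar coordinates: a point buffer becomes a regular polygon with that many sides, and a higher resolution approaches a true circle at the cost of more vertices. The helper names below are hypothetical, and the sketch assumes the vertices lie on the circle itself, which may not match how the UDF places them.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (
            x + radius * math.cos(2 * math.pi * i / resolution),
            y + radius * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


def ring_area(ring):
    """Unsigned planar area of a closed ring (shoelace formula)."""
    total = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2.0


if __name__ == "__main__":
    for resolution in (8, 12, 64):
        ring = point_buffer(0.0, 0.0, 1.0, resolution)
        # Ratio of polygon area to true circle area; approaches 1 as resolution grows.
        print(resolution, len(ring) - 1, ring_area(ring) / math.pi)
```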
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry The convex hull is the smallest convex geometry that contains all the points in the input geometry
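For a set of planar points, a convex hull can be computed with Andrew's monotone chain algorithm. The sketch below illustrates the concept only; it is a swapped-in textbook technique with a hypothetical function name, not the algorithm used by the UDF.

```python
def convex_hull(points):
    """Return the convex hull of (x, y) points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```

The interior point (1, 1) is dropped because it does not lie on the boundary of the smallest convex shape containing all the points.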
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_Point(30, 20), 'epsg:3857');
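To show what a transformation between these two coordinate systems involves, the sketch below applies the standard spherical Web Mercator formulas to convert long/lat (epsg:4326) into epsg:3857 meters. It is a worked illustration of that one coordinate system pair with a hypothetical function name; the Transform UDF supports other systems and its internals are not shown here.

```python
import math

SEMI_MAJOR_AXIS_M = 6378137.0  # WGS 84 semi-major axis used by Web Mercator


def to_web_mercator(lon, lat):
    """Convert long/lat in degrees to Web Mercator X/Y in meters."""
    x = SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    # The same point as the second example above: X = 30, Y = 20.
    print(to_web_mercator(30, 20))
```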
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer index functions are available
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
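The bounds-filtering use case can be sketched outside Hive as follows: compute the minimum and maximum X and Y of a geometry, then keep only the XY rows that fall inside that rectangle. The helper names and sample rows are hypothetical and serve only to illustrate the idea.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the rows whose x and y fall inside the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return [r for r in rows if x_min <= r["x"] <= x_max and y_min <= r["y"] <= y_max]


if __name__ == "__main__":
    polygon = [(-74.0, 42.0), (-73.0, 42.0), (-73.5, 43.0)]
    rows = [
        {"id": 1, "x": -73.75, "y": 42.73},
        {"id": 2, "x": -72.0, "y": 42.5},
    ]
    print(mbr(polygon))
    print(filter_by_bounds(rows, mbr(polygon)))
```

A bounds filter is a coarse test: a row inside the rectangle is not necessarily inside the geometry itself, so it is typically followed by an exact predicate.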
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between This makes grids very useful for spatial indexing and aggregating
Spectrumtrade Location Intelligence for Big Data provides Hive user defined functions (UDF) for hashing that allow you to manage grid cells for a variety of use cases Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier
We provide three types of UDFs for processing three grid cell shapes
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems
Square hash is similar to GeoHash but has the advantage that when displayed in Popular Mercator the cells appear as squares
Hexagons are often used in telecommunication solutions as they approximate circles while covering the surface of the earth without gaps
The following Grid index functions are available
bull GeoHashBoundary bull GeoHashID bull HexagonBoundary bull HexagonID bull SquareHashBoundary bull SquareHashID
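To make the idea of hashing a location into a cell ID concrete, the following Python sketch implements the publicly documented geohash encoding. It is only an illustration: it assumes the product's GeoHashID follows the standard geohash scheme, and the function name geohash_id is hypothetical, not part of the SDK.

```python
# Illustrative sketch of the standard geohash encoding.
# Assumption: this mirrors the general scheme only, not the product's code.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash string of the given length."""
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0        # number of bits collected for the current character
    value = 0       # value of the current 5-bit group
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid   # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid   # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:      # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_id(-5.6, 42.6, 5))  # prints ezs42
```

Each extra character halves the cell several more times, which is why a longer ID means a smaller cell, and why a shorter ID is always a prefix of the longer ID for the same point.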
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
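The reverse direction, from a cell ID back to its rectangle, can be sketched in the same way. The Python below decodes a standard geohash string into its bounding box and formats it as WKT. The names geohash_bounds and bounds_to_wkt are hypothetical, and the sketch assumes the standard geohash scheme rather than describing how GeoHashBoundary is implemented.

```python
# Illustrative sketch: decode a standard geohash string to its bounding rectangle.
# Assumption: standard geohash scheme; not the product's implementation.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Return (min_x, min_y, max_x, max_y) of the rectangle a geohash denotes."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2.0
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    min_x, min_y, max_x, max_y = bounds
    ring = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


print(bounds_to_wkt(geohash_bounds("ezs42")))
```

Because each character contributes five more halvings, the rectangle for a longer ID always lies inside the rectangle for any of its prefixes.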
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format
Options
Option | Description | Example
shpCharset | the charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | the coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
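For readers who want to see what a point-in-polygon lookup does conceptually, the following Python sketch uses the classic ray-casting test and returns the attribute records of every polygon that contains the point. It is a simplified illustration under stated assumptions: polygons are plain lists of (x, y) vertices without holes, the sample data is invented, and the function names are hypothetical. It does not read TAB or shapefiles and is not how the UDTF is implemented.

```python
# Illustrative sketch: ray-casting point-in-polygon test over in-memory polygons.
# Assumptions: simple rings without holes; names and data are hypothetical.

def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the ring (list of (x, y) vertices)."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
        j = i
    return inside


def polygons_containing(point, polygons):
    """Return the attributes of every polygon that contains the point."""
    x, y = point
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]


# Hypothetical sample data: two rectangular "states" with a capital attribute.
sample_polygons = [
    ({"state": "A", "capital": "Alpha"}, [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ({"state": "B", "capital": "Beta"}, [(2, 0), (4, 0), (4, 2), (2, 2)]),
]
print(polygons_containing((1, 1), sample_polygons))
```

As with the LATERAL VIEW in the query above, one input point can yield zero, one, or several result rows.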
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format
Options
Option | Description | Example
maxCandidates | the maximum number of results to return (if not set, the default value is 1) | 'maxCandidates', '3'
maxDistance | the maximum distance to search for results (if not set, the default value is no limit) | 'maxDistance', '25'
distanceUnit | the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | the name of the column to use for returning the distance | 'returnDistanceColumnName', 'Miles'
shpCharset | the charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | the coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', 'Name Like Park'
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park'))
nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
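The way maxCandidates and maxDistance shape the result can be sketched with a small in-memory nearest search. The Python below is an illustration only: it treats every candidate as a point, measures great-circle distance in meters with the haversine formula, and uses hypothetical names and invented sample data. The real UDTF works on TAB or shapefile geometries and its distance calculation may differ.

```python
# Illustrative sketch: nearest search with a candidate limit and a distance limit.
# Assumptions: point features only, haversine distance; names are hypothetical.
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (attrs, distance_m) pairs, nearest first."""
    px, py = point
    hits = []
    for attrs, (fx, fy) in features:
        d = haversine_m(px, py, fx, fy)
        if max_distance is None or d <= max_distance:  # None means no limit
            hits.append((attrs, d))
    hits.sort(key=lambda hit: hit[1])
    return hits[:max_candidates]


# Hypothetical sample data: three capitals at increasing distance from (0, 0).
features = [
    ({"capital": "A"}, (0.0, 0.01)),
    ({"capital": "B"}, (0.0, 0.02)),
    ({"capital": "C"}, (0.0, 0.5)),
]
for attrs, dist in search_nearest((0.0, 0.0), features, max_candidates=3, max_distance=2500):
    print(attrs["capital"], round(dist))
```

With the defaults (one candidate, no distance limit) the sketch returns only the single nearest feature, which matches the defaults listed in the options table.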
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
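The semantics of a distance join can be shown without Spark. The Python sketch below pairs every record in one list with every record in another that lies within a maximum distance, and adds the computed distance as an extra field, much like the DistanceColumnName option. It is a brute-force illustration with invented data and hypothetical names: it compares all pairs with a haversine distance and omits the geohash-based primary join that the geohashPrecision parameter refers to.

```python
# Illustrative sketch: brute-force distance join of two in-memory record lists.
# Assumptions: haversine distance, all-pairs comparison; not the Spark API.
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left row with every right row within max_distance_m."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and record the distance in an extra field.
                joined.append({**l_row, **r_row, "outputDistance": d})
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.750333, "lat": 42.740103},
    {"poi": "far", "lon": -73.750333, "lat": 42.750103},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for row in result:
    print(row["customer"], row["poi"], round(row["outputDistance"]))
```

Comparing all pairs grows with the product of the two input sizes, which is why a grid-based first pass such as a geohash match is useful at scale.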
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
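After the steps above, it can be useful to confirm that the download directory really carries the expected permission bits. The following Python sketch shows one way to check this with the standard library. It is an illustration only: the helper names are hypothetical, and the demo uses a temporary directory rather than your actual download location.

```python
import os
import stat
import tempfile


def mode_bits(path: str) -> int:
    """Return only the permission bits of a path (for example 0o775)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def owner_and_group_have_rwx(path: str) -> bool:
    """True when both the owner and the group have read, write, and execute."""
    required = stat.S_IRWXU | stat.S_IRWXG
    return mode_bits(path) & required == required


if __name__ == "__main__":
    # Demo on a throwaway directory; point this at your download directory instead.
    with tempfile.TemporaryDirectory() as demo_dir:
        os.chmod(demo_dir, 0o775)
        print(oct(mode_bits(demo_dir)), owner_and_group_have_rwx(demo_dir))
```

The check does not verify group ownership; on Linux, os.stat(path).st_gid together with the grp module can be used for that.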
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
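The wildcard rules for Like and the inclusive behavior of Between can be illustrated outside of MI SQL. The Python sketch below mimics the semantics described in the table by translating a Like pattern into a regular expression. It is an approximation for illustration only: the helper names are hypothetical, and details such as case sensitivity or escape characters in MI SQL are not covered by this guide and are not modeled here.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like pattern: '%' is zero or more characters, '_' is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high) -> bool:
    """Between is inclusive at both ends of the range."""
    return low <= value <= high


if __name__ == "__main__":
    print(like("Troy", "T_oy"), like("Troy", "T%"), like("Troy", "T_"))
    print(between(5, 1, 5))
```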
Arithmetic Operators
Operator  Definition
+         Addition; also the string concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols. The % (percent) represents zero, one, or more characters; the _ (underscore) represents a single character.
,          List items and function argument separators
@          Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
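When MI SQL statements are assembled by a program, the quoting rules above can be applied with two small helpers. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, and because this guide does not say how a double-quote inside an identifier is handled, the identifier helper simply rejects that case.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier (table, column, alias) in double quotes."""
    if '"' in name:
        # Not covered by the quote rules in this guide, so refuse rather than guess.
        raise ValueError("identifier must not contain a double-quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    country = quote_literal("Canada")
    print("SELECT * FROM " + table + " WHERE Country = " + country)
    print(quote_literal("Smith's Bakery"))
```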
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y ordinates and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Parameter  Type              Description
X          String or Number  The X ordinate.
Y          String or Number  The Y ordinate.
CRS        String            Optional. The coordinate system for the geometry. Default = epsg:4326.
Return Values
Return Type       Description
WritableGeometry  The geometry of the specified XY coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
Examples
SELECT ST_Point('-73.750333', '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333, 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333', '42.736103');
SELECT ST_Point(-73.750333, 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns the GeoJSON text representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.
Return Values
Return Type     Description
GeoJSON String  The GeoJSON representation of a geometry.
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.
Return Values
Return Type  Description
String       The WKT representation of a geometry.
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array holding the Well-Known Binary (WKB) representation of the geometry in the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.
Return Values
Return Type  Description
Byte[]       The WKB representation of a geometry, expressed as a byte array.
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML, in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The instance of a WritableGeometry.
Return Values
Return Type  Description
String       The KML representation of a geometry, expressed as a hexadecimal encoded string.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type  Description
Boolean      True if the two geometry objects have no points in common; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type  Description
Boolean      True if there is any direct position in common between the two geometries; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type  Description
Boolean      True if geometry1 overlaps geometry2; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type  Description
Boolean      True if geometry2 entirely contains geometry1; otherwise False. If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter      Type              Description
inputGeometry  WritableGeometry  The input geometry to be checked for a null or empty value.
Return Values
Return Type  Description
Boolean      True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
areaUnits        String            The desired return unit type. For valid values, see Area Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for the unit are the following area units:
Value         Description
sq mi         square miles
sq km         square kilometers
sq in         square inches
sq ft         square feet
sq yd         square yards
sq mm         square millimeters
sq cm         square centimeters
sq m          square meters
sq survey ft  square US Survey feet
sq nmi        square nautical miles
acre          acres
ha            hectares
Return Values
Return Type  Description
Double       The area of the geometry.
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
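The rule stated in the description, that a polygon's area is the area of its exterior ring minus the areas of its interior rings, can be illustrated with plain planar arithmetic. The Python sketch below uses the shoelace formula on Cartesian coordinates. It is an illustration of the rule only, not the product's implementation, and it does not model the SPHERICAL computation type or unit conversion. The function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def ring_area(ring: Sequence[Point]) -> float:
    """Planar area of a ring via the shoelace formula (ring may be open or closed)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Sequence[Point],
                 interiors: Iterable[Sequence[Point]] = ()) -> float:
    """Exterior ring area minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(square, [hole]))
```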
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry.
geometry2        WritableGeometry  The second instance of a WritableGeometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
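To give a feel for what a spherical distance between two long/lat points looks like, the Python sketch below uses the haversine formula on a sphere with a mean Earth radius, then converts the result to a few of the linear units listed above. This is one common spherical formula chosen for illustration; this guide does not state which formula or Earth model the Distance UDF uses, so results may differ from the product's. The function names and the radius constant are assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters (spherical model)

# Exact conversion factors for a subset of the documented linear units.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048, "nmi": 1852.0}


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two long/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance(lon1: float, lat1: float, lon2: float, lat2: float, unit: str = "m") -> float:
    """Great-circle distance expressed in the requested linear unit."""
    return haversine_m(lon1, lat1, lon2, lat2) / METERS_PER_UNIT[unit]


if __name__ == "__main__":
    # One degree of longitude along the equator.
    print(round(distance(0.0, 0.0, 1.0, 0.0, "km"), 3))
```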
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The length of the geometry.
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The perimeter of the geometry.
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The geometry to buffer.
offset           Number            The distance from the input geometry.
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
resolution       Number            Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type       Description
WritableGeometry  A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
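The effect of the resolution parameter is easiest to see for a point buffer: the circle is approximated by that many straight-line segments, so a resolution of 8 yields an octagon. The Python sketch below builds such a ring in plain Cartesian coordinates. It illustrates the idea only; it is not the product's buffering algorithm and does not model units or the SPHERICAL computation type. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def point_buffer_ring(cx: float, cy: float, offset: float,
                      resolution: int) -> List[Tuple[float, float]]:
    """Approximate a circle of radius `offset` around (cx, cy) with
    `resolution` straight-line segments, returned as a closed ring."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((cx + offset * math.cos(angle), cy + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


if __name__ == "__main__":
    octagon = point_buffer_ring(5.0, 6.0, 50.0, 8)
    print(len(octagon) - 1, "segments")
```

A larger resolution follows the circle more closely at the cost of more vertices, which matches the note that such buffers take more time and space to compute.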
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry.
Return Values
Return Type       Description
WritableGeometry  The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
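For readers unfamiliar with the concept, the convex hull of a set of points can be computed with a short, well-known algorithm. The Python sketch below uses Andrew's monotone chain method on planar points. It is offered only to illustrate what a convex hull is; this guide does not describe the algorithm the ConvexHull UDF uses, and the function names here are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```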
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type       Description
WritableGeometry  The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The source input geometry.
CRS        String            The destination coordinate system for the geometry.
Return Values
Return Type       Description
WritableGeometry  The geometry transformed to the destination coordinate system.
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry.
geometry2  WritableGeometry  The second instance of a WritableGeometry.
Return Values
Return Type       Description
WritableGeometry  The geometry that represents the union of the input geometries.
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs extract the ordinates of a point geometry, which allows an XY table to be transformed from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
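The idea of filtering XY records by the bounds of a geometry can be sketched in a few lines. The Python example below computes the MBR of a list of coordinates, which corresponds conceptually to the four values returned by ST_XMin, ST_YMin, ST_XMax, and ST_YMax, and then keeps only the records whose X and Y fall inside it. The data and function names are hypothetical, and the sketch does not call the UDFs themselves.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(coords: Sequence[Point]) -> Bounds:
    """Minimum bounding rectangle of a non-empty coordinate list."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def filter_by_bounds(records: Iterable[Point], bounds: Bounds) -> List[Point]:
    """Keep the XY records that fall inside the bounds (edges included)."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0)]
    bounds = mbr(triangle)
    print(bounds)
    print(filter_by_bounds([(1.0, 1.0), (5.0, 1.0), (2.0, 5.0)], bounds))
```

A bounds filter is only a coarse test: a record inside the MBR is not necessarily inside the geometry, so an exact predicate such as Within is still needed when precision matters.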
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
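To make the idea of encoding a point into a cell ID and decoding an ID back into a cell boundary concrete, here is a minimal Python sketch of the publicly documented geohash scheme. It is an illustration under assumptions, not the product's implementation: the function names `geohash_encode` and `geohash_bounds` are hypothetical, and the IDs returned by the GeoHashID UDF may differ in detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a geohash string of the given length."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_bounds(cell_id: str):
    """Decode a geohash into its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


cell = geohash_encode(-73.750333, 42.736103, 5)
print(cell, geohash_bounds(cell))
```

Because each additional character subdivides the previous cell, a longer ID always starts with the ID of the coarser cell that contains it, which is what makes the IDs sortable and searchable by prefix.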
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies on or within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
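The containment test behind a point-in-polygon search can be illustrated with the classic ray-casting technique. The Python sketch below is a simplified stand-in under stated assumptions, not the UDTF's implementation: `point_in_ring`, `polygons_containing`, and the two sample regions are hypothetical, polygons are single rings without holes, and points that fall exactly on an edge are not treated specially.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray going right crosses an edge."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point: Point, polygons: Dict[str, Sequence[Point]]) -> List[str]:
    """Return the names of all polygons whose ring contains the point."""
    return [name for name, ring in polygons.items() if point_in_ring(point, ring)]


# Hypothetical regions, for illustration only.
regions = {
    "west": [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)],
    "east": [(5.0, 0.0), (10.0, 0.0), (10.0, 5.0), (5.0, 5.0)],
}
print(polygons_containing((2.0, 3.0), regions))
print(polygons_containing((12.0, 3.0), regions))
```

Testing every polygon for every point is linear in the number of polygons, which is one reason a prepared spatial index is worth building for large data sources.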
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
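The effect of the maxCandidates and maxDistance options can be illustrated with a small nearest-neighbor sketch in Python. This is a simplified model under assumptions, not the UDTF's implementation: `haversine_m`, `search_nearest`, and the approximate city coordinates are hypothetical, candidates are points only, and distances are great-circle distances in meters on a spherical earth.

```python
import math
from typing import List, Optional, Sequence, Tuple

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two longitude/latitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point: Tuple[float, float],
                   candidates: Sequence[Tuple[str, float, float]],
                   max_candidates: int = 1,
                   max_distance_m: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = [(haversine_m(lon, lat, c_lon, c_lat), name)
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        scored = [s for s in scored if s[0] <= max_distance_m]  # drop far results
    scored.sort()
    return [(name, dist) for dist, name in scored[:max_candidates]]


# Approximate capital locations (name, longitude, latitude), for illustration only.
capitals = [
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Harrisburg", -76.8867, 40.2732),
    ("Hartford", -72.6734, 41.7658),
]
print(search_nearest((-73.750333, 42.736103), capitals, max_candidates=3))
```

With no distance limit the sketch always returns up to `max_candidates` rows; adding `max_distance_m` can reduce the result to fewer rows, or to none.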
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture data near cell edges better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
// Build the maximum join distance as a Length in miles
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The DistanceColumnName option adds the calculated distance as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
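The semantics of a distance join, including the optional distance column, can be sketched in plain Python without Spark. This is a brute-force illustration under assumptions, not the SDK's algorithm: `haversine_mi`, `join_by_distance`, the column names, and the sample rows are hypothetical, and every pair of rows is compared, whereas the geohashPrecision parameter described above indicates that the SDK first performs a coarser primary join on geohash cells.

```python
import math
from typing import Dict, List

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in miles


def haversine_mi(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in miles between two longitude/latitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left: List[Dict], right: List[Dict], max_distance_mi: float,
                     distance_column: str = "distance") -> List[Dict]:
    """Pair every left row with each right row that lies within max_distance_mi."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_mi(l_row["longitude"], l_row["latitude"],
                             r_row["lon"], r_row["lat"])
            if d <= max_distance_mi:
                # Merge both rows and append the computed distance
                joined.append({**l_row, **r_row, distance_column: d})
    return joined


# Hypothetical customers and points of interest, for illustration only.
customers = [{"customer": "A", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "cafe", "lon": -73.7550, "lat": 42.7380},
    {"poi": "museum", "lon": -73.7562, "lat": 42.6526},
]
for row in join_by_distance(customers, pois, 0.5, distance_column="outputDistance"):
    print(row)
```

The nested loop costs one distance calculation per pair of rows, which is why a cell-based pre-join matters at the data volumes the product targets.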
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following three lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. You can do this by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where the LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter        Required    Description
-f <file>        yes         Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>    no          The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs need to download data and use the same download location, then both the Hive and YARN operating system users should belong to the common group. The download directory should be assigned to this common group, with Read, Write, and Execute permissions (775) for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
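To see what the 775 mode used above grants, the following minimal sketch decodes an octal mode with Python's standard stat module. The helper names are illustrative only and are not part of the product.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render an octal permission mode as an ls-style string for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """Return True if the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_dir_mode(0o775))   # owner and group: rwx, others: r-x
    print(group_can_write(0o775))
    print(group_can_write(0o755))
```

With 775, every member of the common group can write to the download directory, while all other users can only read and traverse it.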
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
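The wildcard behavior described for the Like operator can be illustrated outside of MI SQL. The following is a minimal sketch, not the product's implementation. It assumes case-sensitive matching and no escape character, neither of which is stated in this guide, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Hudson River", "Hudson%"))
    print(like("cat", "c_t"))
    print(like("cart", "c_t"))
```

Combining the symbols works the same way: in this sketch, a pattern such as `_a%` matches any value whose second character is `a`.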
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separator
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
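When queries are built programmatically, the quoting rules above can be applied with small helpers. The following is a minimal sketch. The function names are hypothetical, and the test for which identifiers need quoting (anything other than letters, digits, and underscores) is an assumption, since this guide does not list the exact characters.

```python
import re

# Assumption: identifiers made only of letters, digits, and underscores need no quotes.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes only when it needs them."""
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    if '"' in name:
        # Embedded double quotes are not covered by this guide, so reject them.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    print("SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada"))
    print(quote_literal("O'Brien"))
```

Doubling the single quote keeps the literal intact when the value itself contains an apostrophe.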
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Persistence Functions
The following Persistence functions are available
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML
ToGeoJSON
Description
The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
GeoJSON String The GeoJSON representation of a geometry
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
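The exterior-minus-interior rule for polygon area can be sketched in plain Python. This is an illustration only, assuming planar (cartesian) coordinates and the shoelace formula. It is not the product's implementation, it ignores units and spherical computation, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Ring = List[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Unsigned planar area of one ring, computed with the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]   # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior: Ring, interiors: Iterable[Ring] = ()) -> float:
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_area(square, [hole]))   # 100 - 4 = 96.0
```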
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
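The computationType rules listed above can be summarized as a small lookup. The following is a minimal sketch for validating a value before it is passed to a function. The coordinate system labels ("geographic", "projected", "engineering") and the function name are hypothetical, and the defaults are taken from the list above as written.

```python
from typing import Optional

# Valid computation types per coordinate system kind; the first entry is the default.
VALID_TYPES = {
    "geographic": ("SPHERICAL",),
    "projected": ("SPHERICAL", "CARTESIAN"),
    "engineering": ("CARTESIAN",),
}


def resolve_computation_type(crs_kind: str, requested: Optional[str] = None) -> str:
    """Return the computation type to use, falling back to the default when omitted."""
    valid = VALID_TYPES[crs_kind]
    if requested is None:
        return valid[0]
    requested = requested.upper()
    if requested not in valid:
        raise ValueError(f"{requested} is not valid for {crs_kind} coordinate systems")
    return requested


if __name__ == "__main__":
    print(resolve_computation_type("geographic"))
    print(resolve_computation_type("projected", "cartesian"))
```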
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square foot
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
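The rule that a polygon's perimeter sums the lengths of all of its rings, holes included, can be sketched as follows. This is an illustration only, assuming planar (cartesian) coordinates. It is not the product's implementation, and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Ring = List[Tuple[float, float]]


def ring_length(ring: Ring) -> float:
    """Planar length of a ring, including the segment that closes it."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def polygon_perimeter(exterior: Ring, interiors: Iterable[Ring] = ()) -> float:
    """Sum of the lengths of the exterior ring and every interior ring (hole)."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_perimeter(square, [hole]))   # 40 + 8 = 48.0
```

Note that a hole reduces a polygon's area but increases its perimeter.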
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
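The effect of the resolution parameter on a point buffer can be visualized with a short sketch that places the vertices of a regular polygon around a center point. This is an illustration only, assuming planar coordinates and vertices that lie exactly on the circle. How the product positions the vertices is not stated in this guide, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def point_buffer_ring(cx: float, cy: float, radius: float,
                      resolution: int) -> List[Tuple[float, float]]:
    """Approximate a circle around (cx, cy) with `resolution` straight segments."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    step = 2.0 * math.pi / resolution
    return [
        (cx + radius * math.cos(i * step), cy + radius * math.sin(i * step))
        for i in range(resolution)
    ]


if __name__ == "__main__":
    octagon = point_buffer_ring(5.0, 6.0, 50.0, 8)
    print(len(octagon))   # 8 vertices, so 8 segments: an octagon
```

A higher resolution yields more vertices and a closer approximation of the circle, at the cost of larger geometries.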
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
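To make the definition concrete, the following sketch computes the convex hull of a set of planar points using Andrew's monotone chain algorithm. It is an illustration of the concept only. The algorithm used by the product is not stated in this guide, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()   # drop points that would make a clockwise or straight turn
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 0)]
    print(convex_hull(pts))   # interior and collinear points are dropped
```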
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
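For intuition about what a transformation from epsg:4326 to epsg:3857 involves, the sketch below applies the standard spherical Web Mercator forward formulas to a single longitude/latitude pair. This is only the textbook math for that one pair of coordinate systems, assuming the WGS 84 semi-major axis as the sphere radius; it says nothing about how the Transform UDF is implemented, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0)
    )
    return x, y
```

Latitudes approaching the poles map to unbounded y values, which is why Web Mercator data is normally clipped to roughly 85 degrees north and south.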
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
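The bounds-filtering use case can be sketched in a few lines of Python. The helper names mbr and rows_within are hypothetical, and the geometry is reduced to a plain list of (x, y) vertices; the sketch only illustrates what the four min/max values represent and how they can filter an XY table.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a non-empty sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def rows_within(rows, box):
    """Keep the (x, y) rows that fall inside the bounding box, edges included."""
    xmin, ymin, xmax, ymax = box
    return [r for r in rows if xmin <= r[0] <= xmax and ymin <= r[1] <= ymax]
```

A bounding-box test like this is a coarse filter: a row can lie inside the MBR and still fall outside the geometry itself.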
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
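To show how a rectangular cell boundary can be recovered from a hash string, here is a minimal Python sketch of the publicly documented geohash decoding scheme: each base-32 character contributes five bits that alternately halve the longitude and latitude ranges. It assumes the standard geohash alphabet and bit order; whether the GeoHashBoundary UDF follows exactly this scheme is an assumption, and geohash_bounds is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (lon_min, lat_min, lon_max, lat_max) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi
```

Each additional character divides the cell into 32 smaller cells, which is why longer strings correspond to higher precision.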
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
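The hash-then-aggregate pattern in the last query can be sketched in Python as follows. The encoder follows the publicly documented geohash scheme (standard base-32 alphabet, bits alternating between longitude and latitude); whether the GeoHashID UDF produces identical strings is an assumption, and geohash_id and count_by_cell are hypothetical names.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bits += 1
        is_lon = not is_lon
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (lon, lat) points per geohash cell, like GROUP BY hashID."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)
```

Because nearby points share a common string prefix, sorting or grouping by the ID clusters records spatially without any geometry operations.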
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
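The core test behind a point-in-polygon search can be illustrated with the classic ray casting (even-odd) rule. The Python sketch below is a generic planar technique operating on simple rings without holes; it is not the algorithm used by LocalPointInPolygon, it makes no promise about points lying exactly on an edge, and the function names and the dict-of-rings data layout are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> ring) containing the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]
```

A production search would normally consult a spatial index first and run the exact test only on candidate polygons, which is the role the PGD index files play in the tip above.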
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the product always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a DataFrame that is the result of a join where points from the second DataFrame are located within a 0.5 mile buffer around each point in the first DataFrame.
// col comes from org.apache.spark.sql.functions
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
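To make the semantics of a distance join concrete, here is a minimal brute-force sketch in plain Python. It is not the SDK implementation: it assumes a haversine (spherical) distance with a mean Earth radius, compares every pair of records instead of using a geohash pre-join, and all record fields and coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius; an assumption of this sketch
METERS_PER_MILE = 1609.344

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m):
    """Pair every left record with each right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Mirrors the DistanceColumnName option: keep the computed distance.
                joined.append({**l, **r, "outputDistance": d})
    return joined

# Hypothetical sample data.
customers = [{"id": "c1", "longitude": -73.6918, "latitude": 42.7284}]
pois = [
    {"poi": "near", "lon": -73.6950, "lat": 42.7300},
    {"poi": "far", "lon": -73.9000, "lat": 42.9000},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([m["poi"] for m in matches])
```

The nested loop is quadratic in the number of records, which is why a real distributed join first narrows candidate pairs with a spatial key such as a geohash.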
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory's group should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
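After changing the permissions, it can be useful to confirm that the directory really carries mode 775. The following Python sketch shows one way to check this with the standard library; the helper name is hypothetical, and the demonstration uses a temporary directory rather than a real download location.

```python
import os
import stat
import tempfile

def has_mode(path, expected_mode):
    """Return True if the permission bits of path equal expected_mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected_mode

# Demonstration on a throwaway directory (stands in for the download directory).
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o775)
    ok_after_chmod = has_mode(demo_dir, 0o775)
    os.chmod(demo_dir, 0o750)
    ok_when_restricted = has_mode(demo_dir, 0o775)
    os.chmod(demo_dir, 0o700)  # restore owner access so cleanup succeeds

print(ok_after_chmod, ok_when_restricted)
```

This checks only the permission bits; group ownership would need a separate check, for example by comparing the directory's group ID with the ID of the common group.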
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
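The wildcard rules for the Like operator can be sketched by translating a pattern into a regular expression. The Python below is an illustration under stated assumptions, not the MI SQL implementation: the function names are hypothetical, matching is case-sensitive, and no escape character for literal wildcards is handled.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Troy", "T%"), like("Troy", "Tr_y"), like("Troy", "Tr_"))
```

Note that the whole value must match the pattern, so "Tr_" does not match "Troy" because the underscore stands for exactly one character.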
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
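When queries are assembled programmatically, the quote rules above can be applied with two small helpers. This Python sketch is illustrative only: the helper names and the sample table name are hypothetical, and because this guide does not say how a double quote inside an identifier is handled, the sketch rejects that case instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    if '"' in name:
        raise ValueError("embedded double quotes are not covered by this sketch")
    return '"' + name + '"'

# Hypothetical table and value, for illustration only.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("My Streets"), quote_literal("O'hara's")
)
print(query)
```

Building literals this way keeps an apostrophe in the data from ending the string constant early.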
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The WKT representation of a geometry
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry, as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
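For readers unfamiliar with WKB, the following Python sketch builds the standard OGC WKB encoding of a 2D point by hand. It only illustrates the byte layout of the format; the function name is hypothetical, and this guide does not state which byte order ToWKB emits, so little-endian is an assumption here.

```python
import struct

def point_to_wkb(x, y):
    """Encode a 2D point as little-endian OGC WKB.

    Layout: 1 byte order flag (1 = little-endian),
            4 byte unsigned geometry type (1 = Point),
            two 8 byte IEEE 754 doubles (x, then y).
    """
    return struct.pack("<BIdd", 1, 1, x, y)

wkb = point_to_wkb(10.0, 20.0)
print(len(wkb), wkb.hex())
```

The result is 21 bytes long, which is why WKB is usually stored in a binary column or shown as a hexadecimal string rather than as plain text.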
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests whether two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise, False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
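The rule that a polygon's area is its exterior ring minus its interior rings can be sketched with the shoelace formula. This Python example covers only the planar (CARTESIAN) case with made-up coordinates; the function names are hypothetical, and it does not reproduce the SDK's SPHERICAL computation.

```python
def ring_area(ring):
    """Planar area of a ring given as (x, y) vertices, via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_area(exterior, holes=()):
    """Exterior ring area minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))
```

A 10 by 10 square with a 2 by 2 hole yields 100 minus 4 square units, in whatever unit the coordinates use.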
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
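The linear unit codes determine only the unit in which a result is reported. As a rough illustration, this Python sketch converts a value in meters to each of the unit codes listed above using the standard conversion factors; the dictionary and function names are hypothetical and not part of the SDK.

```python
# Meters per unit for the linear unit codes listed in this guide.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}

def from_meters(value_m, unit):
    """Convert a length in meters to the given unit code."""
    return value_m / METERS_PER_UNIT[unit]

print(from_meters(1609.344, "mi"), from_meters(1852.0, "nmi"))
```

An unknown unit code raises a KeyError in this sketch, which is a simple way to catch typos such as "miles" instead of "mi".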
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
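Unlike area, where holes are subtracted, the perimeter adds the lengths of all rings. The Python sketch below shows this for the planar (CARTESIAN) case with made-up coordinates; the function names are hypothetical and the SPHERICAL computation is not reproduced.

```python
import math

def ring_length(ring):
    """Planar length of a closed ring given as (x, y) vertices."""
    n = len(ring)
    return sum(
        math.hypot(ring[(i + 1) % n][0] - ring[i][0],
                   ring[(i + 1) % n][1] - ring[i][1])
        for i in range(n)
    )

def polygon_perimeter(exterior, holes=()):
    """Sum of the lengths of the exterior ring and every interior ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)

outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_perimeter(outer, [inner]))
```

For a 10 by 10 square with a 2 by 2 hole, the exterior contributes 40 units and the hole adds another 8.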
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic used to interpret geometry coordinates. The valid computation types depend on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
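The effect of the resolution parameter can be sketched in standalone Python for the simplest case, a point buffer in planar coordinates. This is an illustration only: the function names are hypothetical, and placing the vertices on the circle itself is an assumption about how the approximation is built.

```python
import math


def point_buffer_ring(x, y, radius, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (x + radius * math.cos(2 * math.pi * i / resolution),
         y + radius * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def ring_area(ring):
    """Shoelace formula for the area enclosed by a closed ring."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


octagon = point_buffer_ring(5, 6, 50, 8)
print(len(octagon) - 1)  # number of segments

# Fraction of the true circle area covered at increasing resolutions.
for n in (8, 12, 48):
    print(n, ring_area(point_buffer_ring(5, 6, 50, n)) / (math.pi * 50 ** 2))
```

A higher resolution covers more of the true circle but produces more vertices, which is why larger values cost more time and space.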
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
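As a conceptual illustration of what a convex hull is, the following standalone Python sketch computes one with Andrew's monotone chain algorithm. The algorithm choice and function name are assumptions made for this example; the guide does not state how the UDF computes the hull.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Vertices of the MULTIPOLYGON used in the last example above.
vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
print(convex_hull(vertices))
```

Interior vertices, including those of the polygon's hole, do not appear in the result because the hull only keeps the outermost points.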
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
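To show what a coordinate system transformation does numerically, the following standalone Python sketch converts a longitude/latitude pair (epsg:4326) to Web Mercator (epsg:3857) using the standard spherical Mercator formulas. The function name is hypothetical, and the sketch covers only this one pair of coordinate systems; it is not the Transform UDF.

```python
import math

# WGS 84 semi-major axis in meters, used as the sphere radius by EPSG:3857.
R = 6378137.0


def wgs84_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude to Web Mercator meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = wgs84_to_web_mercator(30, 20)
print(x, y)
```

The output values are in meters rather than degrees, which is why a transformed geometry cannot be mixed with untransformed long/lat data.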
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs extract the ordinates from the transformed geometry, which allows an XY table to be transformed from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
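Filtering XY records by the bounds of a geometry can be sketched in standalone Python as follows. The function names and sample records are hypothetical; the four values returned by mbr correspond conceptually to ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Minimum bounding rectangle of a coordinate list as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True when the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


# Hypothetical polygon ring and XY table rows (id, x, y).
ring = [(10, 10), (30, 5), (45, 20), (40, 40), (20, 45), (10, 10)]
rows = [(1, 12.0, 8.0), (2, 50.0, 20.0), (3, 44.0, 44.0)]

box = mbr(ring)
kept = [row_id for row_id, x, y in rows if in_mbr(x, y, box)]
print(box, kept)
```

An MBR test is only a coarse filter: row 3 lies inside the rectangle but outside the polygon itself, so an exact test would still be needed for precise results.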
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid divides the surface of the earth into contiguous cells with no gaps in between, which makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
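How a point is encoded into a grid cell ID can be illustrated with the publicly documented geohash algorithm, sketched below in standalone Python. The function name is hypothetical, and the sketch makes no claim that the GeoHashID UDF is implemented this way or that the hexagon and square hashes use the same encoding.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    value = 0
    refine_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if refine_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        refine_lon = not refine_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_id(10.40744, 57.64911, 5))
print(geohash_id(-73.750333, 42.736103, 3))
```

Each additional character halves the cell several more times, so a longer ID always starts with the shorter ID of its enclosing cell. That prefix property is what makes the IDs sortable and searchable.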
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (a longer string means higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points to search with. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
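The idea behind a point-in-polygon lookup that returns the attributes of every containing polygon can be sketched in standalone Python with the classic ray-casting test. The function names and the two sample regions are hypothetical, and the sketch ignores polygon holes and points lying exactly on an edge; it is not how the UDTF reads TAB or shapefiles.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def local_point_in_polygon(x, y, features):
    """Return the attributes of every feature whose ring contains the point."""
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Hypothetical data source: two overlapping regions with attributes.
features = [
    ([(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)], {"state": "A", "capital": "Alpha"}),
    ([(3, 3), (8, 3), (8, 8), (3, 8), (3, 3)], {"state": "B", "capital": "Beta"}),
]

print(local_point_in_polygon(2, 2, features))
print(local_point_in_polygon(3.5, 3.5, features))
print(local_point_in_polygon(9, 9, features))
```

Like a LATERAL VIEW over a UDTF, one input point can yield zero, one, or several output rows.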
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points to search from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
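The interplay of the maxCandidates and maxDistance options can be sketched in standalone Python as a brute-force nearest search. The function names, the haversine distance, the Earth radius, and the approximate city coordinates are all assumptions for illustration; the UDTF itself searches TAB or shapefile data and may compute distance differently.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius=6371008.8):
    """Great-circle distance in meters between two long/lat points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(h))


def search_nearest(point, features, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, feature) pairs, nearest first."""
    scored = [(haversine_m(point[0], point[1], f["lon"], f["lat"]), f)
              for f in features]
    if max_distance_m is not None:
        scored = [s for s in scored if s[0] <= max_distance_m]
    scored.sort(key=lambda s: s[0])
    return scored[:max_candidates]


# Approximate sample coordinates, for illustration only.
capitals = [
    {"capital": "Boston", "lon": -71.0589, "lat": 42.3601},
    {"capital": "Albany", "lon": -73.7562, "lat": 42.6526},
    {"capital": "Hartford", "lon": -72.6734, "lat": 41.7658},
]
search_point = (-73.750333, 42.736103)

for dist, feature in search_nearest(search_point, capitals, max_candidates=3):
    print(feature["capital"], round(dist / 1000, 1), "km")
```

With the defaults only the single nearest record is returned; raising max_candidates widens the result, while max_distance_m can reduce it to fewer rows or none.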
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
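To give a feel for what the geohashPrecision value controls, the following is a minimal sketch of the standard public geohash encoding, written in Python. It is not taken from the SDK, and whether joinByDistance uses exactly this encoding internally is an assumption; the function name geohash_encode is hypothetical. Each additional character refines the cell, which is why a higher precision yields many more, smaller cells to join on.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Encode a WGS84 longitude/latitude as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(10.40744, 57.64911, 2)
fine = geohash_encode(10.40744, 57.64911, 7)
print(coarse, fine)
```

A finer geohash always starts with the coarser one for the same point, so raising the precision subdivides existing cells rather than reshuffling them.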
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The DistanceColumnName option adds the calculated distance as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
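For readers who want to see the join semantics in isolation, here is a small brute-force sketch in plain Python. It is not the Spark API and does not reproduce the SDK's geohash-based primary join; the names haversine_mi and join_by_distance, the sample records, and the mean Earth radius are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # assumed mean Earth radius in miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi):
    """Pair every left record with each right record within max_distance_mi."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_mi:
                # Mirrors the idea of the DistanceColumnName option:
                # the computed distance is added as an extra attribute.
                joined.append({**l, **r, "outputDistance": d})
    return joined


customers = [{"id": "c1", "longitude": -73.70, "latitude": 42.70}]
pois = [
    {"name": "near", "lon": -73.705, "lat": 42.70},
    {"name": "far", "lon": -73.80, "lat": 42.70},
]
matches = join_by_distance(customers, pois, 0.5)
print([m["name"] for m in matches])
```

The nested loop compares every pair, which is exactly the cost a grid-based pre-join is meant to avoid on large inputs.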
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have 775 permissions, which grant Read, Write, and Execute to the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
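As a quick way to see what mode 775 means, the sketch below uses only the Python standard library on a throwaway temporary directory. It does not touch the real download location, and the helper name group_can_write is hypothetical; it assumes a POSIX file system.

```python
import os
import stat
import tempfile

MODE = 0o775  # rwx for owner, rwx for group, r-x for others

# Render the mode the way `ls -l` would show it for a directory.
mode_string = stat.filemode(stat.S_IFDIR | MODE)
print(mode_string)


def group_can_write(path):
    """Return True if the group write bit is set on `path`."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, MODE)  # same effect as `chmod 775` on this directory
    applied = stat.S_IMODE(os.stat(download_dir).st_mode)
    writable = group_can_write(download_dir)

print(oct(applied), writable)
```

The group write bit is the part that matters here: without it, a second service in the shared group could read the downloaded data but not update it.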
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
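The wildcard behavior described for the Like operator in the table above can be sketched by translating a pattern into a regular expression. This is an illustration in Python, not the MI SQL implementation; the function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Troy", "T%"), like("Troy", "T_oy"), like("Troy", "T_y"))
```

Because the two symbols can be combined, a pattern such as %o_ matches any value whose second-to-last character is "o".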
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
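When queries are assembled programmatically, the quote rules above can be applied with two small helpers. This is a minimal Python sketch; quote_literal and quote_identifier are hypothetical names, and because the guide does not say how a double quote inside an identifier should be escaped, the sketch rejects that case rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    if '"' in name:
        # Escaping of embedded double quotes is not covered by the guide.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("O'hara's"),
)
print(query)
```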
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
- Table of Contents
- Welcome
  - What is Spectrum™ Location Intelligence for Big Data
  - Spectrum™ Location Intelligence for Big Data Architecture
  - System Requirements and Dependencies
- Spatial
  - Installing the SDK
  - Hive User-Defined Spatial Functions
    - Setup
    - WritableGeometry
    - Geometry Functions
      - Constructor Functions
        - FromGeoJSON
        - FromWKT
        - FromWKB
        - FromKML
        - ST_Point
      - Persistence Functions
        - ToGeoJSON
        - ToWKT
        - ToWKB
        - ToKML
      - Predicate Functions
        - Disjoint
        - Intersects
        - Overlaps
        - Within
        - IsNullGeometry
      - Measurement Functions
        - Area
        - ClosestPoints
        - Distance
        - Length
        - Perimeter
      - Processing Functions
        - Buffer
        - ConvexHull
        - Intersection
        - Transform
        - Union
      - Observer Functions
        - ST_X
        - ST_XMax
        - ST_XMin
        - ST_Y
        - ST_YMax
        - ST_YMin
    - Grid Functions
      - GeoHashBoundary
      - GeoHashID
      - HexagonBoundary
      - HexagonID
      - SquareHashBoundary
      - SquareHashID
    - Search Functions
      - LocalPointInPolygon
      - LocalSearchNearest
  - Spark
    - Spark Jobs
      - Hexagon Generator
        - Hexagons
    - Spark API
      - JoinByDistance
    - Location Intelligence Jar for Spatial Operations
      - Installing and setting up the Location Intelligence Jar for a Spark Job
      - Integrating the Location Intelligence Jar with Zeppelin
      - Integrating the Location Intelligence Jar with Hue
- Appendix
  - PGD Builder
    - Building an Index with the PGD Builder
  - Download Permissions
  - Operators and Syntax Delimiters
    - Quote Rules
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
Byte[ ] The WKB representation of a geometry expressed as a byte array
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), as parsed from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry expressed as a hexadecimal encoded string
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
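The return rules for the predicates (True, False, or Null when an input is null) can be illustrated with a tiny Python sketch. It uses axis-aligned boxes written as (minx, miny, maxx, maxy) as a stand-in for real geometries, which is a simplification; the function names are hypothetical and this is not how the UDFs are implemented.

```python
def intersects(a, b):
    """True if two boxes share at least one point; None if either input is None."""
    if a is None or b is None:
        return None
    aminx, aminy, amaxx, amaxy = a
    bminx, bminy, bmaxx, bmaxy = b
    return aminx <= bmaxx and bminx <= amaxx and aminy <= bmaxy and bminy <= amaxy


def disjoint(a, b):
    """True if two boxes have no points in common; None if either input is None."""
    result = intersects(a, b)
    return None if result is None else not result


unit_box = (0, 0, 1, 1)
print(disjoint(unit_box, (2, 2, 3, 3)))  # boxes far apart
print(disjoint(unit_box, (1, 1, 2, 2)))  # boxes touching at one corner
print(disjoint(None, unit_box))          # null input propagates
```

Two geometries that merely touch still share a point, so they are not disjoint.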
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
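The rule that a polygon's area is its exterior ring minus its interior rings can be checked with a short Cartesian sketch. This Python example uses the shoelace formula on planar coordinates only; it does not attempt the SPHERICAL computation, and the function names are hypothetical rather than part of the product.

```python
def ring_area(ring):
    """Planar area enclosed by a ring of (x, y) vertices, via the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the areas of the interior rings."""
    return ring_area(exterior) - sum(ring_area(h) for h in holes)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square), polygon_area(square, [hole]))
```

The result is in squared coordinate units; converting to a named unit such as sq mi is a separate step.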
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
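To make the idea of a closest-point pair concrete, the sketch below handles the simplest case, a point and a line segment in planar coordinates. It is plain Python written for illustration, with hypothetical function names; it is not the UDF's algorithm and ignores spherical computation.

```python
def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is nearest to p (planar coordinates)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a  # degenerate segment: both ends are the same point
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def closest_points(p, a, b):
    """Return the pair [point on the first geometry, point on the second geometry]."""
    return [p, closest_point_on_segment(p, a, b)]


pair = closest_points((5, 3), (0, 0), (10, 0))
print(pair)
```

When the point already lies on the segment, both entries of the pair are the same location, which matches the shared-point case described for intersecting geometries.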
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
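If a result has been returned in one linear unit and is needed in another, the unit codes from the Linear Units table can be mapped to conversion factors. The sketch below is Python written for illustration; the factors are the standard international definitions rather than values taken from the product, and the names METERS_PER_UNIT and convert_length are hypothetical.

```python
# Meters per unit, keyed by the unit codes listed in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200 / 3937,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the supported linear unit codes."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


half_mile_in_m = convert_length(0.5, "mi", "m")
print(half_mile_in_m)
```

Passing an unknown unit code raises a KeyError, which is a reasonable way to surface a typo in a unit string early.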
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and any holes). Curves are treated as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
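The rule that a polygon's perimeter is the sum of all its ring lengths, holes included, can be sketched in a few lines of Python. This is a planar (CARTESIAN-style) illustration with hypothetical function names, not the product's code.

```python
import math


def ring_length(ring):
    """Length of a closed ring [(x, y), ...] whose last point repeats the first."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length plus the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


# A 4 x 4 square with a 1 x 1 hole.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
perimeter = polygon_perimeter(square, [hole])
```

Here the exterior ring contributes 16 units and the hole contributes 4, so the perimeter is 20 units rather than 16.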
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnits String The unit type of the offset distance. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
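The effect of the resolution parameter is easiest to see for a point buffer. The Python sketch below builds the ring for a point in a planar coordinate system; the function name and the choice to place the first vertex due east of the point are assumptions made for illustration, and the product may place vertices differently.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Closed ring of `resolution` straight segments approximating a circle."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


# Resolution 8 yields an octagon: 8 segments, 9 points including the closing one.
octagon = buffer_point(5, 6, 50, 8)
```

Raising the resolution adds vertices and brings the ring closer to a true circle, which is why larger values cost more time and space.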
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
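As a rough illustration of what "smallest convex geometry" means, the following Python sketch computes the hull of a set of planar points with Andrew's monotone chain algorithm. The algorithm choice and names are assumptions for this example; the guide does not state which algorithm the UDF uses.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
```

The two interior points (2, 2) and (1, 3) are discarded, leaving only the four corners of the square.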
Intersection
Description
The Intersection function returns the geometry (a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
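For the epsg:4326 to epsg:3857 pair used in the first example, the underlying conversion is the spherical Mercator projection. The Python sketch below shows that one case only, with a hypothetical function name; the Transform UDF handles arbitrary coordinate system pairs, and nothing here describes how it does so internally.

```python
import math

# Sphere radius used by the Web Mercator projection (epsg:3857), in meters.
WEB_MERCATOR_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters.

    Valid for latitudes strictly between -90 and 90 degrees; Web Mercator
    maps are conventionally cut off near +/-85.05 degrees.
    """
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Because each record is projected independently, the same arithmetic can be applied to the X and Y columns of a table one row at a time.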
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of a transformed point geometry back into X and Y values.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
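The bounds-filtering use case can be sketched as follows in Python. The helper names and the row layout (dictionaries with x and y keys) are hypothetical; in Hive the same comparison would be written against the values returned by ST_XMin, ST_XMax, ST_YMin, and ST_YMax.

```python
def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, geometry_points):
    """Keep the XY rows that fall inside the MBR of the given geometry."""
    xmin, ymin, xmax, ymax = mbr(geometry_points)
    return [r for r in rows
            if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


triangle = [(0, 0), (10, 0), (5, 8), (0, 0)]
rows = [{"id": 1, "x": 2, "y": 3}, {"id": 2, "x": 11, "y": 1},
        {"id": 3, "x": 9, "y": 7}]
kept = filter_by_mbr(rows, triangle)
```

Note that row 3 lies inside the rectangle but outside the triangle itself: an MBR test is a cheap first pass, not an exact containment test.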
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
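The relationship between a point, its cell ID, the cell boundary, and per-cell aggregation can be sketched in Python using the public geohash scheme (alternating longitude and latitude bisection, five bits per base-32 character). That GeoHashID follows this scheme exactly is an assumption here, and the function names are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def geohash_boundary(cell_id):
    """Decode a geohash into its cell bounds (xmin, ymin, xmax, ymax)."""
    lon, lat = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def count_by_cell(points, precision):
    """Aggregate points per cell, like the GROUP BY query above."""
    return Counter(geohash_id(x, y, precision) for x, y in points)
```

Because every extra character subdivides the previous cell, an ID at a lower precision is always a prefix of the ID at a higher precision for the same point, which is what makes the IDs sortable and searchable.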
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: shpCharset, utf-8
shpCrs The coordinate reference system to use when reading a shapefile. Example: shpCrs, epsg:4326
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: remoteDataSourceLocation, hdfs:///data/mydata.zip
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: downloadLocation, /pb/downloads
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). Example: downloadGroup, pbdownloads
For more information, see Download Permissions.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
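Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns the attributes of those that contain it. The Python sketch below uses the classic ray-casting test on planar coordinates; the data layout and function names are hypothetical, holes are ignored, and the real UDTF additionally benefits from spatial indexing rather than scanning every polygon.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the closed ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_search(x, y, polygons):
    """Return the attributes of every polygon that contains the point."""
    return [p["attrs"] for p in polygons if point_in_ring(x, y, p["ring"])]


polygons = [
    {"attrs": {"state": "A"}, "ring": [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]},
    {"attrs": {"state": "B"}, "ring": [(4, 0), (8, 0), (8, 4), (4, 4), (4, 0)]},
]
hits = point_in_polygon_search(2, 2, polygons)
```

A point that falls in no polygon yields an empty result, which in a LATERAL VIEW without OUTER means the input row produces no output rows.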
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: maxCandidates, 3
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: maxDistance, 25
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: distanceUnit, mi
returnDistanceColumnName The name of the column to use for returning the distance. Example: returnDistanceColumnName, Miles
shpCharset The charset to use when reading a shapefile. Example: shpCharset, utf-8
shpCrs The coordinate reference system to use when reading a shapefile. Example: shpCrs, epsg:4326
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: remoteDataSourceLocation, hdfs:///data/mydata.zip
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: downloadLocation, /pb/downloads
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of pb.download.group set in Hive (required only if you are storing and distributing data remotely on HDFS or S3). Example: downloadGroup, pbdownloads
For more information, see Download Permissions.
queryFilter Search on a subset of the data based on an attribute. Example: queryFilter, Name Like Park
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
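How maxCandidates and maxDistance interact can be sketched with a brute-force nearest search in Python. The function name, the candidate layout, and the use of planar distance are assumptions for illustration; the UDTF itself works on geometries in a TAB or shapefile and honors distanceUnit.

```python
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to `max_candidates` (name, distance) pairs, nearest first.

    `candidates` is a list of (name, (x, y)) tuples. Candidates farther than
    `max_distance` are dropped; None means no distance limit.
    """
    scored = []
    for name, location in candidates:
        distance = math.dist(point, location)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    scored.sort()
    return [(name, distance) for distance, name in scored[:max_candidates]]


capitals = [("A", (1, 0)), ("B", (0, 3)), ("C", (5, 5)), ("D", (0, -2))]
nearest_three = search_nearest((0, 0), capitals, max_candidates=3)
within_range = search_nearest((0, 0), capitals, max_candidates=3, max_distance=2.5)
```

The distance filter is applied before the candidate limit, so a tight maxDistance can return fewer results than maxCandidates allows.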
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
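This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
// Build the search distance (0.5 miles) from a string radius and a MapInfo unit code
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
// Join joinDF to baseDF using a geohash precision of 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The DistanceColumnName option adds the calculated distance as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))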
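To make the join semantics concrete, the following is a minimal Python sketch, not the SDK's Spark implementation. It pairs records from two in-memory lists when their great-circle distance is within a maximum distance. The haversine formula, the mean Earth radius, the sample coordinates, and the function names are illustrative assumptions. The real joinByDistance method also uses geohashes for its primary join, which this brute-force sketch omits.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters (spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m):
    """Naive nested-loop join: pair each row in rows1 with every row in rows2 within range."""
    result = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["longitude"], r1["latitude"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                # Merge both records and attach the distance, like the DistanceColumnName option
                result.append({**r1, **r2, "outputDistance": d})
    return result


# Hypothetical sample data: one customer and two points of interest
customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7489},
    {"poi": "far", "lon": -74.0445, "lat": 40.6892},
]

half_mile_m = 0.5 * 1609.344  # 0.5 miles expressed in meters
joined = join_by_distance(customers, pois, half_mile_m)
```

Only the nearby point of interest is joined to the customer here; the other lies several kilometers away. A nested loop compares every pair, which is why a production join first narrows candidates with a spatial key such as a geohash.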
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
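As a quick check of what the 775 mode grants, the following Python sketch decodes an octal mode into the familiar permission string using only the standard library. The helper name is hypothetical, and the sketch inspects a mode value only; it does not touch any directory on disk.

```python
import stat


def describe_dir_mode(mode):
    """Return the ls-style permission string for a directory with the given octal mode."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
permission_string = describe_dir_mode(mode)

# Owner and group can read, write, and traverse; others can only read and traverse
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
```

The group write bit is what lets every service user in the common group update the downloaded data, while users outside the group cannot modify it.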
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
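The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is an illustration of the wildcard rules only, not the MI SQL engine. The function names are hypothetical, and case-sensitive matching is an assumption, since the guide does not state how Like treats letter case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")  # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")
single_char = like("Park", "P_rk")
```

Because the whole value must match, a pattern such as %Park accepts "Central Park" but rejects "Parkway".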
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
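When filter expressions are assembled in code, the quoting rules can be applied with small helpers like the following Python sketch. The helper names, the sample value, and the column names are hypothetical. The identifier helper assumes the name itself contains no double-quote characters, because the guide does not describe how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes the name has no double quotes)."""
    if '"' in name:
        raise ValueError("identifier must not contain a double-quote character")
    return '"' + name + '"'


# Hypothetical filter built from a column name with a space and a value with an apostrophe
expression = quote_identifier("Business Name") + " = " + quote_literal("O'Brien")
```

Quoting values this way keeps an apostrophe inside the data from ending the string literal early.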
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ToKML
Description
The ToKML function returns text formatted as KML in the OGC standard KML 2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd), generated from the specified WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The instance of a WritableGeometry
Return Values
Return Type Description
String The KML representation of a geometry.
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 50, 'km', 12, 'SPHERICAL')) FROM hivetable t;
Predicate Functions
The following Predicate functions are available
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
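The return rules for Disjoint, including the Null result for null input, can be illustrated with a simplified Python sketch. It models each geometry as an axis-aligned rectangle (xmin, ymin, xmax, ymax), which is an assumption made only to keep the example short; the Hive functions operate on arbitrary geometries. The function names are hypothetical, and None stands in for Null.

```python
def intersects(a, b):
    """True if rectangles a and b share at least one point; None if either is None."""
    if a is None or b is None:
        return None
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # The rectangles share a point when their extents overlap on both axes
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def disjoint(a, b):
    """True if rectangles a and b have no points in common; None if either is None."""
    hit = intersects(a, b)
    return None if hit is None else not hit


far_apart = disjoint((0, 0, 1, 1), (2, 2, 3, 3))
touching = disjoint((0, 0, 1, 1), (1, 0, 2, 1))
missing = disjoint(None, (0, 0, 1, 1))
```

Rectangles that only touch along an edge still share points, so they are not disjoint. A missing geometry yields None rather than True or False.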
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
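The ring rule for polygon area can be sketched in plain Python using the shoelace formula on planar coordinates. This corresponds to a cartesian interpretation only and is not how the Area function itself is implemented. The function names and the sample square with a square hole are illustrative assumptions.

```python
def ring_area(ring):
    """Planar area enclosed by a ring given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]  # 10 x 10 square
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]  # 2 x 2 hole
area_with_hole = polygon_area(outer, [hole])
```

The 10 by 10 square has an area of 100 square units and the hole removes 4, leaving 96. The result is in squared coordinate units; converting to a unit such as sq mi is a separate step.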
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
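The valid computation types and defaults listed above can be summarized as a small lookup. The following Python sketch is illustrative only: the coordinate system category names, the function name, the case-insensitive handling of the requested type, and the choice to raise an error for an invalid combination are all assumptions, since the guide does not say how the functions respond to an unsupported type.

```python
# Valid computation types per coordinate system category; the first entry is the default
VALID_TYPES = {
    "geographic": ("SPHERICAL",),
    "projected": ("SPHERICAL", "CARTESIAN"),
    "engineering": ("CARTESIAN",),
}


def resolve_computation_type(cs_category, requested=None):
    """Return the computation type to use, applying the default when none is requested."""
    valid = VALID_TYPES[cs_category]
    if requested is None:
        return valid[0]
    requested = requested.upper()
    if requested not in valid:
        raise ValueError(
            f"{requested} is not valid for {cs_category} coordinate systems"
        )
    return requested


default_for_projected = resolve_computation_type("projected")
explicit_cartesian = resolve_computation_type("projected", "cartesian")
```

Only projected coordinate systems offer a choice; for geographic and engineering systems the optional parameter can simply be omitted.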
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
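For readers who need to compare results returned in different linear units, the following Python sketch converts between the unit codes in the table above. The conversion factors are the standard definitions of each unit in meters; the dictionary and function names are hypothetical and are not part of the product.

```python
# Meters per unit for each linear unit code (standard definitions)
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two unit codes by going through meters."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


half_mile_in_meters = convert_length(0.5, "mi", "m")
mile_in_feet = convert_length(1, "mi", "ft")
```

Note that the US Survey foot is slightly longer than the international foot, so the two codes are not interchangeable over long distances.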
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The linear unit of the offset distance. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on.
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
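The effect of the resolution parameter can be inspected by buffering the same point twice and converting both results to WKT. This is only a sketch: it assumes the Buffer, ST_Point, and ToWKT functions have been registered, and the resolution values 8 and 32 are arbitrary choices for illustration.

```sql
-- Buffer one point by 50 km at two resolutions and return WKT for comparison.
-- Per the resolution description, 8 segments should yield an octagon;
-- 32 segments approximates a circle more closely but costs more to compute.
SELECT
  ToWKT(Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 8,  'SPHERICAL')) AS coarse_buffer,
  ToWKT(Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 32, 'SPHERICAL')) AS fine_buffer;
```

Comparing the two WKT strings shows how many vertices each resolution produces, which helps when choosing a value that balances accuracy against compute time.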
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
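Transform can be combined with the measurement functions when a measurement should be computed in a projected coordinate system. The following is a minimal sketch rather than a recommended workflow: the roads table and its id and wkt columns are hypothetical, and it assumes the Length, Transform, and FromWKT functions are registered.

```sql
-- Hypothetical table: roads(id, wkt) holding line geometries in epsg:4326.
-- km_spherical measures on the geographic coordinates directly;
-- km_cartesian_3857 first projects to epsg:3857 and then measures with Cartesian logic.
SELECT
  id,
  Length(FromWKT(wkt, 'epsg:4326'), 'km', 'SPHERICAL') AS km_spherical,
  Length(Transform(FromWKT(wkt, 'epsg:4326'), 'epsg:3857'), 'km', 'CARTESIAN') AS km_cartesian_3857
FROM roads;
```

The two columns can differ because a map projection distorts distances, so the choice of coordinate system and computation type should match the analysis being performed.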
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that Transform alone cannot convert an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
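The two use cases described above might look like the following in HiveQL. This is a sketch under stated assumptions: the tables xy_table (id, x, y) and regions (name, wkt), the region name 'study_area', and the choice of epsg:4326 and epsg:3857 are all hypothetical. Only the UDF calls follow the signatures documented in this guide.

```sql
-- Hypothetical table: xy_table(id, x, y) with longitude/latitude in epsg:4326.
-- Reproject each XY pair: build a point, transform it, then read the ordinates back.
SELECT
  id,
  ST_X(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS x_3857,
  ST_Y(Transform(ST_Point(x, y, 'epsg:4326'), 'epsg:3857')) AS y_3857
FROM xy_table;

-- Hypothetical table: regions(name, wkt) containing one row named 'study_area'.
-- Keep only the XY records that fall inside the MBR of that geometry.
SELECT t.id, t.x, t.y
FROM xy_table t
CROSS JOIN (
  SELECT
    ST_XMin(FromWKT(wkt, 'epsg:4326')) AS xmin,
    ST_XMax(FromWKT(wkt, 'epsg:4326')) AS xmax,
    ST_YMin(FromWKT(wkt, 'epsg:4326')) AS ymin,
    ST_YMax(FromWKT(wkt, 'epsg:4326')) AS ymax
  FROM regions
  WHERE name = 'study_area'
) b
WHERE t.x BETWEEN b.xmin AND b.xmax
  AND t.y BETWEEN b.ymin AND b.ymax;
```

The MBR filter is a coarse test: it keeps every point inside the bounding rectangle, including points that lie outside the geometry itself.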
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
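As an illustration of using a grid for spatial indexing, two XY tables can be joined on the ID of the cell each record falls into. The stores and customers tables, their columns, and the precision of 5 are hypothetical; the sketch assumes only that GeoHashID is registered with the signature documented in this guide.

```sql
-- Hypothetical tables: stores(store_id, x, y) and customers(customer_id, x, y),
-- both holding longitude (x) and latitude (y).
-- Count the customers that share a geohash cell with each store.
SELECT s.store_id, count(*) AS customers_in_cell
FROM (SELECT store_id, GeoHashID(x, y, 5) AS cell FROM stores) s
JOIN (SELECT GeoHashID(x, y, 5) AS cell FROM customers) c
  ON s.cell = c.cell
GROUP BY s.store_id;
```

Because the join key is a plain string, this is an ordinary equality join. Note that records close to each other but on opposite sides of a cell edge land in different cells, so a cell-based join is an approximation of proximity.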
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies on or within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table supplying the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
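To make the roles of maxDistance and geohashPrecision more concrete, the following Python sketch mimics the general idea of a distance join: bucket the second dataset into grid cells, then compute exact distances only against nearby cells. It is an illustration under stated assumptions, not the SDK's implementation. The function names, the fixed-size degree grid (standing in for geohash cells), and the mean Earth radius are all assumptions made for this example.

```python
import math
from collections import defaultdict

EARTH_RADIUS_MI = 3958.7613  # assumed mean Earth radius in miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi, cell_deg=0.05):
    """Join (id, lon, lat) rows of left to rows of right within max_distance_mi.

    Hypothetical helper: a coarse degree grid plays the role of the geohash
    used for the primary join. Ignores the antimeridian and the poles.
    """
    # Primary step: bucket the right-hand rows by grid cell.
    grid = defaultdict(list)
    for rid, lon, lat in right:
        grid[(math.floor(lon / cell_deg), math.floor(lat / cell_deg))].append((rid, lon, lat))

    delta = max_distance_mi / EARTH_RADIUS_MI  # search radius as an angle in radians
    lat_span = math.degrees(delta)
    results = []
    for lid, lon, lat in left:
        # Widest longitude difference reachable within the radius at this latitude.
        ratio = min(1.0, math.sin(delta) / max(math.cos(math.radians(lat)), 1e-12))
        lon_span = math.degrees(math.asin(ratio))
        x0, x1 = math.floor((lon - lon_span) / cell_deg), math.floor((lon + lon_span) / cell_deg)
        y0, y1 = math.floor((lat - lat_span) / cell_deg), math.floor((lat + lat_span) / cell_deg)
        # Secondary step: exact distance check against candidates in nearby cells only.
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for rid, rlon, rlat in grid.get((cx, cy), ()):
                    d = haversine_mi(lon, lat, rlon, rlat)
                    if d <= max_distance_mi:
                        results.append((lid, rid, d))
    return results
```

In this sketch a smaller cell_deg means more, smaller buckets, which loosely parallels the guide's remark that a higher geohash precision may require more memory.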
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
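As a quick way to see what the 775 mode grants, the following Python sketch decodes a permission mode with the standard stat module. The helper names are hypothetical and the snippet only interprets mode bits; it does not touch the file system or any product setting.

```python
import stat


def describe_mode(mode):
    """Render a directory permission mode such as 0o775 in ls -l style."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_has_full_access(mode):
    """True if the group has read, write, and execute bits set."""
    return (mode & stat.S_IRWXG) == stat.S_IRWXG
```

With mode 775 the owner and the group can read, write, and traverse the directory, while other users can only read and traverse it. A mode such as 755 would leave group members unable to write downloaded data.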
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
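The wildcard and range rules above can be illustrated with a small Python sketch. This is not how MI SQL evaluates expressions; the function names are hypothetical, and case-sensitive matching is an assumption made for the example.

```python
import re


def like(value, pattern):
    """Match value against a Like pattern: % is any run of characters, _ is exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


def between(value, low, high):
    """Inclusive range test, mirroring the inclusive Between operator."""
    return low <= value <= high
```

For example, the pattern %Park% matches any value containing Park, while c_t matches cat but not cart.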
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces in their names or other special characters.
Examples
Identifiers containing characters that would not normally be allowed (such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
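When filter expressions are assembled programmatically, the quoting rules can be applied with two small helpers. The Python sketch below is illustrative only: the helper names are hypothetical, and rejecting identifiers that contain a double quote is an assumption, since this guide does not describe how such identifiers are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


def build_query(table, column, value):
    """Assemble a simple SELECT with a quoted table name and a quoted literal."""
    return "SELECT * FROM " + quote_identifier(table) + " WHERE " + column + " = " + quote_literal(value)
```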
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry
Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name = 'Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
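The relationships among the Disjoint, Intersects, and Within predicates, including the rule that a null input yields Null, can be sketched in Python with axis-aligned rectangles standing in for arbitrary geometries. The rectangle representation and function names are assumptions chosen to keep the example short; the Hive functions operate on full WritableGeometry objects.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def intersects(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the rectangles share at least one point; None if either input is None."""
    if a is None or b is None:
        return None
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disjoint(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if the rectangles have no points in common (the negation of intersects)."""
    hit = intersects(a, b)
    return None if hit is None else not hit


def within(a: Optional[Box], b: Optional[Box]) -> Optional[bool]:
    """True if rectangle b entirely contains rectangle a; None if either input is None."""
    if a is None or b is None:
        return None
    return b[0] <= a[0] and a[2] <= b[2] and b[1] <= a[1] and a[3] <= b[3]
```

Note that the argument order matters for within: the first geometry is the one being tested for containment inside the second.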
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise, False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type For valid values see Area Units on page 30
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square foot
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
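The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in Python for the CARTESIAN case using the shoelace formula. This is a minimal illustration, not the product's implementation: the function names are hypothetical, coordinates are assumed to be planar and in meters, and only a few of the area unit codes are included.

```python
# Square meters per unit for a subset of the area unit codes (standard conversion factors).
SQ_METERS_PER_UNIT = {
    "sq m": 1.0,
    "sq km": 1_000_000.0,
    "ha": 10_000.0,
    "sq ft": 0.09290304,
    "acre": 4046.8564224,
    "sq mi": 2_589_988.110336,
}


def ring_area(ring):
    """Unsigned shoelace area of a ring given as a list of (x, y) tuples."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, holes=(), unit="sq m"):
    """Area of the exterior ring minus the areas of the interior rings, in the given unit."""
    square_meters = ring_area(exterior) - sum(ring_area(h) for h in holes)
    return square_meters / SQ_METERS_PER_UNIT[unit]
```

SPHERICAL computation, which applies to long/lat data, needs a different formula and is not covered by this sketch.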
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type For valid values see Linear Units on page 33
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
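The difference between the SPHERICAL and CARTESIAN computation types, and the effect of the linear unit codes, can be seen in a point-to-point Python sketch. It is only an approximation of the idea: the function names are hypothetical, the spherical case uses the haversine formula with an assumed mean Earth radius, and the cartesian case assumes planar coordinates in meters.

```python
import math

# Meters per unit for the linear unit codes listed above (standard conversion factors).
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048, "yd": 0.9144,
    "mm": 0.001, "cm": 0.01, "m": 1.0, "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}
EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def spherical_distance(p1, p2, unit="m"):
    """Great-circle (haversine) distance between two (lon, lat) points in degrees."""
    lon1, lat1 = p1
    lon2, lat2 = p2
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters / METERS_PER_UNIT[unit]


def cartesian_distance(p1, p2, unit="m"):
    """Straight-line distance between two planar (x, y) points given in meters."""
    meters = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return meters / METERS_PER_UNIT[unit]
```

Under these assumptions, one degree of latitude along a meridian comes out at roughly 111.2 km with the spherical calculation, whereas treating the same long/lat values as planar coordinates would give a meaningless result. This is why the computation type has to agree with the coordinate system.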
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 35
computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on
bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)
bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)
bull For engineering coordinate systems Valid type = CARTESIAN (default)
CARTESIAN The geometry coordinates are interpreted using cartesian logic
SPHERICAL The geometry coordinates are interpreted using spherical logic
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). The curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
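The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be checked with a small planar sketch. The Python below is an illustration under cartesian assumptions only; ring_length and polygon_perimeter are hypothetical helper names, not SDK functions.

```python
import math


def ring_length(ring):
    """Length of a closed ring given as [(x, y), ...] whose first and last points match."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter = exterior ring length plus the length of every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
    print(polygon_perimeter(exterior, [hole]))
```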
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
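The effect of the resolution parameter can be pictured with a planar sketch of a point buffer. The Python below is an assumption-laden illustration, not the SDK's algorithm: buffer_point is a hypothetical helper that places the vertices on the circle of radius offset, whereas the product may position vertices differently and also supports spherical computation.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments.

    Returns a closed ring: `resolution` distinct vertices plus the repeated first vertex.
    """
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


if __name__ == "__main__":
    octagon = buffer_point(5.0, 6.0, 50.0, 8)
    print(len(octagon) - 1, "segments")
```

With a resolution of 8 the ring has eight segments (an octagon); a resolution of 12, as in the examples above, gives a closer approximation of a circle at the cost of more vertices.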
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
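For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. This is a well-known textbook technique chosen for illustration; the guide does not state which algorithm the ConvexHull UDF uses, and convex_hull here is a hypothetical helper name.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```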
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
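To give a sense of what a coordinate transformation does, the sketch below converts long/lat degrees (epsg:4326) to spherical Web Mercator meters (epsg:3857) using the standard closed-form equations. It covers this single pair of coordinate systems only; the Transform UDF handles other systems through its own logic, which is not shown in this guide. The function name is hypothetical.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857, in meters


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 long/lat degrees to EPSG:3857 x/y meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    print(lonlat_to_web_mercator(30.0, 20.0))
```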
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
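The MBR filtering use case described above can be sketched in plain Python. This is an illustration of the idea only: mbr and filter_rows_by_mbr are hypothetical helpers, and rows are assumed to be dictionaries with x and y keys standing in for the columns of an XY table.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_rows_by_mbr(rows, bounds):
    """Keep the rows whose x/y values fall inside the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]


if __name__ == "__main__":
    polygon = [(0, 0), (10, 2), (8, 9), (1, 7), (0, 0)]
    table = [{"id": 1, "x": 5, "y": 5}, {"id": 2, "x": 20, "y": 5}]
    print(filter_rows_by_mbr(table, mbr(polygon)))
```

A bounding-rectangle test is only a coarse filter: a row can fall inside the MBR and still lie outside the geometry itself.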
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
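How a geohash string maps back to a rectangular cell can be sketched in Python. The code assumes the publicly documented geohash scheme (base-32 alphabet, bits alternating between longitude and latitude, longitude first); the guide does not confirm that GeoHashBoundary follows exactly this scheme, and geohash_bounds is a hypothetical helper name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_bounds(geohash):
    """Decode a geohash into (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (idx >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


if __name__ == "__main__":
    print(geohash_bounds("ezs42"))
```

Each additional character halves the cell several more times, which is why longer keys correspond to smaller cells.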
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
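The last query above groups points by grid cell and counts them. The same idea can be sketched in Python, assuming the publicly documented geohash encoding; whether GeoHashID produces identical strings is not stated in this guide, and geohash_id and count_by_cell are hypothetical names.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a long/lat point as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    is_lon, bits, value = True, 0, 0
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = value * 2 + bit
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (lon, lat) points per geohash cell, like GROUP BY hashID."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)


if __name__ == "__main__":
    pts = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 10.0)]
    print(count_by_cell(pts, 5))
```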
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry within which the search point lies; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry within which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
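The core test behind a point-in-polygon search can be sketched with the classic ray-casting technique. The Python below is a simplified planar illustration and not the UDTF's implementation: it ignores holes and indexing, points exactly on an edge may be classified either way, and point_in_ring and polygons_containing are hypothetical names.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> closed exterior ring) containing the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


if __name__ == "__main__":
    regions = {
        "A": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
        "B": [(5, 5), (15, 5), (15, 15), (5, 15), (5, 5)],
    }
    print(polygons_containing((7, 7), regions))
```

As with the UDTF, a point that falls inside several overlapping polygons yields several results.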
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the values we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data to process large sets of data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes one set of longitude and latitude values from each dataframe, representing the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
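To make the geohashPrecision parameter more concrete, here is a minimal Python sketch of the standard public geohash encoding. It is not the product's implementation; the function name geohash_encode is hypothetical. It only illustrates that each additional character of precision narrows the cell, and that a longer geohash always starts with the shorter one for the same point.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, 5)
fine = geohash_encode(57.64911, 10.40744, 7)
print(coarse, fine)
```

A lower precision groups more points into the same cell, while a higher precision creates many more distinct cells, which is consistent with the note that higher values may require more memory.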
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
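The semantics of a distance join can be sketched outside Spark. The following Python example is a brute-force illustration only, not the product's algorithm (which uses a geohash-based primary join). The haversine formula, the Earth radius constant, the sample coordinates, and the helper names are all assumptions made for this sketch.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (assumption of this sketch)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair every left row with every right row that lies within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Mirror the optional distance column from the options example.
                joined.append({**l, **r, "outputDistance": d})
    return joined


customers = [{"id": "c1", "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7506},
    {"poi": "far", "lon": -73.9442, "lat": 40.6782},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([m["poi"] for m in matches])
```

Comparing every pair is quadratic in the number of rows, which is why a real implementation first narrows candidates with a spatial key such as a geohash.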
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property: Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
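As a quick illustration of what mode 775 grants, the following Python sketch decodes an octal mode into the familiar symbolic form using the standard library. The helper names describe_mode and group_can_write are hypothetical and are not part of the product.

```python
import stat


def describe_mode(octal_mode, is_dir=True):
    """Return the ls-style permission string for an octal mode."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | octal_mode)


def group_can_write(octal_mode):
    """True if members of the owning group may write."""
    return bool(octal_mode & stat.S_IWGRP)


print(describe_mode(0o775))    # owner and group: rwx, others: r-x
print(group_can_write(0o775))  # group write is what lets every service update the data
print(group_can_write(0o755))  # a common default that would block group members
```

The group write bit is the important difference from the more common 755: without it, only the directory owner could update downloaded data.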
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
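The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression. This Python example is only an illustration of the two wildcards; it assumes case-sensitive matching and no escape character, neither of which is stated in this guide, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))  # ends with "Park"
print(like("Parkway", "%Park"))       # does not end with "Park"
print(like("Park", "P_rk"))           # underscore stands for one character
```

Matching against the whole value, not a substring, is what makes a leading or trailing percent sign necessary for "contains" or "ends with" searches.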
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
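When filter expressions are built programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only; quote_literal and quote_identifier are hypothetical names, and the sketch does not address identifiers that themselves contain a double-quote, which this guide does not cover.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    return '"' + name + '"'


query = "SELECT * FROM {table} WHERE Business = {value}".format(
    table=quote_identifier("Main Streets"),
    value=quote_literal("O'hara's"),
)
print(query)
```

Doubling the embedded quote keeps the literal intact, so the parser does not treat the apostrophe as the end of the string.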
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise, False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
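The "exterior ring minus interior rings" rule can be illustrated with a small planar calculation. The following Python sketch uses the shoelace formula on Cartesian coordinates; it is not the product's implementation, does not handle spherical computation or unit conversion, and its function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square))          # 10 x 10 square
print(polygon_area(square, [hole]))  # same square with a 2 x 2 hole removed
```

Taking the absolute value per ring makes the result independent of whether the vertices are listed clockwise or counter-clockwise.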
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries Geometries that intersect are at distance zero from each other Distance is always non-negative
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
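If a result returned in one linear unit needs to be compared with a value in another, the unit codes listed above can be related through meters. The following Python sketch uses the standard international definitions of each unit; the table and the convert_length helper are illustrative assumptions and are not part of the product API.

```python
# Meters per unit, keyed by the linear unit codes listed in this section.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1, "mi", "km"))
print(convert_length(1852, "m", "nmi"))
```

Routing every conversion through meters keeps the table to one factor per unit instead of one per unit pair.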
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The length of the geometry.
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
computationType | String | Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
  • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
  • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
  • For engineering coordinate systems: valid type = CARTESIAN (default)
  CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
  SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry.
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
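The rule that a polygon's perimeter is the sum of the lengths of all its rings can be sketched in a few lines of Python. This is a minimal Cartesian-only illustration; the function names and the ring representation (a list of (x, y) tuples whose first and last points coincide) are assumptions made for this example, not part of the product API.

```python
import math


def ring_length(ring):
    """Length of a closed ring given as [(x, y), ...] with first == last."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum the exterior ring and every hole ring, as described above."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_perimeter(square, [hole]))  # 40 for the exterior plus 8 for the hole
```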
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
  • For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
  • For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
  • For engineering coordinate systems: valid type = CARTESIAN (default)
  CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
  SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
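The effect of the resolution parameter can be pictured with a small Python sketch that buffers a single point in a planar coordinate system. It is only an illustration under stated assumptions: the function name buffer_point is hypothetical, the math is Cartesian, and the product may place the vertices differently.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring of `resolution` segments approximating a circle."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


# A resolution of 8 yields an octagon: 8 segments, 9 vertices with closure.
octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1, "segments")
```

Raising the resolution brings the ring closer to a true circle at the cost of more vertices, which matches the note above that larger values take more time and space to compute.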
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
WritableGeometry | The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
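For readers who want to see what a convex hull computation involves, here is a minimal Python sketch using Andrew's monotone chain algorithm on planar points. The choice of algorithm is an assumption made for illustration; this guide does not state how ConvexHull is implemented. The sample points are the vertices of the MULTIPOLYGON in the example above.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


vertices = [(40, 40), (20, 45), (45, 30), (20, 35), (10, 30), (10, 10),
            (30, 5), (45, 20), (30, 20), (20, 15), (20, 25)]
hull = convex_hull(vertices)
print(hull)
```

The interior points, including all vertices of the hole ring, are discarded, and only the outermost vertices remain.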
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.
Return Values
Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
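The bounds-filtering use case can be sketched in Python as follows. This is a planar illustration of the idea only; the function names mbr and filter_by_mbr are hypothetical and stand in for what the four min/max UDFs and a WHERE clause would do in a Hive query.

```python
def mbr(coords):
    """Minimum bounding rectangle of a coordinate list: (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep only the (x, y) rows that fall inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


polygon = [(0, 0), (4, 1), (3, 5), (-1, 2), (0, 0)]
xy_table = [(1, 1), (5, 5), (-1, 5)]
print(mbr(polygon))
print(filter_by_mbr(xy_table, mbr(polygon)))
```

An MBR test is cheap but coarse: a point can fall inside the rectangle and still lie outside the geometry itself, so it is typically used as a first-pass filter.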
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The maximum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The minimum X ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The maximum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The minimum Y ordinate of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
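To show how a geohash ID relates to its cell boundary and to the precision, the following Python sketch implements the publicly documented geohash scheme (longitude and latitude bits interleaved, five bits per base-32 character). It is assumed, not confirmed by this guide, that GeoHashID and GeoHashBoundary follow the same scheme; the function names geohash_id and geohash_boundary are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y into a geohash of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, x) if even else (lat_rng, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        even = not even
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its cell bounds: (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    even = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if even else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_boundary(cell))
```

Because each extra character only subdivides the previous cell, a lower-precision ID is always a prefix of the higher-precision ID for the same point. This prefix property is what makes the IDs sortable and convenient as GROUP BY keys, as in the aggregation query above.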
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
  Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the fields we want in the query result.
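Conceptually, a point-in-polygon lookup tests the input point against each candidate polygon and returns the attributes of every polygon that contains it. The Python sketch below illustrates this with the ray-casting test on planar rings. It is an assumption-laden illustration: the function names and the in-memory list of (attributes, ring) pairs are hypothetical, holes are ignored, and the actual UDTF reads TAB or shapefile data and may use a different algorithm.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count crossings of a ray cast from (x, y) toward +x."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_lookup(point, polygons):
    """Return the attributes of every polygon whose ring contains the point."""
    x, y = point
    return [attrs for attrs, ring in polygons if point_in_ring(x, y, ring)]


polygons = [
    ({"state": "A"}, [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
    ({"state": "B"}, [(20, 0), (30, 0), (30, 10), (20, 10), (20, 0)]),
]
print(point_in_polygon_lookup((5, 5), polygons))
```

Like the LATERAL VIEW in the query above, the lookup can return zero, one, or several rows per input point.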
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
  Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', 'Name Like Park'
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', 3, 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")
) nearestresult
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. Each hexagon is generated with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Int The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
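To give a feel for what the geohashPrecision value controls, the following is a minimal sketch of the publicly documented geohash encoding, written in Python for illustration only. It assumes the product uses the standard geohash scheme for its primary join, which the guide does not confirm; the function names geohash_encode and cell_size_degrees are hypothetical and are not part of the SDK.

```python
# Illustrative sketch of standard geohash encoding (not the SDK implementation).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def cell_size_degrees(precision):
    """Return the (longitude span, latitude span) of one cell in degrees."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


print(geohash_encode(57.64911, 10.40744, 5))
print(cell_size_degrees(7))
```

Each additional character divides a cell into 32 smaller cells, so a higher precision produces many more, much smaller cells. That is consistent with the note above that higher values may require more memory.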
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
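The semantics of a distance join can be checked on a small sample without a cluster. The sketch below is a brute-force Python illustration, not the SDK algorithm: it compares every pair of records using the haversine great-circle formula on a spherical earth, whereas joinByDistance uses a geohash-based primary join. The function names, the sample records, and the mean earth radius constant are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column="outputDistance"):
    """Pair every left record with each right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Merge both records and add the computed distance as an extra column.
                joined.append({**l, **r, distance_column: d})
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7490},
    {"poi": "far", "lon": -73.9000, "lat": 40.8000},
]
half_mile_m = 0.5 * 1609.344  # 0.5 mile expressed in meters
result = join_by_distance(customers, pois, half_mile_m)
print([row["poi"] for row in result])
```

A brute-force comparison like this grows with the product of the two input sizes, which is why a grid-based pre-join such as the geohash step matters at scale.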
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
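As a quick way to see what the 775 mode grants, the following Python sketch decodes an octal mode into the familiar ls-style string using only the standard library. The helper names describe_mode and group_can_write are hypothetical and are used here for illustration only.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode for a directory as an ls-style string."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """Return True if the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


# 775: owner and group get read, write, and execute; others get read and execute.
print(describe_mode(0o775))
# 755 is a common default that would block group members from updating downloads.
print(describe_mode(0o755), group_can_write(0o755))
```

The group write bit is what lets every member of the common group, such as the Hive and YARN service users, update data in the shared download directory.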
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first object's centroid is within the second object
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object
In (List) Returns true if the value equals at least one of the values in the literal list or sub query
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
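The wildcard rules for the Like operator can be illustrated by translating a pattern into a regular expression. The sketch below is a Python illustration of the semantics described in the table, not the MI SQL implementation; the function names are hypothetical, and case-sensitive matching is an assumption of this example.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))
print(like("Park Avenue", "%Park"))
print(like("Parks", "Park_"))
```

Because the whole value must match, a pattern such as %Park matches values that end in Park, while %Park% matches values that contain Park anywhere.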
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separator
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
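When filter expressions are assembled programmatically, the quote rules above can be applied with two small helpers. This Python sketch is an illustration under stated assumptions: the helper names quote_literal and quote_identifier are hypothetical, and because the guide does not describe how to escape a double quote inside an identifier, the sketch rejects such names rather than guess.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not covered by the guide,
    so this sketch refuses such names instead of inventing a rule.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = (
    "SELECT * FROM " + quote_identifier("Streets")
    + " WHERE Business = " + quote_literal("O'hara's")
)
print(query)
```

Quoting every identifier is harmless under the rules above, since quotes are only required when the parser cannot otherwise read the name.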
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry1 overlaps geometry2; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode AS zipcode, SUM(L.insurance) AS TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom))
GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
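The rule that a polygon's area is its exterior ring minus its interior rings can be sketched for the CARTESIAN case with the shoelace formula. This Python example is an illustration only and is not the Area function's implementation; it works in plain planar coordinates with no unit conversion or spherical logic, and the function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as (x, y) tuples (shoelace formula)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


# A 10 x 10 square with a 2 x 2 hole in the middle.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(square, [hole]))
```

Using the unsigned ring area means the result does not depend on whether a ring is listed clockwise or counterclockwise.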
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2
FROM hivetable t
LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries Geometries that intersect are at distance zero from each other Distance is always non-negative
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
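Results returned in one linear unit can be converted to another after the query. The Python sketch below maps the unit codes from the Linear Units table to meters using the standard definitions of those units; the table and function names are hypothetical, and the conversion factors come from the unit definitions rather than from the product.

```python
# Meters per unit, keyed by the unit codes listed in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the listed unit codes."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(0.5, "mi", "m"))
print(convert_length(1.0, "nmi", "km"))
```

Requesting the desired unit directly in the linearUnits parameter avoids this extra step when only one unit is needed.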
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type The Perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes) The curves are considered as thin polygons
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits [, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 37
computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on
bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)
bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)
bull For engineering coordinate systems Valid type = CARTESIAN (default)
CARTESIAN The geometry coordinates are interpreted using cartesian logic
SPHERICAL The geometry coordinates are interpreted using spherical logic
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units below.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
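The effect of the resolution parameter can be sketched in a few lines of Python. This is an illustration of the idea only, using cartesian logic and placing the vertices on the circle itself; how the product positions the vertices is not stated in this guide, and the function names are hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Approximate a circle around (x, y) with `resolution` straight-line segments."""
    ring = [
        (x + radius * math.cos(2 * math.pi * i / resolution),
         y + radius * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


def ring_area(ring):
    """Area of a closed ring using the shoelace formula."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


for resolution in (8, 12, 64):
    ring = point_buffer(5.0, 6.0, 50.0, resolution)
    coverage = ring_area(ring) / (math.pi * 50.0 ** 2)
    # segments used, and the fraction of the true circle area that the polygon covers
    print(resolution, len(ring) - 1, coverage)
```

A larger resolution produces a polygon closer to the true circle at the cost of more vertices, which matches the time and space trade-off described for the parameter.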
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
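To make the definition concrete, here is a minimal Python sketch that computes a convex hull with Andrew's monotone chain algorithm. The choice of algorithm is an assumption made for illustration; this guide does not say how the ConvexHull UDF is implemented. The input is the set of vertices of the MULTIPOLYGON from the last example.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order, starting at the lowest-left point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # drop points that would make a clockwise or collinear turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # the last point of each chain starts the other chain


vertices = [
    (40, 40), (20, 45), (45, 30),                       # first polygon
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),    # second polygon, exterior ring
    (30, 20), (20, 15), (20, 25),                       # second polygon, hole
]
print(convex_hull(vertices))
```

Vertices of the hole and the inward-pointing vertex (20 35) do not appear in the result, because they lie inside the smallest convex shape that encloses everything.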
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
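The bounds-filtering use case can be sketched conceptually in Python. The example below only illustrates what the four MBR values are and how they filter an XY table; the table rows, polygon, and function names are hypothetical and are not part of the product.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the rows whose x and y values fall inside the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return [row for row in rows if xmin <= row["x"] <= xmax and ymin <= row["y"] <= ymax]


# Hypothetical polygon and XY table, for illustration only.
polygon = [(-74.0, 42.0), (-73.0, 42.0), (-73.0, 43.0), (-74.0, 43.0), (-74.0, 42.0)]
xy_table = [
    {"id": 1, "x": -73.750333, "y": 42.736103},
    {"id": 2, "x": -71.05, "y": 42.36},
]

bounds = mbr(polygon)
matches = filter_by_bounds(xy_table, bounds)
print(bounds, [row["id"] for row in matches])
```

An MBR test is a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself, so it is typically followed by an exact test when precision matters.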
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
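How a geohash ID maps back to a rectangular cell can be sketched in Python. The sketch below follows the widely published geohash scheme (base-32 characters, bits alternating between longitude and latitude, starting with longitude). Whether the UDF produces exactly the same cells is an assumption, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of the cell identified by a geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits, most significant first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


print(geohash_bounds("ebvnk"))
print(geohash_bounds("ebv"))  # a shorter ID describes a larger cell containing the first one
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.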
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
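The hash-then-aggregate pattern in the last query can be sketched in Python. The encoder below follows the widely published geohash scheme; it is assumed, not confirmed by this guide, that GeoHashID yields the same IDs. The sample coordinates and function name are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


# Hypothetical coordinates table: two nearby points and one far away.
coordinates = [(-73.750333, 42.736103), (-73.750400, 42.736200), (-5.6, 42.6)]

# Equivalent in spirit to GROUP BY hashID with count(*).
counts = Counter(geohash_encode(lon, lat, 5) for lon, lat in coordinates)
for cell_id, quantity in sorted(counts.items()):
    print(cell_id, quantity)
```

Because a shorter ID is always a prefix of a longer ID for the same point, sorting by ID keeps neighboring cells of the same parent cell together, which is what makes the IDs useful for indexing.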
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
Spatial
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
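Conceptually, a point-in-polygon lookup tests the input point against each geometry in the data source and emits one output row per match. The Python sketch below illustrates that behavior with a simple ray-casting test. It is not the product's implementation (which reads TAB or shapefile data and can use PGD indexes), and the feature data and function names are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def local_point_in_polygon(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [f["attrs"] for f in features if point_in_ring(x, y, f["ring"])]


# Hypothetical data source with two overlapping square polygons.
features = [
    {"attrs": {"state": "A", "capital": "Alpha"}, "ring": [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]},
    {"attrs": {"state": "B", "capital": "Beta"}, "ring": [(3, 3), (6, 3), (6, 6), (3, 6), (3, 3)]},
]

for point in [(1, 1), (3.5, 3.5), (10, 10)]:
    print(point, local_point_in_polygon(point, features))
```

A point that falls in an overlap produces more than one row, and a point outside every polygon produces none, which mirrors how a table-generating function behaves in a lateral view.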
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
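The way maxCandidates and maxDistance shape the result can be sketched with a brute-force search in Python. This is a conceptual illustration only: it uses planar distances between points, whereas the UDTF works on geometries from a TAB or shapefile and honors the distance unit. The feature data and function name are hypothetical.

```python
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to `max_candidates` (distance, name) pairs, nearest first."""
    px, py = point
    scored = []
    for feature in features:
        fx, fy = feature["xy"]
        distance = math.hypot(fx - px, fy - py)
        # without a maximum distance, every feature is a candidate
        if max_distance is None or distance <= max_distance:
            scored.append((distance, feature["name"]))
    scored.sort()
    return scored[:max_candidates]


# Hypothetical features, for illustration only.
features = [
    {"name": "A", "xy": (0, 0)},
    {"name": "B", "xy": (3, 4)},
    {"name": "C", "xy": (6, 8)},
    {"name": "D", "xy": (30, 40)},
]

print(search_nearest((0, 0), features))                                     # default: one result
print(search_nearest((0, 0), features, max_candidates=3))                   # three nearest
print(search_nearest((0, 0), features, max_candidates=3, max_distance=5))   # distance cap applied
```

The distance value returned alongside each match corresponds to what the returnDistanceColumnName option exposes as a column in the query result.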
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
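To give a feel for what the geohashPrecision value controls, the following is a minimal Python sketch of the standard public geohash encoding, in which the precision is simply the number of base-32 characters in the cell ID. It is an illustration only and is not the product's implementation; the function name geohash_encode is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 latitude/longitude as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))
```

Each extra character splits the cell into 32 smaller cells, so a higher precision produces many more distinct cell IDs, which is consistent with the note that higher values may require more memory.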
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
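The semantics of a distance join can be sketched in plain Python, independent of Spark. The sketch below is a brute-force illustration under stated assumptions: rows are dictionaries, distances use the haversine formula with an approximate mean Earth radius, and the names join_by_distance and haversine_mi are hypothetical. The product itself uses a geohash-based primary join rather than comparing every pair.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 long/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi):
    """Pair every left row with each right row within max_distance_mi of it."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_mi:
                # The extra key mirrors the DistanceColumnName option.
                joined.append({**l, **r, "outputDistance": d})
    return joined


customers = [{"id": 1, "longitude": -73.70, "latitude": 42.68}]
pois = [
    {"name": "near", "lon": -73.695, "lat": 42.68},
    {"name": "far", "lon": -73.60, "lat": 42.68},
]
result = join_by_distance(customers, pois, 0.5)
print([row["name"] for row in result])
```

Only the first POI falls inside the half-mile radius, so it is the only row joined to the customer.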
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
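As a quick check of what the 775 mode grants, the following Python sketch decodes an octal mode into its symbolic form using the standard library's stat module. The helper names describe_mode and group_can_write are hypothetical and are used here only for illustration.

```python
import stat


def describe_mode(mode):
    """Return the symbolic form of a directory permission mode, such as 0o775."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


print(describe_mode(0o775))    # owner and group: rwx, others: r-x
print(group_can_write(0o775))  # group members can update downloaded data
print(group_can_write(0o755))  # with 755, group members could not
```

The group write bit is what lets every service user in the common group update the shared download directory.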
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators = <> != < <= > >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
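The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the wildcard rules only, not the MI SQL engine; the function names are hypothetical, and case-sensitive matching is an assumption, since the text above does not say how case is handled.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the Like pattern (case-sensitive by assumption)."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))  # ends with Park
print(like("Park Avenue", "%Park"))   # does not end with Park
print(like("Parks", "Park_"))         # exactly one character after Park
```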
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
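The quoting rules above can be sketched as two small Python helpers that build query text. These helpers are hypothetical illustrations, not part of the product; the sample value and table name are made up, and the handling of a double quote inside an identifier is not covered because the rules above do not describe it.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (for names with spaces or special characters)."""
    return '"' + name + '"'


# Hypothetical table and value, used only to show the resulting query text.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("My Streets"), quote_literal("Joe's Diner")
)
print(query)
```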
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Boolean True if geometry2 entirely contains geometry1; otherwise False.
If either geometry1 or geometry2 is null, Null is returned.
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
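For the common case of testing whether a point lies within a polygon, the containment test and the null behavior described above can be sketched in Python with the classic ray-casting technique. This is a simplified illustration that assumes planar coordinates and a single ring without holes; it is not the product's algorithm, and the function name is hypothetical.

```python
def point_within_polygon(point, ring):
    """Ray-casting test: True if point is inside the ring; None if either input is None."""
    if point is None or ring is None:
        return None  # mirrors the rule that a null input yields Null
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges that cross the horizontal ray extending right from the point.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_within_polygon((2, 2), square))
print(point_within_polygon((5, 2), square))
print(point_within_polygon(None, square))
```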
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty otherwise False
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type For valid values see Area Units on page 30
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
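The rule that a polygon's area is its exterior ring area minus its interior ring areas can be sketched in Python with the shoelace formula. This sketch assumes planar (CARTESIAN) coordinates and unitless values; it does not reproduce the SPHERICAL computation or unit conversion, and the function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring using the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Exterior ring area minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(polygon_area(outer, [hole]))
```

The 10 by 10 square with a 2 by 2 hole yields 100 minus 4.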
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type For valid values see Linear Units on page 33
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
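The difference between the SPHERICAL and CARTESIAN computation types can be illustrated for two points with a short Python sketch. It assumes the haversine formula on a sphere with an approximate mean Earth radius for the spherical case and a plain Euclidean distance for the Cartesian case; the product's actual computation may differ, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # approximate mean Earth radius in meters (assumption)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the planar coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


# One degree of longitude along the equator, interpreted spherically.
print(spherical_distance_m(0.0, 0.0, 1.0, 0.0))
# The same style of calculation on planar coordinates.
print(cartesian_distance(0.0, 0.0, 3.0, 4.0))
```

Applying Cartesian logic directly to long/lat degrees would return a value in degrees rather than a ground distance, which is why geographic coordinate systems are restricted to SPHERICAL.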
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 35
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 37
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type For valid values see Linear Units on page 40
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 39
Spatial
Parameter Type Description
computationType String Optional Indicates the logic to be used to interpret geometry coordinates The computation type is based on the coordinate system of the geometry being operated on
bull For geographic (longlat) coordinate systems Valid type = SPHERICAL (default)
bull For projected coordinate systems Valid types = CARTESIAN SPHERICAL (default)
bull For engineering coordinate systems Valid type = CARTESIAN (default)
CARTESIAN The geometry coordinates are interpreted using cartesian logic
SPHERICAL The geometry coordinates are interpreted using spherical logic
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
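The unit codes in the table above correspond to fixed conversion factors. The following Python sketch is not part of the SDK; it assumes the standard international definitions of each unit, and the helper name to_meters is hypothetical. It may be useful for checking offsets or distances outside of Hive.

```python
# Hypothetical helper, not part of the SDK: converts a length given in one of
# the unit codes listed above into meters, using standard unit definitions.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def to_meters(value, unit):
    """Return `value` (expressed in `unit`) converted to meters."""
    try:
        return value * METERS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported linear unit: {unit!r}")
```

For example, to_meters(50, "km") gives the 50 km buffer offset used in the Buffer examples as a length in meters.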
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
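To illustrate the resolution parameter, the following Python sketch approximates a circular buffer around a point with a given number of straight-line segments. It is a conceptual illustration only: it uses plain Cartesian logic, places the vertices on the circle itself, and the function name buffer_point is hypothetical. How the Buffer UDF places its vertices internally is not described in this guide.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring of (x, y) vertices approximating a circle.

    The circle of radius `offset` around (x, y) is approximated with
    `resolution` straight-line segments (Cartesian logic only).
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring
```

With a resolution of 8 the ring has eight distinct vertices (an octagon); larger values follow the circle more closely at the cost of more vertices.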
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
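As an illustration of what a convex hull is, the following Python sketch computes the hull of a set of vertices using Andrew's monotone chain algorithm. This is a generic textbook technique on Cartesian coordinates, not the implementation used by the ConvexHull UDF, and the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of (x, y) points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Applied to the vertices of the MULTIPOLYGON in the last example, the hull keeps only the seven outermost vertices; the interior ring and the vertex (20, 35) drop out.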
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
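For the specific case shown in the first example (epsg:4326 to epsg:3857), the transformation is the spherical Web Mercator projection. The following Python sketch shows that one formula as an aid to checking results; it is not the Transform UDF, which handles arbitrary coordinate systems, and the function name is hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon, lat):
    """Project a WGS 84 longitude/latitude (degrees) to epsg:3857 meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat) / 2.0)
    )
    return x, y
```

The formula is undefined at the poles (latitude ±90), which is why Web Mercator maps are normally clipped near ±85 degrees.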
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
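The MBR filtering described above can be pictured with a short Python sketch. It is a conceptual illustration under simple assumptions (a geometry given as a list of (x, y) vertices, rows given as (x, y) tuples); the function names mbr and filter_by_mbr are hypothetical and are not part of the SDK.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(rows, bounds):
    """Keep the (x, y) rows that fall inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]
```

The four values returned by mbr play the role of ST_XMin, ST_YMin, ST_XMax, and ST_YMax. A bounds check of this kind is a coarse filter: a row inside the MBR is not necessarily inside the geometry itself.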
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
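To show how a geohash ID determines a rectangular cell, the following Python sketch decodes an ID into its bounding box using the publicly documented geohash scheme (base-32 characters, alternating longitude and latitude bits). It assumes that GeoHashBoundary follows this standard scheme, which this guide does not state explicitly; the function name geohash_bounds is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)  # raises ValueError on invalid characters
        for shift in (4, 3, 2, 1, 0):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2.0
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2.0
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.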
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
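The encoding and the aggregation pattern in the last example can be sketched in Python. The encoder below implements the publicly documented geohash scheme, on the assumption that GeoHashID follows it; the function name geohash_id and the sample coordinates other than the one used in the examples above are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Return the geohash of (lon, lat) with `precision` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    bit_count = 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


# Aggregation: count the points that fall into each cell, as in the
# GROUP BY query above.
points = [(-73.750333, 42.736103), (-73.75, 42.74), (2.35, 48.85)]
counts = Counter(geohash_id(lon, lat, 3) for lon, lat in points)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting or filtering by prefix groups nearby cells together.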
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is a hexagon
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude latitude (in WGS 84) and a precision The precision determines how large the grid cells are (higher precision means smaller grid cells) It returns the string ID of the grid cell at the specified precision that contains the point Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned by this UDTF that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
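The point-in-polygon test at the heart of this search can be illustrated with the classic ray-casting technique. The Python sketch below is a generic Cartesian illustration, not the algorithm used by LocalPointInPolygon; the function names and the sample polygons are hypothetical, and points lying exactly on a boundary are not handled specially here.

```python
def point_in_ring(x, y, ring):
    """Ray casting: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
        j = i
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> ring) containing the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]
```

Like the UDTF, polygons_containing can return more than one result for a single point when polygons overlap.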
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
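The effect of the maxCandidates and maxDistance options can be pictured with a brute-force nearest search. The Python sketch below is only a conceptual illustration: it measures great-circle distance with the haversine formula on a sphere of mean earth radius (an assumption), works on point candidates only, and uses hypothetical function names and sample cities. The UDTF itself searches indexed TAB or shapefile geometries.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (distance_m, name) pairs, nearest first.

    `candidates` maps a name to a (lon, lat) tuple. Candidates farther away
    than `max_distance_m` are dropped; None means no distance limit.
    """
    scored = []
    for name, (lon, lat) in candidates.items():
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return scored[:max_candidates]
```

As with the UDTF defaults described above, calling search_nearest without options returns a single result and applies no distance limit.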
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. Each hexagon is generated with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
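To make the semantics of a distance join concrete, the following is a minimal Python sketch, not the SDK's implementation. It assumes a brute-force comparison of every pair using the haversine great-circle formula; the real method uses a geohash-based primary join, which is not reproduced here. The function names, the sample coordinates, and the Earth radius constant are all illustrative assumptions.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_mi):
    """Hypothetical brute-force join: keep every pair within max_distance_mi."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_mi:
                # Merge both records and add the computed distance column.
                joined.append({**l, **r, "outputDistance": d})
    return joined


# Illustrative data only: one customer and two points of interest.
customers = [{"id": "c1", "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "near", "lon": -73.9851, "lat": 40.7527},
    {"poi": "far", "lon": -73.9442, "lat": 40.6782},
]
matches = join_by_distance(customers, pois, 0.5)
```

Only the "near" record falls inside the half-mile radius, so the result holds a single joined row carrying attributes from both inputs plus the distance.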
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage reference data downloads programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
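After applying the permissions, it can be useful to confirm that the directory mode is what you expect. The following Python sketch is one possible check, not part of the product. It uses a temporary directory as a stand-in for the real download location, and the helper name has_mode is hypothetical.

```python
import os
import stat
import tempfile


def has_mode(path, expected=0o775):
    """Return True if the permission bits of path equal the expected mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


# A temporary directory stands in for the real download directory (assumption).
download_dir = tempfile.mkdtemp()
os.chmod(download_dir, 0o775)  # same effect as: chmod 775 <dir>
mode_ok = has_mode(download_dir)
```

Mode 775 grants read, write, and execute to the owner and group, and read and execute to everyone else, which is why all job-running users need to be members of the common group to write to the directory.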
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
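The wildcard rules for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is an illustration of the stated rules only, not how MI SQL evaluates Like; the function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, like("Canada", "Can%") and like("Canada", "C_nada") both hold, while like("Canada", "Can_") does not, because the underscore stands for exactly one character.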
Arithmetic Operators
Operator Definition
+ Addition; also concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
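When queries are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only; the function names are hypothetical, and rejecting identifiers that contain a double quote is an assumption made to keep the sketch within what the rules above state.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


# Build a query using both helpers.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
```

Quoting an identifier that does not strictly need it, such as Streets here, is harmless and avoids having to decide case by case whether the parser would accept the bare name.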
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Parameter Type Description
inputGeometry WritableGeometry The input geometry to be checked for a null or empty value
Return Values
Return Type Description
Boolean True if the geometry is null or empty; otherwise False.
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT('POINT(10 20)'));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWKT(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
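The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in Python for the simple CARTESIAN case. This is an illustration using the shoelace formula on planar coordinates, not the UDF's implementation, and it does not cover SPHERICAL computation. The function names are hypothetical.

```python
def ring_area(ring):
    """Unsigned planar area of a ring given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
area = polygon_area(outer, [hole])  # 100 square units minus a 4 square unit hole
```

The result is expressed in the square of whatever unit the coordinates use; converting to a unit such as sq mi would be a separate step.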
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWKT(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
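The rule that a polygon's perimeter includes the rings of its holes can be sketched in Python for planar coordinates. This is an illustration of the CARTESIAN case only, not the UDF's implementation; the function names are hypothetical.

```python
import math


def ring_length(ring):
    """Planar length of a closed ring given as a list of (x, y) vertices."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:] + ring[:1]))


def polygon_perimeter(exterior, interiors=()):
    """Sum of the exterior ring length and the lengths of all interior rings."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4)]
perimeter = polygon_perimeter(outer, [hole])  # 40 for the exterior plus 8 for the hole
```

Note the contrast with area: a hole reduces the area of a polygon but increases its perimeter.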
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
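The effect of the resolution parameter on a point buffer can be visualized with a short Python sketch that approximates a circle with straight segments. This is a planar illustration only, not the UDF's implementation; placing the vertices exactly on the circle is an assumption, and the function name is hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Approximate a circle around (x, y) with `resolution` straight-line segments."""
    return [
        (
            x + radius * math.cos(2 * math.pi * i / resolution),
            y + radius * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]


# A resolution of 8 yields the eight vertices of an octagon.
octagon = point_buffer(0.0, 0.0, 50.0, 8)
```

Raising the resolution makes the polygon follow the circle more closely at the cost of more vertices, which matches the note that larger values take more time and space to compute.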
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(hivetable.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
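For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It is offered as an illustration of the concept; the source does not state which algorithm the UDF uses, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
```

The two interior points, (2, 2) and (1, 3), are discarded, leaving only the four corners of the square.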
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
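As an illustration of what a transformation from epsg:4326 to epsg:3857 involves, the following Python sketch applies the standard spherical Mercator formulas to the point (30, 20). It assumes the input is WGS 84 longitude/latitude in degrees; the function names are hypothetical, and the sketch is not the Transform UDF's implementation, which supports many more coordinate systems.

```python
import math

# Sphere radius used by EPSG:3857 (the WGS 84 semi-major axis, in meters).
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to EPSG:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: EPSG:3857 meters back to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


x, y = wgs84_to_web_mercator(30, 20)
print(x, y)
print(web_mercator_to_wgs84(x, y))
```

Projecting and then inverting returns the original longitude and latitude, up to floating-point error.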
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
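The MBR filtering described above can be sketched in plain Python. The example below computes the minimum and maximum X and Y values of a ring of coordinates (the values that ST_XMin, ST_YMin, ST_XMax, and ST_YMax are described as returning) and uses them to filter an XY table. The helper names and sample rows are hypothetical and only illustrate the idea.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_xy_rows(rows, bounds):
    """Keep only the (x, y) rows that fall inside the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return [(x, y) for x, y in rows if x_min <= x <= x_max and y_min <= y <= y_max]


# Exterior ring of one polygon from the ConvexHull example.
ring = [(20, 35), (10, 30), (10, 10), (30, 5), (45, 20), (20, 35)]
bounds = mbr(ring)

# A small XY table; only rows inside the MBR are kept.
xy_table = [(15, 12), (50, 20), (30, 34), (9, 9)]
kept = filter_xy_rows(xy_table, bounds)
print(bounds, kept)
```

An MBR test is only a coarse filter: a row inside the rectangle is not necessarily inside the geometry itself.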
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
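The relationship between a geohash ID, its precision, and its cell boundary can be illustrated with the publicly documented geohash encoding (alternating longitude and latitude bisection, base-32 output). The Python sketch below assumes that GeoHashID and GeoHashBoundary follow that standard scheme, which this guide implies ("well-known string ID") but does not spell out. The function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 point as a standard geohash of `precision` characters."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its cell bounds (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


cell_id = geohash_id(-73.750333, 42.736103, 3)
print(cell_id, geohash_boundary(cell_id))
```

Each additional character splits the cell 32 ways, which is why longer IDs mean smaller cells. IDs that share a prefix lie in the same parent cell, which is what makes them useful for sorting and grouping.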
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies on or within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
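Conceptually, a point-in-polygon search tests the input point against each polygon in the data source and returns the attributes of the polygons that contain it. The Python sketch below shows one common way to do the containment test, the ray-casting (even-odd) rule. It is an illustration under simplifying assumptions (single rings, no holes, no spatial index, and points exactly on an edge are not handled specially), not the UDTF's implementation. All names and the sample data are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: count edge crossings of a ray going in +x direction."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def find_containing_polygons(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [attrs for attrs, ring in features if point_in_ring(x, y, ring)]


# Hypothetical stand-in for a polygon data source with attribute columns.
features = [
    ({"state": "A", "capital": "Alpha"}, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ({"state": "B", "capital": "Beta"}, [(10, 0), (20, 0), (20, 10), (10, 10)]),
    ({"state": "AB", "capital": "Overlap"}, [(5, 5), (15, 5), (15, 8), (5, 8)]),
]

matches = find_containing_polygons((7, 6), features)
print(matches)
```

Because every containing geometry is returned, one input point can produce several output rows. This is why the Hive example uses LATERAL VIEW.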
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
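The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force nearest search in Python. The sketch below assumes point candidates and great-circle (haversine) distances in meters; the UDTF's actual distance calculation and indexing are not described in this guide, and the function names, parameter names, and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption for this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    scored = []
    for name, (lon, lat) in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Approximate, illustrative coordinates (longitude, latitude).
capitals = [
    ("Boston", (-71.0589, 42.3601)),
    ("Sacramento", (-121.4944, 38.5816)),
    ("Albany", (-73.7562, 42.6526)),
    ("Hartford", (-72.6734, 41.7658)),
]
search_point = (-73.750333, 42.736103)

top3 = search_nearest(search_point, capitals, max_candidates=3)
within_100km = search_nearest(search_point, capitals, max_candidates=3, max_distance_m=100_000)
print([name for name, _ in top3], [name for name, _ in within_100km])
```

With the defaults (one candidate, no distance limit), each search point yields at most one row. Raising max_candidates or setting max_distance_m changes how many rows each input point contributes.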
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
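The semantics of a distance join (pair every record in the first dataset with every record in the second dataset that lies within the maximum distance, optionally adding the distance as a column) can be sketched in plain Python. This brute-force version compares every pair and does not reproduce the geohash-based primary join that the geohashPrecision parameter controls. The haversine distance, the row layout, and all names are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius; an assumption for this sketch
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left_rows, right_rows, max_distance_m, distance_column=None):
    """Brute-force distance join over lists of dict rows."""
    joined = []
    for left in left_rows:
        for right in right_rows:
            d = haversine_m(left["longitude"], left["latitude"], right["lon"], right["lat"])
            if d <= max_distance_m:
                row = {**left, **right}
                if distance_column is not None:
                    row[distance_column] = d  # distance in meters
                joined.append(row)
    return joined


customers = [{"customer": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.7510, "lat": 42.7365},
    {"poi": "far", "lon": -73.70, "lat": 42.70},
]

result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, distance_column="outputDistance")
print(result)
```

Comparing every pair does not scale to large datasets, which is the motivation for a primary join on a coarser key such as a geohash before the exact distance test.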
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the LI jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
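As a quick check of what the 775 mode grants, the following Python sketch decodes the octal value with the standard library stat module. It is illustrative only and does not touch the file system.

```python
import stat

mode = 0o775  # the permission bits applied by chmod 775

# Render the bits the way `ls -l` would show them for a directory.
text = stat.filemode(stat.S_IFDIR | mode)

owner_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU  # owner: read, write, execute
group_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG  # group: read, write, execute
other_can_write = bool(mode & stat.S_IWOTH)        # others: no write access

print(text, owner_rwx, group_rwx, other_can_write)
```

The owner and the common group both get full access, while other users can only read and traverse the directory.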
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
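The wildcard behavior of the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is not how MI SQL evaluates Like internally; the helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % is any run of characters, _ is one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"))    # percent matches the remaining characters
print(like("Canada", "C_nada"))  # underscore matches exactly one character
print(like("Canada", "Can_"))    # fails: three characters remain, not one
```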
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
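When queries are assembled programmatically, the quoting rules can be applied with small helpers such as the following Python sketch. The function names are hypothetical, and rejecting identifiers that contain a double quote is an assumption, since this guide does not describe how to escape one.

```python
def quote_literal(value):
    """Wrap a string literal in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("O'haras"),
)
print(query)
```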
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter
Area
Description
The Area function calculates and returns the area of a given geometry in the desired unit. The unit must be specified as a parameter when calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
areaUnits String The desired return unit type. For valid values, see Area Units on page 30.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units
Value Description
sq mi square miles
sq km square kilometers
sq in square inches
sq ft square feet
sq yd square yards
sq mm square millimeters
sq cm square centimeters
sq m square meters
sq survey ft square US Survey feet
sq nmi square nautical miles
acre acres
ha hectares
Return Values
Return Type Description
Double The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
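The rule that a polygon's area is its exterior ring minus its interior rings can be sketched in Python with the shoelace formula. This is a planar (CARTESIAN-style) illustration under the assumption of closed rings given as coordinate lists; it is not the UDF's implementation and ignores spherical computation and unit conversion.

```python
def ring_area(ring):
    """Unsigned shoelace area of a closed ring [(x, y), ...] (first point == last)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(exterior, interiors=()):
    """Area of the exterior ring minus the areas of the interior rings (holes)."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
area = polygon_area(exterior, [hole])
print(area)  # 100 square units for the exterior minus 4 for the hole
```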
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
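If a result returned in one linear unit needs to be expressed in another, the conversion can be done outside Hive. The following Python sketch maps the linear unit codes listed above to meters using the standard definitions of those units; the table and function names are hypothetical and are not part of the SDK.

```python
# Meters per unit for the linear unit codes accepted by the UDFs.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}


def convert_length(value, from_unit, to_unit):
    """Convert a length between two of the unit codes above."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]


print(convert_length(1.0, "mi", "km"))
print(convert_length(1.0, "nmi", "m"))
```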
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
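The definition of perimeter as the sum of the lengths of all rings, holes included, can be sketched in Python as follows. This planar illustration assumes closed rings given as coordinate lists and is not the UDF's implementation.

```python
import math


def ring_length(ring):
    """Length of a closed ring [(x, y), ...] as the sum of its segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, interiors=()):
    """Perimeter as the exterior ring length plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in interiors)


exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)]
perimeter = polygon_perimeter(exterior, [hole])
print(perimeter)  # 40 for the exterior plus 8 for the hole
```

Note that holes add to the perimeter, whereas they subtract from the area.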
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The unit type of the offset. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
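To show what the resolution parameter means, the following Python sketch approximates the buffer of a point with a regular polygon. It is a planar illustration only: placing the vertices on the circle is an assumption, and the function name is hypothetical rather than part of the SDK.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring
    return ring


octagon = point_buffer(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight-line segments
```

With a resolution of 8 the ring has eight segments, which matches the octagon described in the parameter table.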
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
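For intuition about the result, the following Python sketch computes a planar convex hull with Andrew's monotone chain algorithm, applied to the vertices of the MULTIPOLYGON used in the last example. The algorithm is chosen here for illustration; this guide does not state which algorithm the UDF uses.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain because it repeats the other chain's start.
    return lower[:-1] + upper[:-1]


points = [
    (40, 40), (20, 45), (45, 30),                       # first polygon
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),    # second polygon, exterior
    (30, 20), (20, 15), (20, 25),                       # second polygon, hole
]
hull = convex_hull(points)
print(hull)
```

Interior vertices, such as those of the hole, do not appear in the hull.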
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
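As an illustration of what a transformation from epsg:4326 to epsg:3857 does to a single point, the following Python sketch applies the standard spherical Web Mercator formulas. It covers only this one pair of coordinate systems and is not the UDF's implementation; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon, lat):
    """Project a lon/lat point in degrees (EPSG:4326) to meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```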
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X • ST_XMax • ST_XMin • ST_Y • ST_YMax • ST_YMin
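The idea of filtering an XY table by the bounds of a geometry can be sketched in Python as follows. The helper names and sample rows are hypothetical; in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Minimum bounding rectangle of a coordinate list as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the XY rows whose point falls inside the bounds (edges included)."""
    x_min, y_min, x_max, y_max = bounds
    return [r for r in rows if x_min <= r["x"] <= x_max and y_min <= r["y"] <= y_max]


polygon = [(10, 10), (30, 5), (45, 20), (20, 45), (10, 10)]
bounds = mbr(polygon)

rows = [
    {"id": 1, "x": 20, "y": 20},
    {"id": 2, "x": 50, "y": 20},  # outside: x is greater than x_max
    {"id": 3, "x": 10, "y": 5},   # on the corner of the rectangle
]
kept = filter_by_bounds(rows, bounds)
print(bounds, [r["id"] for r in kept])
```

An MBR test is only a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself.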
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
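The null-handling rules of the six observer functions can be summarised in a small sketch. The Python below is not the SDK implementation: it models a geometry as a hypothetical (kind, coordinates) tuple, and the function names st_x, st_xmax and so on are illustrative stand-ins for the Hive UDFs.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a WritableGeometry: (kind, [(x, y), ...]).
Geometry = Tuple[str, Sequence[Tuple[float, float]]]


def st_x(geom: Optional[Geometry]) -> Optional[float]:
    """X ordinate of a point; None for a null or non-point geometry."""
    if geom is None or geom[0] != "Point":
        return None
    return geom[1][0][0]


def st_y(geom: Optional[Geometry]) -> Optional[float]:
    """Y ordinate of a point; None for a null or non-point geometry."""
    if geom is None or geom[0] != "Point":
        return None
    return geom[1][0][1]


def _extreme(geom: Optional[Geometry], axis: int, pick) -> Optional[float]:
    """Apply min or max to one axis of every coordinate; None if no geometry."""
    if geom is None or not geom[1]:
        return None
    return pick(coord[axis] for coord in geom[1])


def st_xmin(geom): return _extreme(geom, 0, min)
def st_xmax(geom): return _extreme(geom, 0, max)
def st_ymin(geom): return _extreme(geom, 1, min)
def st_ymax(geom): return _extreme(geom, 1, max)
```

The point accessors return a value only for points, while the min/max accessors work on any geometry that has coordinates.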
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
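To make the relationship between an ID and its cell boundary concrete, here is a minimal sketch of decoding a geohash into a bounding box. It assumes the IDs follow the widely used public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude); whether the SDK matches this scheme bit for bit is an assumption, and geohash_bounds and bounds_to_wkt are hypothetical helper names, not SDK functions.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a geohash ID."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in cell_id.lower():
        value = BASE32.index(ch)  # raises ValueError on an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2.0
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def bounds_to_wkt(bounds):
    """Format a bounding box as a closed WKT polygon ring."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return "POLYGON (({0} {1}, {2} {1}, {2} {3}, {0} {3}, {0} {1}))".format(
        min_lon, min_lat, max_lon, max_lat
    )
```

Each extra character adds five bits, so the cell shrinks by a factor of 32 in area per character; this is why longer IDs mean smaller cells.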
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
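The last query above groups records by cell ID. The following sketch shows the same idea outside Hive, using the public geohash encoding as a stand-in; that the SDK produces identical IDs is an assumption, and geohash_id and count_by_cell are hypothetical names chosen for illustration.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    use_lon = True  # bits alternate, starting with longitude
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2.0
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid  # point is in the upper half
        else:
            rng[1] = mid  # point is in the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count (x, y) points per cell, like GROUP BY hashID with count(*)."""
    return Counter(geohash_id(x, y, precision) for x, y in points)
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting by ID keeps spatially close records near each other, which is what makes the IDs useful for indexing.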
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset
  Description: the charset to use when reading a shapefile
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: the coordinate reference system to use when reading a shapefile
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
  Example: 'downloadLocation', '/pb/downloads'
downloadGroup
  Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
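For readers unfamiliar with the underlying test, the sketch below shows a plain ray-casting point-in-polygon check and a lookup that returns the attributes of every containing polygon. It is an illustration only, not the algorithm the UDTF uses; point_in_polygon and local_point_in_polygon are hypothetical names, polygons are simple rings without holes, and a point lying exactly on an edge may be classified either way by this simplified test.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: True when (x, y) falls inside the ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the horizontal line at y?
        if (yi > y) != (yj > y):
            # X coordinate at which the edge crosses that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
        j = i
    return inside


def local_point_in_polygon(x, y, features):
    """Return the attribute dict of every (ring, attrs) feature containing the point."""
    return [attrs for ring, attrs in features if point_in_polygon(x, y, ring)]
```

Returning a list mirrors the table-generating behaviour described above: one input point can produce zero, one, or several output rows.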
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates
  Description: the maximum number of results to return (if not set, the default value is 1)
  Example: 'maxCandidates', '3'
maxDistance
  Description: the maximum distance to search for results (if not set, the default value is no limit)
  Example: 'maxDistance', '25'
distanceUnit
  Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
  Example: 'distanceUnit', 'mi'
returnDistanceColumnName
  Description: the name of the column to use for returning the distance
  Example: 'returnDistanceColumnName', 'Miles'
shpCharset
  Description: the charset to use when reading a shapefile
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: the coordinate reference system to use when reading a shapefile
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
  Example: 'downloadLocation', '/pb/downloads'
downloadGroup
  Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
queryFilter
  Description: search on a subset of the data based on an attribute
  Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
  Example: 'queryFilter', 'Name Like Park'
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
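The interaction of maxCandidates and maxDistance can be illustrated with a brute-force sketch. This is not how the UDTF is implemented; it assumes point features only, measures great-circle distance in meters (matching the documented default unit), and uses hypothetical names search_nearest and haversine_m.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def search_nearest(lon, lat, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, attrs) pairs, nearest first.

    features is an iterable of (lon, lat, attrs); max_distance of None means no limit.
    """
    hits = []
    for f_lon, f_lat, attrs in features:
        d = haversine_m(lon, lat, f_lon, f_lat)
        if max_distance is None or d <= max_distance:
            hits.append((d, attrs))
    hits.sort(key=lambda hit: hit[0])
    return hits[:max_candidates]
```

The distance filter is applied before the candidate limit, so a small maxDistance can return fewer rows than maxCandidates allows.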
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than if rectangles were used. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
// Length and LinearUnit are in the com.mapinfo.midev.unit package
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
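The semantics of the join (every pair of records within maxDistance, optionally with the computed distance attached) can be sketched without Spark. The Python below is a brute-force illustration under stated assumptions: it omits the geohash-based primary join that the geohashPrecision parameter controls, works in miles, and uses hypothetical names join_by_distance and haversine_mi with plain dictionaries in place of dataframes.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in statute miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(min(1.0, math.sqrt(a)))


def join_by_distance(left, right, max_distance_mi, distance_column=None):
    """Pair each left row with every right row within max_distance_mi.

    Rows are dicts with 'lon' and 'lat' keys; output keys are prefixed with
    'left_' and 'right_'. If distance_column is given, the distance is added.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_mi(l_row["lon"], l_row["lat"], r_row["lon"], r_row["lat"])
            if d <= max_distance_mi:
                row = {"left_" + k: v for k, v in l_row.items()}
                row.update({"right_" + k: v for k, v in r_row.items()})
                if distance_column is not None:
                    row[distance_column] = d
                joined.append(row)
    return joined
```

Comparing every pair is quadratic in the number of rows, which is the cost a coarse spatial pre-join such as a geohash bucket is meant to avoid on large dataframes.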
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter       Required   Description
-f <file>       yes        Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have Read, Write and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
   sudo groupadd pbdownloads
2. Add users to the group:
   sudo usermod -a -G pbdownloads hive
   sudo usermod -a -G pbdownloads yarn
   sudo usermod -a -G pbdownloads zeppelin
   sudo usermod -a -G pbdownloads hue
   sudo usermod -a -G pbdownloads pbuser
   sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
   sudo chgrp pbdownloads /pb/downloads
   sudo chmod 775 /pb/downloads
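After step 5, it can be useful to confirm that the directory really carries mode 775. The following is a minimal sketch, assuming a POSIX file system; the helper name has_mode_775 is hypothetical and the check runs against a throwaway temporary directory rather than the real download location.

```python
import os
import stat
import tempfile


def has_mode_775(path):
    """Return True if the permission bits of `path` are exactly rwxrwxr-x (775)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o775


# Demonstration on a temporary directory, not the real download directory.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o775)
print(has_mode_775(demo_dir))
```

To check a real installation, pass the directory configured as the download location instead of demo_dir.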
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or sub query.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
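The wildcard rules of the Like operator can be illustrated outside MI SQL. The sketch below translates a Like pattern into a regular expression, following only the two rules stated in the table; the function names are hypothetical, and the matching is case-sensitive here, which is an assumption and may differ from the product's behavior.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, like("Troy", "Tr_y") and like("Troy", "T%") both hold, while like("Troy", "Tr_") does not, because the underscore stands for exactly one character.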
Arithmetic Operators
Operator   Definition
+          Addition; also concatenation operator. NOTE: String concatenation also uses &.
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one or more characters; the _ (underscore) represents a single character
,           List items and function argument separators
@           Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers with illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
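When query text is assembled programmatically, the doubling rule for single quotes can be applied by a small helper. This is a minimal sketch; quote_literal is a hypothetical name, not part of the product, and it covers only the single-quote rule described above.

```python
def quote_literal(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Build a WHERE clause for a business name that contains an apostrophe.
where_clause = "WHERE Business = " + quote_literal("O'Neil")
print(where_clause)
```

Where the client library supports bound parameters, those are generally preferable to building literals by hand.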
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Parameter         Type     Description
computationType   String   Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Area Units
Valid values for unit are the following area units:
Value          Description
sq mi          square miles
sq km          square kilometers
sq in          square inches
sq ft          square feet
sq yd          square yards
sq mm          square millimeters
sq cm          square centimeters
sq m           square meters
sq survey ft   square US Survey feet
sq nmi         square nautical miles
acre           acres
ha             hectares
Return Values
Return Type   Description
Double        The area of the geometry.
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
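To give a feel for what CARTESIAN logic means for an area calculation, the following sketch computes the planar area of a polygon with the shoelace formula. It is an illustration only, not the product's implementation; the function names are hypothetical and rings are assumed to be closed (first vertex repeated at the end).

```python
def ring_area(ring):
    """Unsigned planar area of a closed ring given as [(x, y), ...]."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1  # shoelace cross term for one edge
    return abs(total) / 2.0


def polygon_area(exterior, holes=()):
    """Area of the exterior ring minus the area of every hole."""
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
print(polygon_area(square, [hole]))
```

The result is in squared coordinate units; a SPHERICAL computation would instead account for the curvature of the earth.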
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type               Description
Array<WritableGeometry>   The closest points between the two geometries. Geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter         Type               Description
geometry1         WritableGeometry   The first instance of a WritableGeometry.
geometry2         WritableGeometry   The second instance of a WritableGeometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type:
Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles
Return Values
Return Type   Description
Double        The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
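The difference between the two computation types can be sketched for the simplest case, the distance between two points. The code below uses the haversine formula on a sphere for the spherical case and the Pythagorean distance for the cartesian case. This is an illustration under stated assumptions, not the product's algorithm: the earth radius constant, the function names and the choice of haversine are all assumptions of this sketch. The unit factors follow the Linear Units table.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (assumption of this sketch)

# Meters per unit for the values in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048, "yd": 0.9144,
    "mm": 0.001, "cm": 0.01, "m": 1.0, "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}


def spherical_distance(lon1, lat1, lon2, lat2, unit="m"):
    """Great-circle distance between two long/lat points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    meters = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    return meters / METERS_PER_UNIT[unit]


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)
```

One degree of longitude along the equator, spherical_distance(0, 0, 1, 0), comes to roughly 111 km with this radius, whereas the cartesian distance between the same two coordinate pairs is simply 1 coordinate unit.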
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles
Return Values
Return Type   Description
Double        The length of the geometry.
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter         Type               Description
geometry          WritableGeometry   The input geometry.
linearUnits       String             The desired return unit type. For valid values, see Linear Units.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles
Return Values
Return Type   Description
Double        The perimeter of the geometry.
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
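The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be sketched as follows. This is a planar (cartesian) illustration with hypothetical function names, assuming closed rings and Python 3.8 or later for math.dist; it is not the product's implementation.

```python
import math


def ring_length(ring):
    """Total length of a closed ring given as [(x, y), ...]."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum of the exterior ring length and the length of every hole."""
    return ring_length(exterior) + sum(ring_length(hole) for hole in holes)


outer = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
inner = [(1, 1), (2, 1), (2, 2), (1, 2), (1, 1)]
print(polygon_perimeter(outer, [inner]))
```

Note that a hole increases the perimeter, whereas it decreases the area.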
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry, which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter         Type               Description
geometry          WritableGeometry   The geometry to buffer.
offset            Number             The distance from the input geometry.
linearUnits       String             The unit type of the offset. For valid values, see Linear Units.
resolution        Number             Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType   String             Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
    • For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
    • For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
    • For engineering coordinate systems: Valid type = CARTESIAN (default)
    CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
    SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value       Description
mi          miles
km          kilometers
in          inches
ft          feet
yd          yards
mm          millimeters
cm          centimeters
m           meters
survey ft   US Survey feet
nmi         nautical miles
Return Values
Return Type        Description
WritableGeometry   A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
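The effect of the resolution parameter on a point buffer can be sketched in planar coordinates: the circle is approximated by a regular polygon with that many sides. The code below is an illustration only; point_buffer is a hypothetical name, and placing the vertices on the circle (rather than outside it) is an assumption of this sketch, not a statement about the product.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Closed ring approximating a circle of radius `offset` around (x, y)."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3 to form a polygon")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


octagon = point_buffer(5, 6, 50, 8)
print(len(octagon) - 1)
```

With a resolution of 8 the ring has eight distinct vertices, the octagon mentioned in the parameter description; raising the resolution adds vertices and therefore cost.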
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type        Description
WritableGeometry   The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
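For readers unfamiliar with convex hulls, the following sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It is a generic textbook technique chosen for illustration; nothing in this guide states which algorithm the ConvexHull function uses, and the function name here is hypothetical.

```python
def convex_hull(points):
    """Convex hull of (x, y) points, returned counter-clockwise without repeating the start."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]))
```

Interior points such as (2, 2) and (1, 3) are dropped, leaving only the four corners of the square.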
Intersection
Description
The Intersection function returns the geometry (point, line or curve) that is common to two geometry objects (such as lines, curves, planes and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type        Description
WritableGeometry   The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The source input geometry.
CRS         String             The destination coordinate system for the geometry.
Return Values
Return Type        Description
WritableGeometry   The geometry transformed to the destination coordinate system.
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
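As an illustration of what a coordinate transformation involves, the sketch below converts a long/lat pair (epsg:4326) to spherical Web Mercator (epsg:3857) with the standard forward formulas. It covers only this one pair of coordinate systems and is not the product's transformation engine; the function name is hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon, lat):
    """Forward spherical Mercator projection; input in degrees, output in meters."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


print(lonlat_to_web_mercator(30, 20))
```

The projection is undefined at the poles, so latitudes of exactly +90 or -90 degrees are outside the domain of this sketch.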
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter   Type               Description
geometry1   WritableGeometry   The first instance of a WritableGeometry.
geometry2   WritableGeometry   The second instance of a WritableGeometry.
Return Values
Return Type        Description
WritableGeometry   The geometry that represents the union of the input geometries.
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that on its own it cannot transform an XY table from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
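The idea of filtering XY records by the bounds of a geometry can be sketched as follows. The code computes the minimum bounding rectangle of a coordinate list, which corresponds conceptually to the four values that ST_XMin, ST_YMin, ST_XMax and ST_YMax report, and then tests points against it. The function names are hypothetical and the sketch works on plain coordinate tuples rather than on WritableGeometry objects.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a non-empty list of (x, y) tuples."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True if the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
box = mbr(triangle)
records = [(30, 35), (50, 35), (20, 30)]
print([r for r in records if in_mbr(r[0], r[1], box)])
```

A bounds test of this kind is a cheap first filter; points inside the rectangle are not necessarily inside the geometry itself.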
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The X maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter   Type               Description
geometry    WritableGeometry   The input geometry.
Return Values
Return Type   Description
Double        The Y minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter   Type     Description
UNIQUE_ID   String   The unique geohash identifier of a cell in a grid.
Return Values
Return Type        Description
WritableGeometry   A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter   Type               Description
X           Number or String   The longitude value of the point.
Y           Number or String   The latitude value of the point.
precision   Number             The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type        Description
WritableGeometry   The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
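How a hash ID maps back to a rectangular cell can be sketched with the widely published geohash scheme: each character carries five bits that alternately halve the longitude and latitude ranges. The code below decodes an ID to its bounding box. It assumes that the IDs follow the standard geohash alphabet and bit order, which this guide does not spell out, and geohash_bounds is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(hash_id):
    """Return (xmin, ymin, xmax, ymax) in degrees for a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


print(geohash_bounds("dre"))
```

Each additional character shrinks the cell, which is why a longer ID corresponds to a higher precision.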
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
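The grouping query above counts points per grid cell. The same idea can be sketched in Python, assuming the IDs follow the standard public geohash encoding. The function geohash_id and the sample points are hypothetical and are shown only to illustrate how a point maps to an ID and how IDs act as aggregation keys.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a standard geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate, starting with longitude
    chars = []
    value = 0
    bits = 0
    while len(chars) < precision:
        rng, coord = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value = value << 1
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)

# Hypothetical sample points as (longitude, latitude).
points = [(-73.750333, 42.736103), (-73.7562, 42.6526), (-0.1276, 51.5072)]
counts = Counter(geohash_id(x, y, 3) for x, y in points)
print(counts)
```

Because the ID is a plain string, it can be used directly in GROUP BY and ORDER BY clauses, as the Hive examples do.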
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
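The containment test behind a point-in-polygon search can be sketched with the classic ray-casting technique. The Python below is a minimal illustration for simple polygons without holes. It is not the algorithm used by LocalPointInPolygon, and the function names and sample features are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def containing_features(x, y, features):
    """Return the attributes of every feature whose polygon contains the point."""
    return [attrs for ring, attrs in features if point_in_polygon(x, y, ring)]

# Hypothetical features: (list of vertices, attribute dictionary).
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10)], {"name": "A"}),
    ([(20, 0), (30, 0), (30, 10), (20, 10)], {"name": "B"}),
]
print(containing_features(5, 5, features))
```

As with the UDTF, one input point can yield zero, one, or several output rows, which is why the Hive query uses LATERAL VIEW to join the results back to the input table.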
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
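How maxCandidates and maxDistance interact can be shown with a small brute-force search. This Python sketch assumes great-circle (haversine) distances between points in longitude/latitude; the product's actual distance calculation and indexing are not described here, and the names search_nearest and haversine_m are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (clon, clat) in candidates.items():
        d = haversine_m(lon, lat, clon, clat)
        # Without a maximum distance, every candidate qualifies.
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]

# Hypothetical candidate points keyed by name, as (longitude, latitude).
candidates = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (0.0, 2.0)}
print(search_nearest((0.0, 0.1), candidates, max_candidates=3, max_distance_m=150000))
```

The defaults mirror the option table: one result and no distance limit unless the caller says otherwise.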
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
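When choosing a geohashPrecision, it can help to know roughly how large a cell is at each precision. The Python sketch below computes cell dimensions in degrees, assuming the standard public geohash layout (5 bits per character, with longitude taking the first bit). Whether joinByDistance uses exactly these cells internally is an assumption, and geohash_cell_degrees is a hypothetical helper.

```python
import math

def geohash_cell_degrees(precision):
    """Return (width, height) in degrees of a standard geohash cell."""
    total_bits = 5 * precision            # each character encodes 5 bits
    lon_bits = math.ceil(total_bits / 2)  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits

for precision in (5, 7, 9):
    width, height = geohash_cell_degrees(precision)
    print(precision, width, height)
```

Each additional character multiplies the number of cells covering the same area by 32, which is consistent with the note that higher precisions may require more memory.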
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
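The semantics of the join, every pair of records whose points lie within the maximum distance plus an optional distance column, can be sketched without Spark. The Python below uses a brute-force nested loop and an equirectangular distance approximation, both chosen for brevity. It does not reproduce the geohash-based primary join, and the function names and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
METERS_PER_MILE = 1609.344

def approx_distance_m(lon1, lat1, lon2, lat2):
    """Equirectangular approximation; adequate for short distances."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)

def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair each left row with every right row within max_distance_m."""
    rows = []
    for a in left:
        for b in right:
            d = approx_distance_m(a["longitude"], a["latitude"], b["lon"], b["lat"])
            if d <= max_distance_m:
                row = {**a, **b}
                if distance_column is not None:
                    row[distance_column] = d  # optional extra attribute
                rows.append(row)
    return rows

# Hypothetical rows standing in for the two dataframes.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.755, "lat": 42.737},
    {"poi": "far", "lon": -73.9, "lat": 42.8},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
print(result)
```

A nested loop compares every pair, which is impractical for large inputs; that cost is the reason a coarse primary join on grid cells is useful before exact distances are computed.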
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
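To see what the 775 mode grants, the permission bits can be decoded with Python's standard stat module. This is a small illustrative sketch; the helper names describe_mode and group_can_write are hypothetical and are not part of the product.

```python
import stat

def describe_mode(mode):
    """Return the ls-style permission string for a directory with these bits."""
    # stat.filemode expects the file-type bits too; S_IFDIR marks a directory.
    return stat.filemode(stat.S_IFDIR | mode)

def group_can_write(mode):
    """True when members of the owning group may write."""
    return bool(mode & stat.S_IWGRP)

print(describe_mode(0o775))
print(group_can_write(0o775), group_can_write(0o755))
```

With 775, the owner and the group can read, write, and traverse the directory, while all other users can only read and traverse it. This is what lets every member of the common group update downloaded data.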
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
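The wildcard behavior of the Like operator can be illustrated with a small Python sketch that translates a Like pattern into a regular expression. This is not the product's implementation; the function names are hypothetical, and case-sensitive matching is an assumption made only for the illustration.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into a compiled regular expression.

    '_' matches exactly one character; '%' matches zero or more characters.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern (case-sensitive here)."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Canada", "Can%"))    # percent matches the remaining characters
    print(like("Canada", "C_nada"))  # underscore matches the single character "a"
    print(like("Canada", "Can_"))    # underscore matches one character only, so no match
```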
Arithmetic Operators
Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List item and function argument separators
@          Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as "/") are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In cases where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
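When MI SQL statements are assembled programmatically, the quote rules can be applied with two small helpers. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes identifiers do not themselves contain double quotes, since the guide does not describe how to escape them.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Assumption: the identifier contains no double-quote character.
    """
    if '"' in name:
        raise ValueError("escaping of embedded double quotes is not covered by this sketch")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    country = quote_literal("Canada")
    print(f"SELECT * FROM {table} WHERE Country = {country}")
    print(quote_literal("O'hara's"))
```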
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Value Description
ha hectares
Return Values
Return Type  Description
Double       The area of the geometry
Examples
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry, 'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type              Description
Array<WritableGeometry>  The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry1        WritableGeometry  The first instance of a WritableGeometry
geometry2        WritableGeometry  The second instance of a WritableGeometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type.
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
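The difference between the CARTESIAN and SPHERICAL computation types can be illustrated for two points. The Python sketch below is not the product's algorithm, which this guide does not describe. It assumes a haversine great-circle distance on a sphere of mean Earth radius for the spherical case and a plain Euclidean distance for the cartesian case; both function names are hypothetical.

```python
import math

# Assumption: a spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6371008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in meters between two long/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def cartesian_distance(x1, y1, x2, y2):
    """Euclidean distance in the units of the (projected or engineering) coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    # One degree of longitude along the equator, interpreted spherically.
    print(spherical_distance_m(0.0, 0.0, 1.0, 0.0))
    # The same numbers interpreted as planar coordinates give 1 coordinate unit.
    print(cartesian_distance(0.0, 0.0, 1.0, 0.0))
```

The two results differ by orders of magnitude for the same input numbers, which is why the computation type should match the coordinate system of the geometries.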
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
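The effect of the linearUnits argument can be sketched in Python for the cartesian case. This is an illustration, not the UDF's implementation: it assumes the polyline coordinates are already in meters, and the function name and conversion table are hypothetical helpers built from the standard definitions of each unit.

```python
import math

# Meters per unit, keyed by the unit codes listed under Linear Units.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,
    "nmi": 1852.0,
}


def polyline_length(coords_m, linear_units="m"):
    """Cartesian length of a polyline whose vertices are (x, y) pairs in meters."""
    total_m = sum(math.dist(a, b) for a, b in zip(coords_m, coords_m[1:]))
    return total_m / METERS_PER_UNIT[linear_units]


if __name__ == "__main__":
    line = [(0.0, 0.0), (3000.0, 4000.0), (3000.0, 5000.0)]
    print(polyline_length(line, "m"))
    print(polyline_length(line, "km"))
```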
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type  Description
Double       The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
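The rule that a polygon's perimeter is the sum of the lengths of all its rings, holes included, can be shown with a short cartesian sketch in Python. The function names are hypothetical and the planar distance is an assumption; the UDF itself also supports spherical computation.

```python
import math


def ring_length(ring):
    """Length of a ring given as (x, y) vertices; the ring is closed if it is not already."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])
    return sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))


def polygon_perimeter(exterior, holes=()):
    """Perimeter of a polygon: exterior ring length plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


if __name__ == "__main__":
    exterior = [(0, 0), (10, 0), (10, 10), (0, 10)]
    hole = [(4, 4), (6, 4), (6, 6), (4, 6)]
    print(polygon_perimeter(exterior))          # exterior ring only
    print(polygon_perimeter(exterior, [hole]))  # the hole adds its own ring length
```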
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter        Type              Description
geometry         WritableGeometry  The geometry to buffer
offset           Number            The distance from the input geometry
linearUnits      String            The desired return unit type. For valid values, see Linear Units.
resolution       Number            Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType  String            Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value      Description
mi         miles
km         kilometers
in         inches
ft         feet
yd         yards
mm         millimeters
cm         centimeters
m          meters
survey ft  US Survey feet
nmi        nautical miles
Return Values
Return Type       Description
WritableGeometry  A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
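The meaning of the resolution parameter can be sketched for the simplest case, a point buffer in planar coordinates. The Python function below is a hypothetical illustration, not the UDF's implementation: it places resolution vertices evenly on a circle of radius offset, so a resolution of 8 yields an octagon.

```python
import math


def point_buffer(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments.

    Returns a closed ring: the first vertex is repeated at the end.
    Assumption: planar (cartesian) coordinates in the same units as `offset`.
    """
    if resolution < 3:
        raise ValueError("a polygon needs at least 3 segments")
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])
    return ring


if __name__ == "__main__":
    octagon = point_buffer(5.0, 6.0, 50.0, 8)
    print(len(octagon) - 1)  # number of distinct vertices, equal to the resolution
```

A larger resolution follows the true circle more closely at the cost of more vertices, which matches the note above about time and space.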
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type       Description
WritableGeometry  The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
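For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. The guide does not say which algorithm the UDF uses, so this is a swapped-in textbook technique shown only to illustrate the result; the function name is hypothetical.

```python
def convex_hull(points):
    """Convex hull of (x, y) points via Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order, without repeating
    the first vertex. Collinear points on the hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # The interior point (1, 1) is not part of the hull.
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```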
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type       Description
WritableGeometry  The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The source input geometry
CRS        String            The destination coordinate system for the geometry
Return Values
Return Type       Description
WritableGeometry  The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
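As an illustration of what a coordinate system transformation does, the following Python sketch converts a WGS 84 longitude/latitude pair (epsg:4326) to spherical Web Mercator (epsg:3857) using the standard forward formulas. It is not the Transform UDF's implementation, which supports arbitrary coordinate systems; the function name is hypothetical.

```python
import math

# Radius of the sphere used by the epsg:3857 projection (WGS 84 semi-major axis), in meters.
SPHERE_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project a long/lat point in degrees to Web Mercator x/y in meters.

    Valid for latitudes strictly between -90 and 90 degrees.
    """
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    print(wgs84_to_web_mercator(30.0, 20.0))
```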
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter  Type              Description
geometry1  WritableGeometry  The first instance of a WritableGeometry
geometry2  WritableGeometry  The second instance of a WritableGeometry
Return Values
Return Type       Description
WritableGeometry  The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs make this possible by extracting the X and Y ordinates from the transformed point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
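Filtering XY records by the bounds of a geometry amounts to comparing each X and Y against the four MBR values. The Python sketch below illustrates the idea with plain coordinate lists; the function names are hypothetical, and in Hive the four bounds would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Minimum bounding rectangle of (x, y) vertices as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_mbr(records, bounds):
    """Keep the (x, y) records that fall inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]


if __name__ == "__main__":
    polygon = [(10, 10), (30, 5), (45, 20), (20, 35)]
    bounds = mbr(polygon)
    print(bounds)
    print(filter_by_mbr([(15, 15), (50, 50), (45, 5)], bounds))
```

An MBR test is only a coarse filter: a record inside the rectangle may still lie outside the geometry itself, so a predicate such as Within is still needed for an exact answer.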
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The X maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The X minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The Y maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter  Type              Description
geometry   WritableGeometry  The input geometry
Return Values
Return Type  Description
Double       The Y minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter  Type    Description
UNIQUE_ID  String  The unique geohash identifier of a cell in a grid
Return Values
Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point
Y          Number or String  The latitude value of the point
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point
Y          Number or String  The latitude value of the point
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
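To show how a geohash ID and its cell boundary relate, the following Python sketch implements the publicly documented geohash encoding (interleaved longitude and latitude bits, base-32 alphabet). It assumes the GeoHashID and GeoHashBoundary UDFs follow this well-known scheme; the function names geohash_id and geohash_boundary are hypothetical and return plain Python values rather than a WritableGeometry.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y (degrees) as a geohash of `precision` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, x) if use_lon else (lat_range, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1  # upper half of the current interval
            rng[0] = mid
        else:
            value <<= 1               # lower half of the current interval
            rng[1] = mid
        use_lon = not use_lon         # bits alternate: longitude, latitude, ...
        bits += 1
        if bits == 5:                 # every 5 bits produce one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its rectangular cell as (xmin, ymin, xmax, ymax)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


if __name__ == "__main__":
    cell = geohash_id(10.40744, 57.64911, 5)
    print(cell)
    print(geohash_boundary(cell))
```

Because a shorter ID is always a prefix of a longer ID for the same point, sorting by ID keeps nearby cells close together, which is the property the ORDER BY and GROUP BY examples above rely on.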
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter  Type    Description
UNIQUE_ID  String  The unique geohash identifier of a cell in a grid
Return Values
Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point
Y          Number or String  The latitude value of the point
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point
Y          Number or String  The latitude value of the point
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashBoundary as compbbigdataspatialhivegridSquareHashBoundary
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable
SELECT SquareHashBoundary(03332)
Syntax
SquareHashBoundary(Number|String X Number|String Y Number precision)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 62
X
Spatial
Parameters
Parameter Type Description
Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and a latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
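The guide does not describe how square hash IDs are computed. As a rough illustration only, the sketch below uses a Bing-style quadkey over Web Mercator tiles, which has the properties described above: one character per precision level, and cells that appear square on a Mercator map. The function name square_cell_id is hypothetical, and its IDs should not be expected to match those returned by SquareHashID.

```python
import math

def square_cell_id(lon: float, lat: float, precision: int) -> str:
    """Return a quadkey-style ID for the Web Mercator tile containing the point.

    Assumption: this is a stand-in for illustration, not the product's algorithm.
    """
    # Web Mercator cannot represent the poles, so clamp the latitude.
    lat = max(min(lat, 85.05112878), -85.05112878)
    # Normalise the point to [0, 1] x [0, 1] in Mercator space.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << precision  # number of tiles per axis at this precision
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    # Interleave the tile bits, most significant first, into base-4 digits.
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)
```

In this scheme a longer ID always starts with the shorter ID of its parent cell, so truncating an ID yields the enclosing, coarser cell.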
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry on which the search point lies; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.
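To make the idea of a point-in-polygon search concrete, here is a minimal Python sketch that uses the classic ray-casting test. It is not the product's implementation: the function names and the (attributes, ring) data layout are assumptions for illustration, and only simple polygons without holes are handled.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def containing_regions(x, y, regions):
    """Return the attributes of every region whose polygon contains the point.

    regions is a list of (attributes, ring) pairs (a hypothetical layout).
    """
    return [attrs for attrs, ring in regions if point_in_polygon(x, y, ring)]
```

As with the UDTF, a point that falls inside several overlapping polygons produces one result per polygon.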
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'. Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'. For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry on which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each such geometry is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for the supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'. Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'. For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'". Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
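The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UDTF's implementation: candidates are plain (name, lon, lat) points, distances are great-circle distances in meters on a sphere with the mean Earth radius, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first.

    candidates is a list of (name, lon, lat) tuples. With max_distance_m left
    as None there is no distance limit, mirroring the documented defaults.
    """
    lon, lat = point
    scored = [(name, haversine_m(lon, lat, clon, clat)) for name, clon, clat in candidates]
    if max_distance_m is not None:
        scored = [s for s in scored if s[1] <= max_distance_m]
    scored.sort(key=lambda s: s[1])
    return scored[:max_candidates]
```

With the defaults, a single nearest record is returned per search point; raising max_candidates returns more records, and max_distance_m can reduce the result to fewer than max_candidates.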
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
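The aggregation pattern described above (assign each location a cell ID, then group by that ID) can be sketched as follows. The hexagon indexing itself is not reproduced here: grid_cell_id is a hypothetical stand-in that labels square longitude/latitude cells, used only so that the example is runnable.

```python
import math
from collections import Counter

def grid_cell_id(lon, lat, cell_deg=1.0):
    """Hypothetical stand-in for a hexagon ID: labels a square lon/lat cell."""
    return f"{math.floor(lon / cell_deg)}:{math.floor(lat / cell_deg)}"

def aggregate_by_cell(records, cell_deg=1.0):
    """Group (lon, lat, value) records by cell ID.

    Returns a dict mapping each cell ID to (record count, sum of values).
    """
    counts, totals = Counter(), Counter()
    for lon, lat, value in records:
        cid = grid_cell_id(lon, lat, cell_deg)
        counts[cid] += 1
        totals[cid] += value
    return {cid: (counts[cid], totals[cid]) for cid in counts}
```

Because the ID is stable for a given location and cell size, the same grouping key can be reused across data sets, which is the property the hexagon IDs provide at each level.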
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
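The two-stage idea behind a distance join (a coarse "primary join" on grid cells, followed by an exact distance filter) can be sketched without Spark. This Python version is an illustration only: it buckets points into square longitude/latitude cells rather than geohash cells, the function names are hypothetical, and it assumes cell_deg is chosen so that one cell is at least as wide as the search distance.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, cell_deg):
    """Join (id, lon, lat) rows whose distance is at most max_distance_m.

    Returns a list of (left_id, right_id, distance_m) tuples.
    """
    # Primary join: bucket the right side by coarse grid cell.
    buckets = defaultdict(list)
    for rid, lon, lat in right:
        buckets[(math.floor(lon / cell_deg), math.floor(lat / cell_deg))].append((rid, lon, lat))
    out = []
    for lid, lon, lat in left:
        cx, cy = math.floor(lon / cell_deg), math.floor(lat / cell_deg)
        # Check the point's own cell and its 8 neighbours, then filter exactly.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for rid, rlon, rlat in buckets.get((cx + dx, cy + dy), ()):
                    d = haversine_m(lon, lat, rlon, rlat)
                    if d <= max_distance_m:
                        out.append((lid, rid, d))
    return out
```

Smaller cells mean fewer candidates per exact comparison but more buckets to hold in memory, which parallels the trade-off noted for the geohashPrecision parameter.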
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases in which multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the LI jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download the data and to update it when required. Create a common operating system group that contains every service user that needs to download the data. For example, if Hive and YARN jobs need to download data and use the same download location, then both the Hive and YARN operating system users should be members of a common operating system group. The download directory should belong to this common group, with Read, Write, and Execute permissions for the owner and group (mode 775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.
4. During a window in which no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
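For readers less familiar with octal modes, this short Python sketch decodes what 775 grants. It uses only the standard stat module; the helper names are hypothetical, and nothing here touches the file system.

```python
import stat

def mode_string(mode: int) -> str:
    """Render a directory's permission bits in ls -l style."""
    # stat.filemode expects the file-type bits too; S_IFDIR marks a directory.
    return stat.filemode(stat.S_IFDIR | mode)

def group_can_write(mode: int) -> bool:
    """True if members of the owning group may write."""
    return bool(mode & stat.S_IWGRP)

def others_can_write(mode: int) -> bool:
    """True if users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)
```

For mode 0o775 the owner and the group get read, write, and execute, while all other users get read and execute only, which is why every service user that downloads data must belong to the common group.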
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that may contain wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
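The wildcard semantics of the Like operator can be illustrated by translating a pattern into a regular expression. This is a minimal Python sketch, not the MI SQL parser: the function names are hypothetical, matching here is case-sensitive (the guide does not state MI SQL's case rules), and escape sequences for literal % or _ are not handled.

```python
import re

def like_to_regex(pattern: str):
    """Translate a Like pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else must match literally, including regex metacharacters.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(value: str, pattern: str) -> bool:
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, the pattern %Park matches any value that ends in Park, while P_rk matches four-character values such as Park or Pork.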
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse them correctly, for example identifiers that contain spaces or other special characters.
Examples
Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
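When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. This Python sketch is an assumption-laden illustration: the helper names are hypothetical, and because the guide does not say how a double quote inside an identifier should be escaped, such identifiers are rejected rather than guessed at.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the escape rule is not documented.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

def equals_filter(column: str, value: str) -> str:
    """Build a simple equality expression from a column name and a string value."""
    return f"{quote_identifier(column)} = {quote_literal(value)}"
```

Doubling embedded quotes keeps a value such as a business name with an apostrophe from terminating the literal early.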
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
Array<WritableGeometry> The closest points between the two geometries. Geometries that intersect are at distance zero from each other; in this case, a shared point is returned.
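The following Python sketch illustrates the idea of a closest-points result in the simplest setting. It is not the function's algorithm: each geometry is reduced to a list of (x, y) vertices, distances are planar (Cartesian), and the search is brute force. The function name is hypothetical.

```python
import math
from itertools import product

def closest_points(points_a, points_b):
    """Return (a, b, distance) for the closest pair, one point from each list."""
    a, b = min(product(points_a, points_b),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return a, b, math.dist(a, b)
```

When the two point sets share a point, that point is returned for both sides with a distance of zero, which mirrors the behaviour described for intersecting geometries.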
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 33.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
- For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
- For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
- For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type.
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWKT(t.geometry, 'epsg:4267'), FromWKT(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
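To make the difference between the two computation types and the role of the unit parameter more concrete, the following Python sketch compares a planar (Cartesian) distance with a great-circle (spherical) distance and converts the result into a few of the units listed above. It is only an illustration and not the product's implementation: the haversine formula, the mean Earth radius of 6371008.8 m, and the names haversine_m, cartesian, convert, and METERS_PER_UNIT are all assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters

# Meters per unit for some of the linear units listed above.
METERS_PER_UNIT = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048, "nmi": 1852.0}


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two long/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cartesian(x1, y1, x2, y2):
    """Planar distance, expressed in the units of the coordinate system."""
    return math.hypot(x2 - x1, y2 - y1)


def convert(meters, unit):
    """Convert a distance in meters to one of the units in METERS_PER_UNIT."""
    return meters / METERS_PER_UNIT[unit]


# One degree of longitude along the equator, reported in kilometers.
d_m = haversine_m(0.0, 0.0, 1.0, 0.0)
print(round(convert(d_m, "km"), 1))   # 111.2
print(cartesian(0.0, 0.0, 3.0, 4.0))  # 5.0
```

The spherical result depends on the assumed Earth model, so small differences from the values returned by the Distance UDF should be expected.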
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 35.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
- For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
- For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
- For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The length of the geometry.
Examples
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). The curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 37.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
- For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
- For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
- For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
Double | The perimeter of the geometry.
Examples
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWKT(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
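The rule that a polygon's perimeter is the sum of the lengths of all of its rings, holes included, can be sketched in a few lines of Python. This is a planar (Cartesian) illustration only; the helper names ring_length and polygon_perimeter and the sample coordinates are hypothetical and are not part of the product.

```python
import math


def ring_length(ring):
    """Planar length of a closed ring of (x, y) tuples whose first point equals its last."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))


def polygon_perimeter(exterior, holes=()):
    """Sum the exterior ring length and the length of every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


# A 10 x 10 square with a 2 x 2 square hole.
exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
print(polygon_perimeter(exterior, [hole]))  # 48.0
```

The hole adds 8 units to the 40 units of the exterior ring, which is why the perimeter of a polygon with holes is larger than that of its outline alone.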
Processing Functions
The following Processing functions are available:
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The geometry to buffer.
offset | Number | The distance from the input geometry.
linearUnits | String | The desired return unit type. For valid values, see Linear Units on page 40.
resolution | Number | Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType | String | Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
- For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
- For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
- For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type:
Value | Description
mi | miles
km | kilometers
in | inches
ft | feet
yd | yards
mm | millimeters
cm | centimeters
m | meters
survey ft | US Survey feet
nmi | nautical miles
Return Values
Return Type | Description
WritableGeometry | A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_Point(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
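The effect of the resolution parameter on a point buffer can be sketched as follows. This Python snippet is a planar approximation written for illustration, not the algorithm used by the Buffer UDF; the function name point_buffer and its arguments are hypothetical.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Approximate a circle around (x, y) with `resolution` straight-line segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + radius * math.cos(angle), y + radius * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


octagon = point_buffer(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # 8 distinct vertices, so the buffer is an octagon
```

Doubling the resolution doubles the number of vertices that have to be generated and stored, which is consistent with the note that larger resolution values take more time and space to compute.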
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
WritableGeometry | The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
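For readers unfamiliar with convex hulls, the following Python sketch computes one for a small set of planar points using Andrew's monotone chain algorithm. The guide does not state which algorithm the ConvexHull UDF uses, so this is a stand-in chosen for illustration; the function names cross and convex_hull and the sample points are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order, starting from the lowest-left point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so omit it.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(points))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

The interior point (1, 1) is discarded because it lies inside the square formed by the other four points.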
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The source input geometry.
CRS | String | The destination coordinate system for the geometry.
Return Values
Return Type | Description
WritableGeometry | The geometry transformed to the destination coordinate system.
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_Point(30, 20), 'epsg:3857');
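As an illustration of what a coordinate system transformation involves, the sketch below converts a longitude/latitude pair to EPSG:3857 using the standard spherical Mercator formulas. It covers only this one pair of coordinate systems and is not the product's implementation; the function name lonlat_to_web_mercator is hypothetical.

```python
import math

R = 6378137.0  # WGS 84 semi-major axis in meters, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```

A general-purpose transformation between arbitrary coordinate systems also has to handle datum shifts and other projection families, which this sketch deliberately leaves out.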
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter | Type | Description
geometry1 | WritableGeometry | The first instance of a WritableGeometry.
geometry2 | WritableGeometry | The second instance of a WritableGeometry.
Return Values
Return Type | Description
WritableGeometry | The geometry that represents the union of the input geometries.
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another with Transform alone. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer index functions are available:
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
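The bounds-filtering use case described above can be sketched in Python as follows. The sketch computes the four MBR values of a geometry's coordinates and keeps only the XY rows that fall inside them; the function names mbr and in_bounds and the sample data are hypothetical and only mirror what the ST_XMin, ST_YMin, ST_XMax, and ST_YMax values would be used for.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """True when the point lies inside or on the edge of the bounding rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax


polygon = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(polygon)
print(bounds)  # (20, 30, 45, 45)

rows = [(25, 35), (50, 50)]  # hypothetical XY table rows
print([r for r in rows if in_bounds(r[0], r[1], bounds)])  # [(25, 35)]
```

An MBR test is only a coarse filter: a row that passes it lies inside the rectangle, not necessarily inside the geometry itself.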
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter | Type | Description
geometry | WritableGeometry | The input geometry.
Return Values
Return Type | Description
Double | The Y minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
- rectangular (geohash)
- hexagon (hexagon hash)
- square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
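The idea of encoding a point into a cell ID and decoding an ID back into a cell boundary can be sketched with the publicly documented geohash scheme, which repeatedly bisects the longitude and latitude ranges and writes the resulting bits as base-32 characters. The Python below is a minimal sketch under the assumption that the GeoHashID and GeoHashBoundary UDFs follow this standard scheme; the function names geohash_id and geohash_boundary are hypothetical, and the hexagon and square hash encodings are not covered.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a long/lat point (degrees) as a geohash string of the given length."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Decode a geohash into its cell bounds as (xmin, ymin, xmax, ymax)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1]


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell)                    # dre
print(geohash_boundary(cell))  # (-74.53125, 42.1875, -73.125, 43.59375)
```

Because each additional character adds five more bisections, longer IDs describe smaller cells, and any ID that starts with the same prefix lies inside the cell named by that prefix.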
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The geohash ID of the grid cell at the specified precision that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
- LocalPointInPolygon
- LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
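Conceptually, a point-in-polygon search tests the input point against each polygon in the data source and returns the attributes of every polygon that contains it. The Python sketch below shows that idea with the classic ray-casting test on two made-up rectangular "states". It is not how the LocalPointInPolygon UDTF is implemented (the guide notes that PGD index files are used to speed up real searches), and the function names point_in_ring and point_in_polygon_search as well as the sample features are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: count how often a ray going right from (x, y) crosses the ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def point_in_polygon_search(x, y, features):
    """Return the attributes of every feature whose ring contains the point."""
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]


# Hypothetical data source: (closed ring, attributes) pairs.
features = [
    ([(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)], {"state": "A", "capital": "Alpha"}),
    ([(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)], {"state": "B", "capital": "Beta"}),
]
print(point_in_polygon_search(3, 4, features))  # [{'state': 'A', 'capital': 'Alpha'}]
```

Scanning every polygon for every point scales poorly, which is why a spatial index matters once the data source or the point table becomes large.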
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries contained in the specified TAB or shapefile that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, there is no limit). 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. 'downloadGroup', 'pbdownloads'
queryFilter Searches a subset of the data based on an attribute. 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries in the specified TAB or shapefile that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park'))
nearestresult
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
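The geohashPrecision argument sets how finely space is partitioned for the primary join. As a rough illustration of why a higher precision means more, smaller cells, the following Python sketch implements the standard public geohash encoding. It is not taken from the SDK, the function name geohash_encode is hypothetical, and how joinByDistance uses geohashes internally is not described in this guide.

```python
# Standard geohash encoding, shown only to illustrate what "precision" means.
# This is NOT the SDK's implementation; geohash_encode is a hypothetical name.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Return the geohash of (lat, lon) using `precision` characters (1-12)."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, nbits = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


cell = geohash_encode(57.64911, 10.40744, 7)
```

Each additional character splits a cell into 32 smaller cells, so a lower-precision geohash is always a prefix of the higher-precision one for the same point.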
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
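After completing the steps above, it can be useful to confirm that the download directory really grants read, write, and execute access to both owner and group. The following Python sketch is one way to check this using only the standard library; the function name is hypothetical and the check is an illustration, not part of the product.

```python
import os
import stat
import tempfile


def owner_and_group_have_rwx(path):
    """Return True if `path` is a directory whose owner and group both have rwx."""
    mode = os.stat(path).st_mode
    wanted = stat.S_IRWXU | stat.S_IRWXG  # rwx for owner plus rwx for group
    return stat.S_ISDIR(mode) and (mode & wanted) == wanted


# Demonstration on a throwaway directory rather than a real download location.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o775)
ok_775 = owner_and_group_have_rwx(demo_dir)   # group can write
os.chmod(demo_dir, 0o755)
ok_755 = owner_and_group_have_rwx(demo_dir)   # group cannot write
os.rmdir(demo_dir)
```

Mode 775 corresponds to rwxrwxr-x: full access for the owner and the group, and read and execute access for everyone else.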
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
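The wildcard behavior described for Like can be made concrete with a small Python sketch that translates a Like pattern into a regular expression. This is only an illustration of the matching rules stated above (% for zero or more characters, _ for exactly one); the helper names are hypothetical, and details such as case sensitivity and escape characters in MI SQL are not covered by this guide, so the sketch assumes case-sensitive matching with no escape syntax.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole of `value` matches the Like `pattern`."""
    return like_to_regex(pattern).fullmatch(value) is not None


matches_suffix = like("Central Park", "%Park")
matches_single = like("Park", "P_rk")
```

The match must cover the whole value, which is why a pattern such as %Park does not match Parkway.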
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
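When filter expressions are built programmatically, the quoting rules above are easy to get wrong. The following Python sketch shows one way to apply them; the helper names are hypothetical, and because this guide does not say how a double quote inside an identifier should be written, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the escaping rule for them
    is not stated in this guide.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("Ohara's")
)
```

Quoting every identifier is harmless under these rules, which avoids having to decide case by case whether the parser would need the quotes.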
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes, 3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance'
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry.
geometry2 WritableGeometry The second instance of a WritableGeometry.
linearUnits String The desired return unit type. For valid values, see Linear Units on page 33.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
The following table lists the valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t
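To make the difference between the two computation types concrete, the following Python sketch contrasts a spherical distance between two long/lat points with a plain cartesian distance, and converts the result using the unit codes from the Linear Units table. It is an illustration only: the haversine formula and the mean Earth radius used here are assumptions, the function names are hypothetical, and the SDK's own spherical computation may differ.

```python
import math

# Metre equivalents of the unit codes listed in the Linear Units table.
METERS_PER_UNIT = {
    "mi": 1609.344, "km": 1000.0, "in": 0.0254, "ft": 0.3048,
    "yd": 0.9144, "mm": 0.001, "cm": 0.01, "m": 1.0,
    "survey ft": 1200.0 / 3937.0, "nmi": 1852.0,
}

# Assumed mean Earth radius in metres; the value used by the SDK is not documented here.
EARTH_RADIUS_M = 6371008.8


def spherical_distance(lon1, lat1, lon2, lat2, unit="m"):
    """Great-circle distance between two long/lat points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
    return meters / METERS_PER_UNIT[unit]


def cartesian_distance(x1, y1, x2, y2):
    """Straight-line distance in the units of the coordinate system itself."""
    return math.hypot(x2 - x1, y2 - y1)


one_degree_km = spherical_distance(0.0, 0.0, 1.0, 0.0, "km")
```

Applying cartesian logic to long/lat coordinates would return a value in degrees rather than a length, which is why SPHERICAL is the only valid type for geographic coordinate systems.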
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length'
Syntax
Length(WritableGeometry geometry, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are treated as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter'
Syntax
Perimeter(WritableGeometry geometry, String linearUnits[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer'
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution[, String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnits String The desired return unit type. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
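The unit codes above can be related to one another through a common base unit. The following Python sketch is illustrative only and is not part of the SDK: the table name METERS_PER_UNIT and the helper to_meters are hypothetical, and the factors are the standard international definitions (with the US survey foot taken as 1200/3937 m).

```python
# Hypothetical lookup table: meters per one unit, keyed by the codes listed above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US survey foot
    "nmi": 1852.0,
}


def to_meters(value, unit):
    """Convert a distance expressed in one of the unit codes to meters."""
    if unit not in METERS_PER_UNIT:
        raise ValueError(f"unsupported linear unit: {unit!r}")
    return value * METERS_PER_UNIT[unit]


print(to_meters(50, "km"))
```

A lookup of this kind makes an invalid unit code fail early instead of silently producing a wrong distance.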
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
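To illustrate what the resolution parameter controls, here is a minimal Python sketch of a point buffer in plain Cartesian coordinates. It is not the SDK's implementation: buffer_point is a hypothetical helper, and spherical computation and non-point inputs are deliberately left out.

```python
import math


def buffer_point(x, y, offset, resolution):
    """Return a closed ring of `resolution` straight segments around (x, y).

    Cartesian approximation only: each vertex lies exactly `offset` away
    from the center, so a resolution of 8 yields an octagon.
    """
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)  # number of straight segments
```

Raising the resolution adds vertices to the ring, which is why larger values cost more time and space.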
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
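For readers unfamiliar with convex hulls, the following Python sketch computes one with Andrew's monotone chain algorithm. This is a textbook technique chosen for illustration, not necessarily what the ConvexHull UDF does internally. The function name convex_hull is hypothetical, and the sample points are the vertices of the MULTIPOLYGON in the example above.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


vertices = [
    (40, 40), (20, 45), (45, 30),                       # first polygon
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),    # second polygon, outer ring
    (30, 20), (20, 15), (20, 25),                       # second polygon, hole
]
print(convex_hull(vertices))
```

Interior vertices, such as those of the hole, do not appear in the result because they lie inside the hull.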
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
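As a worked illustration of a coordinate system transformation, the sketch below converts a longitude/latitude pair (epsg:4326) to Popular Mercator (epsg:3857) using the standard spherical Mercator formulas. It is a hand-written approximation for one specific pair of systems, not the Transform UDF itself. The function name lonlat_to_web_mercator is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by epsg:3857


def lonlat_to_web_mercator(lon, lat):
    """Project WGS 84 longitude/latitude in degrees to epsg:3857 meters."""
    if not -90.0 < lat < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = math.radians(lon) * EARTH_RADIUS_M
    y = math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0)) * EARTH_RADIUS_M
    return x, y


x, y = lonlat_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```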
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a writable geometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
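The MBR-based filtering described above can be sketched in a few lines of Python. This is a conceptual illustration rather than SDK code: the helper names mbr and filter_by_bounds are hypothetical, and a geometry is represented simply as a list of (x, y) coordinates.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(rows, bounds):
    """Keep the (x, y) rows of an XY table that fall inside the bounds."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in rows if xmin <= x <= xmax and ymin <= y <= ymax]


polygon = [(1, 5), (4, 2), (3, 9), (1, 5)]
xy_table = [(2, 3), (0, 0), (4, 9), (5, 5)]
print(filter_by_bounds(xy_table, mbr(polygon)))
```

Note that an MBR filter is only a coarse test: a point inside the rectangle may still lie outside the geometry itself.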
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
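To show how a geohash ID relates to longitude, latitude, and precision, here is a Python sketch of the publicly documented geohash encoding, followed by the same count-per-cell aggregation as the last query above. It assumes, without confirmation from this guide, that GeoHashID follows the standard algorithm. The function name geohash_id and the sample points are hypothetical.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 point as a standard geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:  # bits alternate between longitude and latitude
            mid = (lon_lo + lon_hi) / 2.0
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Aggregate made-up points by cell, as the GROUP BY query does.
points = [(-5.6, 42.6), (-5.61, 42.61), (10.0, 50.0)]
counts = Counter(geohash_id(lon, lat, 3) for lon, lat in points)
print(counts)
```

Because each additional character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point, which is what makes the IDs sortable and searchable.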
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset
  Description: the charset to use when reading a shapefile
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: the coordinate reference system to use when reading a shapefile
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup
  Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive setting pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the fields we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
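The underlying containment test can be illustrated with the classic ray-casting algorithm. The Python sketch below is a simplified stand-in, not the UDTF's implementation: it handles only single-ring polygons in planar coordinates, and the names point_in_ring and polygons_containing, as well as the sample data, are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > y) != (yj > y):  # the edge straddles the horizontal line through y
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons that contain the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


regions = {
    "west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "east": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
print(polygons_containing((5, 5), regions))
```

As with the UDTF, a search point can match more than one geometry when the polygons in the data source overlap.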
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates
  Description: the maximum number of results to return (if not set, the default value is 1)
  Example: 'maxCandidates', '3'
maxDistance
  Description: the maximum distance to search for results (if not set, the default value is no limit)
  Example: 'maxDistance', '25'
distanceUnit
  Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
  Example: 'distanceUnit', 'mi'
returnDistanceColumnName
  Description: the name of the column to use for returning the distance
  Example: 'returnDistanceColumnName', 'Miles'
shpCharset
  Description: the charset to use when reading a shapefile
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: the coordinate reference system to use when reading a shapefile
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup
  Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive setting pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
queryFilter
  Description: search on a subset of the data based on an attribute
  Example: 'queryFilter', 'Name Like Park'
  Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
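The effect of the maxCandidates and maxDistance options can be sketched with a brute-force nearest search in Python. This is a conceptual illustration only: it uses the haversine great-circle distance on a spherical earth and point candidates, whereas the UDTF works on indexed TAB or shapefile geometries. The names haversine_m and search_nearest and the sample data are hypothetical.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two lon/lat points (spherical earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * radius_m * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = sorted(
        (haversine_m(lon, lat, c_lon, c_lat), name) for name, c_lon, c_lat in candidates
    )
    if max_distance_m is not None:
        scored = [(d, name) for d, name in scored if d <= max_distance_m]
    return scored[:max_candidates]


capitals = [("A", 1.0, 0.0), ("B", 2.0, 0.0), ("C", 3.0, 0.0)]
for distance, name in search_nearest((0.0, 0.0), capitals, max_candidates=2):
    print(name, round(distance))
```

With no options set, the sketch mirrors the documented defaults: a single result and no distance limit.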
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
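To make the semantics of a distance join concrete, the following is a minimal Python sketch, not the product's implementation. It assumes a brute-force comparison of every pair (the product uses a geohash-based primary join instead), a haversine great-circle distance, and an approximate mean earth radius. The function names and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean earth radius in miles (assumption)

def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_mi):
    """Pair each left row with every right row within max_distance_mi."""
    result = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_mi:
                # Merge both rows and record the computed distance.
                result.append({**l, **r, "outputDistance": d})
    return result

customers = [{"customer": "A", "longitude": -73.70, "latitude": 42.70}]
pois = [
    {"poi": "cafe", "lon": -73.705, "lat": 42.70},   # roughly a quarter mile away
    {"poi": "museum", "lon": -73.60, "lat": 42.70},  # roughly five miles away
]
joined = join_by_distance(customers, pois, 0.5)
```

Only the nearby row survives the join, and the extra column mirrors what the DistanceColumnName option adds to the result dataframe.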
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
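As a quick illustration of what the octal mode 775 grants, the following Python sketch decodes it with the standard library. It is only a reading aid for the permission bits and does not touch any directory.

```python
import stat

mode = 0o775

# S_IFDIR marks the value as a directory so filemode() prints a leading "d".
listing = stat.filemode(stat.S_IFDIR | mode)

# Owner and group can read, write, and execute; others can only read and execute.
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)

print(listing, group_can_write, others_can_write)
```

The group write bit is what lets every member of the common group update the downloaded data.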
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
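The wildcard behavior of the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustrative model only: the helper names are hypothetical, and case-sensitive matching is an assumption, since the guide does not state how MI SQL treats case.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

matches_prefix = like("Canada", "Can%")
matches_single = like("Canada", "_anada")
too_short = like("Canada", "Can_")
```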
Arithmetic Operators
Operator Definition
+ Addition; also concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
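When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The Python sketch below is illustrative: the helper names are hypothetical, and because the guide does not say how to escape a double quote inside an identifier, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are not handled."""
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("Ohara's")
)
```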
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Examples
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry, 'epsg:4267'), FromWkt(t.geometry2, 'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 35.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
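The difference between the CARTESIAN and SPHERICAL computation types can be illustrated with a short Python sketch. It is not the UDF's implementation: the spherical variant assumes a haversine great-circle distance on a sphere with an approximate mean radius, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # approximate mean earth radius in meters (assumption)

def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(p[1]), math.radians(q[1])
    dphi = phi2 - phi1
    dlmb = math.radians(q[0] - p[0])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def length_spherical_m(coords):
    """Sum great-circle distances along consecutive vertices of a polyline."""
    return sum(haversine_m(a, b) for a, b in zip(coords, coords[1:]))

def length_cartesian(coords):
    """Sum straight-line distances along consecutive vertices, in coordinate units."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

one_degree_north = length_spherical_m([(0.0, 0.0), (0.0, 1.0)])   # about 111 km
planar = length_cartesian([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])  # 5 + 6 coordinate units
```

The cartesian result is in the units of the coordinate system, which is why it suits projected and engineering coordinate systems rather than long/lat degrees.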
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). The curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
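The rule that a polygon's perimeter includes its holes can be checked with a small cartesian sketch in Python. The representation of a polygon as a list of closed rings and the function names are assumptions made for illustration.

```python
import math

def ring_length(ring):
    """Length of one closed ring given as a list of (x, y) with first == last."""
    return sum(math.dist(a, b) for a, b in zip(ring, ring[1:]))

def perimeter(polygon):
    """Sum the lengths of all rings: the exterior ring plus every hole."""
    return sum(ring_length(ring) for ring in polygon)

exterior = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]  # 10 x 10 square
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]          # 2 x 2 square hole
total = perimeter([exterior, hole])                      # 40 + 8
```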
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it, which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnit String The unit of the offset distance. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
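The effect of the resolution parameter on a point buffer can be sketched in Python. This is a cartesian illustration under the assumption that the vertices are placed evenly on the circle; the UDF's actual vertex placement and its spherical handling are not documented here, and the function name is hypothetical.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    ring = [
        (x + offset * math.cos(2 * math.pi * i / resolution),
         y + offset * math.sin(2 * math.pi * i / resolution))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # close the ring so the first and last vertices coincide
    return ring

octagon = buffer_point(0.0, 0.0, 1.0, 8)  # resolution 8 yields an octagon
```

A higher resolution adds vertices, which is why larger values cost more time and space.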
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
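To show what a convex hull looks like for the multipolygon in the last example, the following Python sketch applies Andrew's monotone chain algorithm to its vertices. The choice of algorithm is an assumption for illustration; the guide does not say how the UDF computes the hull.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

vertices = [
    (40, 40), (20, 45), (45, 30),                      # first polygon
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),   # second polygon, exterior ring
    (30, 20), (20, 15), (20, 25),                      # second polygon, hole
]
hull = convex_hull(vertices)
```

Interior vertices, including those of the hole, drop out because they lie inside the hull.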
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
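For a sense of what a transformation from epsg:4326 to epsg:3857 does to a coordinate, the Python sketch below applies the standard spherical Mercator formulas. It is an independent illustration, not the UDF's code path, and the function name is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857

def wgs84_to_web_mercator(lon, lat):
    """Project a longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x, y = wgs84_to_web_mercator(30.0, 20.0)
origin_x, origin_y = wgs84_to_web_mercator(0.0, 0.0)
```

Degrees become meters, and the y value grows faster than latitude as points move away from the equator.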
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with X/Y tables. For example, the Transform UDF accepts and returns a geometry, which means an X/Y table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an X/Y table from one coordinate system to another.
Another common need is the ability to filter records in an X/Y table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a writable geometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
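Filtering an X/Y table by the bounds of a geometry can be sketched in Python as follows. The four MBR values play the role of ST_XMin, ST_YMin, ST_XMax, and ST_YMax; the function names, the row layout, and the inclusive comparison are assumptions made for illustration.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)

def filter_by_mbr(rows, bounds):
    """Keep rows whose x/y fall inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [r for r in rows if xmin <= r["x"] <= xmax and ymin <= r["y"] <= ymax]

triangle = [(40, 40), (20, 45), (45, 30), (40, 40)]
bounds = mbr(triangle)
rows = [{"id": 1, "x": 30, "y": 35}, {"id": 2, "x": 50, "y": 35}]
inside = filter_by_mbr(rows, bounds)
```

An MBR filter is a coarse test: a row can fall inside the rectangle yet outside the geometry itself, so it is typically followed by an exact predicate.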
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
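To make the idea of hashing a location into a grid cell ID more concrete, here is a minimal Python sketch of the public geohash encoding. It is an illustration only: geohash_encode is a hypothetical helper, not part of the SDK, and it is an assumption (not confirmed by this guide) that the GeoHashID UDF follows the same public algorithm.

```python
# Public geohash base-32 alphabet (digits plus lowercase letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon, lat, precision):
    """Return the public-algorithm geohash of (lon, lat) with `precision` characters.

    Hypothetical helper for illustration; not the product's implementation.
    """
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    chars = []
    bits = 0         # bits collected for the character being built
    bit_count = 0
    use_lon = True   # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1   # upper half of the current interval
            rng[0] = mid
        else:
            bits = bits << 1         # lower half of the current interval
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:           # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# A longer ID refines a shorter one: the short ID is always a prefix of the long ID.
short_id = geohash_encode(-73.750333, 42.736103, 3)
long_id = geohash_encode(-73.750333, 42.736103, 6)
print(short_id, long_id, long_id.startswith(short_id))
```

Because each extra character halves the cell several more times, IDs that share a prefix lie in the same larger cell, which is what makes such IDs convenient for sorting, searching, and grouping.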
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
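The relationship between a cell ID and its rectangular boundary can be sketched in Python by decoding a public-algorithm geohash back into its bounding box. This is a hedged illustration: geohash_bounds is a hypothetical helper, and it assumes the IDs follow the public geohash scheme, which this guide does not state explicitly. The UDF itself returns a WritableGeometry rather than a tuple.

```python
# Public geohash base-32 alphabet.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(cell_id):
    """Return (min_lon, min_lat, max_lon, max_lat) for a public-algorithm geohash.

    Hypothetical helper for illustration; not the product's implementation.
    """
    lon_range = [-180.0, 180.0]
    lat_range = [-90.0, 90.0]
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in (4, 3, 2, 1, 0):      # each character carries 5 bits
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2.0
            if (value >> shift) & 1:
                rng[0] = mid               # bit 1: keep the upper half
            else:
                rng[1] = mid               # bit 0: keep the lower half
            use_lon = not use_lon
    return (lon_range[0], lat_range[0], lon_range[1], lat_range[1])


# Longer IDs give smaller cells nested inside the cell of any shorter prefix.
for cell in ("d", "dr", "dre"):
    min_lon, min_lat, max_lon, max_lat = geohash_bounds(cell)
    print(cell, "width:", max_lon - min_lon, "height:", max_lat - min_lat)
```

Under this scheme every additional character shrinks the cell, which matches the statement above that longer strings mean higher precision and smaller grid cells.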
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier (as returned by HexagonID) of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier (as returned by SquareHashID) of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described under Options below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
• shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
• shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
• remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
• downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
• downloadGroup: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching with. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
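For readers who want an intuition for what a point-in-polygon search does, the following Python sketch uses the classic ray-casting test and returns the attributes of every polygon that contains a point. It is a conceptual illustration only: the function names and the two sample squares are made up, edge cases such as points exactly on a boundary are not handled, and nothing here reflects how LocalPointInPolygon is actually implemented.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring.

    `ring` is a list of (x, y) vertices; the closing vertex may be omitted.
    Points exactly on an edge are not treated specially in this sketch.
    """
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge from vertex j to vertex i straddle the horizontal line at y?
        if (yi > y) != (yj > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside   # each crossing to the right flips the state
        j = i
    return inside


def find_containing(point, polygons):
    """Return the attribute records of every polygon that contains `point`."""
    x, y = point
    return [attrs for ring, attrs in polygons if point_in_polygon(x, y, ring)]


# Made-up sample data: two overlapping squares with attribute records.
polygons = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"name": "A"}),
    ([(2, 2), (6, 2), (6, 6), (2, 6)], {"name": "B"}),
]
print(find_containing((3, 3), polygons))
print(find_containing((1, 1), polygons))
```

As with the UDTF, a single input point can produce several output rows when polygons overlap, which is why the Hive example above uses LATERAL VIEW.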
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described under Options below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
• maxCandidates: The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
• maxDistance: The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
• distanceUnit: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
• returnDistanceColumnName: The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
• shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
• shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
• remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
• downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
• downloadGroup: The operating system group that should be applied to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
• queryFilter: Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching with. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two DataFrames using longitude and latitude values, one set from each DataFrame, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result DataFrame that contains the calculated distance
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
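This example returns a DataFrame that is the result of a join where points from a second DataFrame are located within a 0.5 mile buffer around each point in the first DataFrame:
// Imports needed for col(...) and for the implicit joinByDistance method
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The extra Map argument adds an outputDistance column holding the calculated distance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))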
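The semantics of a distance join can be sketched outside Spark with a few lines of Python. This is a conceptual, brute-force illustration under stated assumptions: it uses the haversine great-circle formula on a spherical earth, the function names and sample rows are made up, and it omits the geohash-based primary join that the geohashPrecision parameter controls, so it says nothing about how joinByDistance is implemented or how it measures distance.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left_rows, right_rows, max_distance_m):
    """Pair each left row with every right row within max_distance_m.

    Rows are dicts with "lon" and "lat" keys. Every pair is compared
    (no spatial index), so this only suits small inputs.
    """
    result = []
    for left in left_rows:
        for right in right_rows:
            d = haversine_m(left["lon"], left["lat"], right["lon"], right["lat"])
            if d <= max_distance_m:
                result.append((left, right, d))   # keep the distance, like DistanceColumnName
    return result


# Made-up sample rows: one customer and two points of interest.
customers = [{"id": "c1", "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.750333, "lat": 42.740103},
    {"id": "far", "lon": -73.750333, "lat": 42.836103},
]
for left, right, d in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
    print(left["id"], right["id"], round(d, 1))
```

Comparing every pair grows quadratically with input size, which is why a distributed join first narrows the candidates, for example by grid cell, before measuring exact distances.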
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uktab file
PGDBuilder -f Cdatauktab
This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads
PGDBuilder -f Cseamlesstab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
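As a quick check of what mode 775 grants, the following Python sketch decodes the permission bits using only the standard library. The helper names describe_mode and group_can_write are illustrative and are not part of the product.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return the ls-style permission string for a directory with this mode."""
    # S_IFDIR marks the entry as a directory so filemode() prints a leading 'd'.
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """Return True when the group write bit is set in the mode."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode(0o775))    # owner and group: rwx, others: r-x
    print(group_can_write(0o775))  # group members can update downloaded data
    print(group_can_write(0o755))  # with 755 they could not
```

With 775, every member of the common group can write to the download directory, which is what lets several services update the same downloaded data.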
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
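To illustrate the two Like wildcards described in the table above, here is a minimal Python sketch that translates a pattern into a regular expression. It assumes case-sensitive matching and no escape character, neither of which this guide specifies; like_to_regex and like are hypothetical helper names.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Canada", "Can%"))    # prefix match
    print(like("Canada", "C_nada"))  # single-character wildcard
    print(like("Canada", "C_da"))    # underscore cannot span several characters
```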
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers containing illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In cases where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
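When query text is assembled programmatically, the quoting rules above can be applied with small helpers. The following Python sketch is one way to do it; quote_literal and quote_identifier are hypothetical names, and rejecting identifiers that contain a double quote is an assumption, since this guide does not say how such identifiers are escaped.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Assumption: identifiers containing a double quote are rejected, because
    the escaping rule for that case is not described in this guide.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    # "O'Neil" is an illustrative value only.
    query = "SELECT * FROM {} WHERE {} = {}".format(
        quote_identifier("Streets"),
        quote_identifier("Business Name"),
        quote_literal("O'Neil"),
    )
    print(query)
```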
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
- Table of Contents
- Welcome
- What is Spectrum™ Location Intelligence for Big Data
- Spectrum™ Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
- Spatial
- Installing the SDK
- Hive User-Defined Spatial Functions
- Setup
- WritableGeometry
- Geometry Functions
- Constructor Functions
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
- Persistence Functions
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
- Predicate Functions
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
- Measurement Functions
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
- Processing Functions
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
- Observer Functions
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
- Grid Functions
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
- Search Functions
- LocalPointInPolygon
- LocalSearchNearest
- Spark
- Spark Jobs
- Hexagon Generator
- Hexagons
- Spark API
- JoinByDistance
- Location Intelligence Jar for Spatial Operations
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
- Appendix
- PGD Builder
- Building an Index with the PGD Builder
- Download Permissions
- Operators and Syntax Delimiters
- Quote Rules
Spatial
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
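The difference between the two computation types can be illustrated with a short Python sketch. This is not the product's implementation: it assumes a perfect sphere with a mean radius of 6371008.8 m and uses the haversine formula for the spherical case, while the cartesian case is plain planar distance. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius in meters


def cartesian_length(points):
    """Sum of straight-line segment lengths, in the units of the coordinates."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def spherical_length(points):
    """Sum of great-circle segment lengths in meters for (lon, lat) degrees."""
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(points, points[1:]):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlmb = math.radians(lon2 - lon1)
        # Haversine formula for the central angle between the two points.
        h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        total += 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
    return total


if __name__ == "__main__":
    line = [(0.0, 0.0), (1.0, 0.0)]  # one degree of longitude along the equator
    print(cartesian_length(line))    # 1.0, in degrees, which is not a useful length
    print(spherical_length(line))    # roughly 111 km, in meters
```

The example shows why cartesian logic is not offered for geographic (long/lat) coordinate systems: treating degrees as planar units does not produce a meaningful distance.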
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and the holes). Curves are treated as thin polygons.
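The ring-sum rule in the description can be sketched in a few lines of Python. The sketch uses planar (cartesian) distances only and is not the product's implementation; ring_length and polygon_perimeter are hypothetical names.

```python
import math


def ring_length(ring):
    """Length of a ring given as (x, y) tuples; the ring is closed if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # close the ring so the last edge is counted
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(pts, pts[1:])
    )


def polygon_perimeter(exterior, holes=()):
    """Perimeter of a polygon: the exterior ring plus every hole ring."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)


if __name__ == "__main__":
    outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]  # 10 x 10 square
    hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]       # 2 x 2 hole
    print(polygon_perimeter(outer, [hole]))               # exterior 40 plus hole 8
```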
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnits, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
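The effect of the resolution parameter on a point buffer can be sketched in planar coordinates as follows. This is an illustration only: it assumes the polygon vertices lie on the circle of radius offset, which may not match how the product places them, and point_buffer and polygon_area are hypothetical names.

```python
import math


def point_buffer(x, y, radius, resolution):
    """Closed ring approximating a circle with `resolution` straight segments."""
    if resolution < 3:
        raise ValueError("resolution must be at least 3")
    step = 2 * math.pi / resolution
    ring = [
        (x + radius * math.cos(i * step), y + radius * math.sin(i * step))
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def polygon_area(ring):
    """Area of a closed ring using the shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


if __name__ == "__main__":
    octagon = point_buffer(0.0, 0.0, 1.0, 8)
    print(len(octagon) - 1)                                  # 8 distinct vertices
    print(polygon_area(octagon))                             # coarse approximation
    print(polygon_area(point_buffer(0.0, 0.0, 1.0, 64)))     # closer to pi
```

A higher resolution brings the polygon closer to the true circle at the cost of more vertices, which is the time and space trade-off noted for the resolution parameter.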
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry The convex hull is the smallest convex geometry that contains all the points in the input geometry
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
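For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It illustrates the concept only; which algorithm the ConvexHull UDF uses internally is not stated in this guide.

```python
def convex_hull(points):
    """Convex hull of (x, y) tuples, counter-clockwise, without a closing point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()  # drop points that would make a clockwise or straight turn
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
    print(convex_hull(pts))  # interior points (2, 2) and (1, 3) are dropped
```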
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
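The idea of filtering XY records by the bounds of a geometry can be sketched in Python as follows. The four values returned by mbr correspond conceptually to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax report; the function names and the sample data are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle of (x, y) tuples as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def within_mbr(x, y, box):
    """True when the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


if __name__ == "__main__":
    polygon = [(2.0, 1.0), (6.0, 2.0), (5.0, 7.0), (1.0, 4.0)]
    box = mbr(polygon)
    records = [(3.0, 3.0), (0.0, 0.0), (6.0, 7.0)]
    # Keep only the XY records that fall within the bounds of the geometry.
    print([r for r in records if within_mbr(r[0], r[1], box)])
```

An MBR test is only a coarse filter: a record can fall inside the rectangle yet outside the geometry itself, so an exact predicate is usually applied to the surviving rows.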
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
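How a geohash ID maps back to a rectangular cell can be sketched with the publicly documented geohash algorithm. This Python sketch is an illustration under the assumption that the product follows the standard base-32 geohash scheme, which this guide does not confirm; geohash_bounds is a hypothetical name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Return (xmin, ymin, xmax, ymax) in degrees for a standard geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in cell_id:
        value = BASE32.index(ch)  # raises ValueError for characters outside the alphabet
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half of the interval
            else:
                rng[1] = mid  # keep the lower half of the interval
            is_lon = not is_lon
    return lon[0], lat[0], lon[1], lat[1]


if __name__ == "__main__":
    # "dre" is the standard three-character geohash containing (-73.750333, 42.736103).
    print(geohash_bounds("dre"))
```

Each additional character halves the cell several more times, which is why longer IDs correspond to smaller cells.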
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
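The encoding step and the group-by-cell aggregation shown in the last example can be sketched in Python using the publicly documented geohash algorithm. Whether GeoHashID returns exactly these strings is an assumption, since this guide does not describe the encoding; geohash_id is a hypothetical name.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Standard geohash of longitude x and latitude y with `precision` characters."""
    x, y = float(x), float(y)  # accept numbers or numeric strings
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    value, bits, is_lon = 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon, x) if is_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


if __name__ == "__main__":
    points = [(-73.750333, 42.736103), (-73.76, 42.65), (10.40744, 57.64911)]
    # Count points per cell, mirroring the GROUP BY on hashID.
    print(Counter(geohash_id(x, y, 3) for x, y in points))
```

Because a shorter ID is always a prefix of the longer ID for the same point, sorting by ID keeps nearby cells of the same parent together, which is what makes the IDs useful for indexing.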
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 64
Spatial
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB file or shapefile) nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
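The options argument is an ordinary Hive map of string keys to string values, so a client script can assemble it from a dictionary. The following is a minimal sketch, assuming you generate the query text in Python; the helper name hive_map_literal is hypothetical and is not part of the product. It only formats text and does not validate option names.

```python
def hive_map_literal(options):
    """Render a dict of option names/values as a Hive map('k', 'v', ...) constructor."""
    def quote(value):
        # Hive string literals accept backslash escapes, so escape
        # backslashes first and then single quotes.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + text + "'"

    parts = []
    for key, value in options.items():
        parts.append(quote(key))
        parts.append(quote(value))  # every value is passed as a string
    return "map(" + ", ".join(parts) + ")"


# Option names taken from the Options table; the values are illustrative.
options = {
    "maxCandidates": 3,
    "distanceUnit": "mi",
    "queryFilter": "Name Like 'Park'",
}
literal = hive_map_literal(options)
print(literal)
```

The resulting text can be placed as the third argument of LocalSearchNearest or LocalPointInPolygon when building the query string.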
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one pair from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
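To make the join semantics concrete, here is a small brute-force sketch in plain Python of what a distance join produces: every row of the first set is paired with each row of the second set that lies within the maximum distance. This is not the library's implementation (which uses a geohash-based primary join on Spark); the haversine formula, the spherical Earth radius, and all names below are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical model, an assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left row with every right row within max_distance_m (brute force)."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and add the computed distance as an extra column.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.750333, "lat": 42.740103},  # 0.004 degrees north
    {"poi": "far", "lon": -73.750333, "lat": 42.836103},   # 0.1 degrees north
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([m["poi"] for m in matches])
```

A brute-force join compares every pair of rows, which is why a production join first narrows candidates with a coarser key such as a geohash.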
Location Intelligence Jar for Spatial Operations
To use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates PGD files for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs both download data to the same download location, then both the Hive and YARN operating system users should be members of a common operating system group. The group of the download directory should be that common group, and the directory should have mode 775, which grants Read, Write, and Execute permissions to the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
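As a quick check of what mode 775 grants, the sketch below decodes the octal mode with Python's standard stat module. It is only an illustration of the permission bits and does not touch the file system; the helper name describe_mode is hypothetical.

```python
import stat


def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with the given mode bits."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


mode = 0o775
permissions = describe_mode(mode)
print(permissions)

# Owner and group get read/write/execute; other users get read/execute only.
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
print(group_can_write, others_can_write)
```

Group write access is what lets every service user in the common group update the downloaded data, while users outside the group can only read it.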
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value matches a pattern that may contain wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
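The wildcard behavior of Like can be illustrated by translating a pattern into a regular expression. The following Python sketch is an approximation for explanation only: it assumes case-sensitive matching and no escape character, neither of which is stated in this guide, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern ('_' = one character, '%' = zero or more) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # pattern must match the entire value
print(like("Parks", "P_rk"))
```

For example, a queryFilter that should match any name containing Park would use a percent sign on both sides of the word.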
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
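When filter expressions are assembled in a script, the quoting rules above can be applied with two small helpers. This is a minimal sketch in Python; the function names and the "World Capitals" table name are hypothetical, and the identifier helper simply rejects names containing a double quote because this guide does not describe how to escape one.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


# "World Capitals" is a made-up table name with a space, so it needs quoting.
query = (
    "SELECT * FROM " + sql_identifier("World Capitals")
    + " WHERE Business = " + sql_string_literal("O'hara's")
)
print(query)
```

Building literals through a helper like this keeps an apostrophe in the data from terminating the string early.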
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The length of the geometry
Examples
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes). Curves are considered thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer
offset Number The distance from the input geometry.
linearUnits String The linear unit of the offset distance. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
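To make the resolution parameter concrete, the following Python sketch approximates the buffer of a single point with a given number of straight-line segments. It assumes plain Cartesian coordinates and is not the algorithm the Buffer UDF uses internally; the name buffer_point is hypothetical.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

# A resolution of 8 yields an octagon: 8 segments, 9 vertices in the closed ring.
octagon = buffer_point(5.0, 6.0, 50.0, 8)
print(len(octagon) - 1)
```

Raising the resolution brings the ring closer to a true circle, at the cost of more vertices to store and process.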
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
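As an illustration of what a convex hull is, the sketch below computes one for a set of points using Andrew's monotone chain algorithm. This is a standard textbook technique chosen for the example; the guide does not say which algorithm the ConvexHull UDF uses, and the function name convex_hull is hypothetical.

```python
def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# The two interior points (2, 2) and (1, 3) are not part of the hull.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
print(hull)
```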
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(tablegeometry, 'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
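For readers unfamiliar with the coordinate systems in the examples above, this Python sketch shows the standard spherical Mercator formula that maps a longitude/latitude pair in epsg:4326 to epsg:3857. It covers only this one pair of coordinate systems and is not how the Transform UDF is implemented; the function name lonlat_to_web_mercator is hypothetical.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS 84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon, lat):
    """Convert degrees of longitude/latitude (EPSG:4326) to meters in EPSG:3857."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The same point used in the second example above.
x, y = lonlat_to_web_mercator(30.0, 20.0)
print(x, y)
```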
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y values from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
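The MBR filtering use case described above can be sketched in Python as follows. The sketch assumes a geometry given as a list of (x, y) vertices and records stored as dictionaries with x and y keys; the helper names mbr and filter_by_mbr are hypothetical and only mirror what the four min/max UDFs return.

```python
def mbr(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def filter_by_mbr(records, bounds):
    """Keep only the records whose x/y values fall inside the bounds."""
    x_min, y_min, x_max, y_max = bounds
    return [r for r in records
            if x_min <= r["x"] <= x_max and y_min <= r["y"] <= y_max]

polygon = [(0, 0), (4, 1), (3, 5), (-1, 2), (0, 0)]
bounds = mbr(polygon)
records = [{"id": 1, "x": 2, "y": 2}, {"id": 2, "x": 9, "y": 9}]
kept = filter_by_mbr(records, bounds)
print(bounds, [r["id"] for r in kept])
```

An MBR test is only a coarse filter: a record inside the rectangle may still lie outside the geometry itself, so an exact test is usually applied to the surviving records.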
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each of three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and for interoperability with other systems.
Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
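To show how precision, ID, and boundary relate, here is a minimal Python sketch of the publicly documented geohash scheme (interleaved longitude/latitude bits encoded in base 32). It assumes that GeoHashID follows this standard scheme, which the guide suggests by calling the ID "well-known" but does not state explicitly. The function names geohash_id and geohash_boundary are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of `precision` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def geohash_boundary(hash_id):
    """Decode a geohash string to its cell bounds (west, south, east, north)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        idx = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]

short_id = geohash_id(-73.750333, 42.736103, 3)
long_id = geohash_id(-73.750333, 42.736103, 10)
print(short_id, long_id, geohash_boundary(short_id))
```

Because each extra character subdivides the previous cell, a shorter ID is always a prefix of a longer ID for the same point. That prefix property is what makes the IDs sortable and convenient for grouping.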
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry on which the search point lies; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry on which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
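The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force Python sketch. It assumes point candidates and great-circle (haversine) distances in meters, whereas the UDTF searches arbitrary geometries in a TAB file or shapefile and may well use a spatial index. The function names, the sample coordinates, and the mean earth radius constant are all assumptions made for this example.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in meters between two longitude/latitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (name, distance) pairs, nearest first."""
    scored = []
    for name, lon, lat in candidates:
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]

# Approximate capital locations, used only as sample data.
capitals = [
    ("Sacramento", -121.4944, 38.5816),
    ("Boston", -71.0589, 42.3601),
    ("Albany", -73.7562, 42.6526),
    ("Hartford", -72.6734, 41.7658),
]
search_point = (-73.750333, 42.736103)
top3 = search_nearest(search_point, capitals, max_candidates=3)
nearby = search_nearest(search_point, capitals, max_candidates=3, max_distance_m=50_000)
print([name for name, _ in top3], [name for name, _ in nearby])
```

As in the query above, limiting the candidates to 3 returns at most three rows per search point, and a distance limit can reduce that number further.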
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 within which to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
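To make the semantics of a distance join concrete, the following is a minimal, product-independent sketch in plain Python. It is not the SDK's implementation: it uses a brute-force comparison and a haversine distance on an assumed mean Earth radius, whereas the product uses a geohash-based primary join. The function names, the sample records, and the outputDistance key are illustrative assumptions.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles (approximation)

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_miles):
    """Pair each left row with every right row that lies within max_miles of it."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_miles:
                # Merge both rows and attach the computed distance as an extra column.
                joined.append({**l_row, **r_row, "outputDistance": d})
    return joined

# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"name": "near", "lon": -73.9851, "lat": 40.7527},
    {"name": "far", "lon": -73.9442, "lat": 40.6782},
]
result = join_by_distance(customers, pois, 0.5)
```

Only the "near" record falls inside the half-mile radius, so it is the only row in the result. A brute-force join like this compares every pair of rows, which is why a production join first narrows the candidates with a spatial key such as a geohash.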
Location Intelligence Jar for Spatial Operations
To use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the LI jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the library loaded.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
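After the last step, it can be useful to confirm that the directory really carries mode 775. The following is a small illustrative check in Python using only the standard library; the has_mode helper is a hypothetical name, and a temporary directory stands in for the real download directory.

```python
import os
import stat
import tempfile

def has_mode(path, expected=0o775):
    """Return True if the permission bits of path equal the expected octal mode."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected

# Stand-in for the download directory; replace with your own path when checking a cluster node.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o775)
mode_ok = has_mode(demo_dir)
```

Mode 775 grants read, write, and execute to the owner and the group, and read and execute to everyone else, which is what lets every member of the common group update the downloaded data.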
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
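The wildcard behavior of the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is only a model of the rules stated in the table, not the MI SQL engine; the helper names are hypothetical, and the sketch assumes case-sensitive matching, which the source does not specify.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regex: % -> any run, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

matches_suffix = like("Central Park", "%Park")   # True: % absorbs "Central "
matches_single = like("Park", "P_rk")            # True: _ stands for "a"
no_match = like("Parkway", "%Park")              # False: the pattern must cover the whole value
```

Combining the symbols works the same way; for example, the pattern `_a%` matches any value whose second character is "a".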
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain characters that would not normally be allowed are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two single-quote characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
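When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The following Python sketch is illustrative only: the function names and the sample values are hypothetical, and because the source does not describe how to escape a double quote inside an identifier, the sketch rejects such identifiers rather than guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are not handled here."""
    if '"' in name:
        raise ValueError("identifier contains a double quote, which this sketch does not escape")
    return '"' + name + '"'

# Hypothetical table, column, and value used only to show the output shape.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/Streets"),
    quote_literal("Joe's Diner"),
)
```

Here the embedded single quote in the value is emitted as two single-quote characters, so the literal in the query reads 'Joe''s Diner'.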
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
- Table of Contents
- Welcome
  - What is Spectrum™ Location Intelligence for Big Data
  - Spectrum™ Location Intelligence for Big Data Architecture
  - System Requirements and Dependencies
- Spatial
  - Installing the SDK
  - Hive User-Defined Spatial Functions
    - Setup
    - WritableGeometry
    - Geometry Functions
      - Constructor Functions
        - FromGeoJSON
        - FromWKT
        - FromWKB
        - FromKML
        - ST_Point
      - Persistence Functions
        - ToGeoJSON
        - ToWKT
        - ToWKB
        - ToKML
      - Predicate Functions
        - Disjoint
        - Intersects
        - Overlaps
        - Within
        - IsNullGeometry
      - Measurement Functions
        - Area
        - ClosestPoints
        - Distance
        - Length
        - Perimeter
      - Processing Functions
        - Buffer
        - ConvexHull
        - Intersection
        - Transform
        - Union
      - Observer Functions
        - ST_X
        - ST_XMax
        - ST_XMin
        - ST_Y
        - ST_YMax
        - ST_YMin
      - Grid Functions
        - GeoHashBoundary
        - GeoHashID
        - HexagonBoundary
        - HexagonID
        - SquareHashBoundary
        - SquareHashID
      - Search Functions
        - LocalPointInPolygon
        - LocalSearchNearest
  - Spark
    - Spark Jobs
      - Hexagon Generator
        - Hexagons
    - Spark API
      - JoinByDistance
    - Location Intelligence Jar for Spatial Operations
      - Installing and setting up the Location Intelligence Jar for a Spark Job
      - Integrating the Location Intelligence Jar with Zeppelin
      - Integrating the Location Intelligence Jar with Hue
- Appendix
  - PGD Builder
    - Building an Index with the PGD Builder
  - Download Permissions
  - Operators and Syntax Delimiters
    - Quote Rules
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The perimeter of a polygon is the sum of the lengths of its rings (both the exterior ring and any holes). Curves are treated as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
linearUnits String The desired return unit type. For valid values, see Linear Units on page 37.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t;
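The rule that a polygon's perimeter is the sum of the lengths of all its rings can be checked with a short Python sketch. This is a Cartesian illustration only and does not reproduce the UDF's spherical computation or unit conversion; the function names and coordinates are assumptions made for the example.

```python
import math

def ring_length(ring):
    """Sum the straight-line lengths of consecutive segments in a closed ring."""
    return sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))

def polygon_perimeter(exterior, holes=()):
    """Perimeter = length of the exterior ring plus the length of every hole."""
    return ring_length(exterior) + sum(ring_length(h) for h in holes)

# A 10 x 10 square with a 2 x 2 square hole; each ring repeats its first point to close.
outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
hole = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
perimeter = polygon_perimeter(outer, [hole])
```

The exterior contributes 40 units and the hole contributes 8, so the hole increases the perimeter even though it reduces the area.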
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry that contains a MultiPolygon geometry representing a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnit String The unit of the offset distance. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
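The effect of the resolution parameter can be pictured with a small Cartesian sketch in Python: a point buffer is approximated by a closed ring with one straight segment per unit of resolution. This is an illustration, not the UDF's algorithm; in particular, placing the vertices exactly on the circle is an assumption, and the function name is hypothetical.

```python
import math

def buffer_point(x, y, offset, resolution):
    """Approximate a circular buffer around (x, y) with `resolution` straight segments."""
    ring = [
        (
            x + offset * math.cos(2 * math.pi * i / resolution),
            y + offset * math.sin(2 * math.pi * i / resolution),
        )
        for i in range(resolution)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring

# Resolution 8 yields an octagon: 8 distinct vertices plus the closing vertex.
octagon = buffer_point(5.0, 6.0, 50.0, 8)
```

Raising the resolution adds vertices and brings the ring closer to a true circle, which is why larger values cost more time and space.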
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
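To see what a convex hull looks like for the MULTIPOLYGON in the last example, the following Python sketch applies Andrew's monotone chain algorithm to its vertices. This is a well-known textbook method chosen for illustration; the source does not say which algorithm the UDF uses, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so skip it.
    return lower[:-1] + upper[:-1]

# All ring vertices of the example MULTIPOLYGON (closing duplicates omitted).
vertices = [
    (40, 40), (20, 45), (45, 30),
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
    (30, 20), (20, 15), (20, 25),
]
hull = convex_hull(vertices)
```

The hull keeps seven of the eleven vertices; the hole vertices and the point (20, 35) lie inside it and are discarded.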
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries.
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
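As a rough picture of what a transformation from epsg:4326 to epsg:3857 does to a point, the following Python sketch applies the standard spherical Web Mercator formulas. It is not the product's transformation engine and covers only this one pair of coordinate systems; the function name is hypothetical.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)

def wgs84_to_web_mercator(lon, lat):
    """Project lon/lat in degrees (EPSG:4326) to x/y in meters (EPSG:3857)."""
    x = WGS84_SEMI_MAJOR_AXIS_M * math.radians(lon)
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The point from the second example, treated as longitude 30 and latitude 20.
x, y = wgs84_to_web_mercator(30.0, 20.0)
```

The x value grows linearly with longitude, while y stretches increasingly toward the poles, which is the characteristic distortion of Mercator projections.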
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means that it cannot transform an XY table from one coordinate system to another on its own. The ST_X and ST_Y UDFs make this possible by returning the ordinates of a point geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
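The bounds-filtering use case can be sketched in a few lines of Python: compute the MBR of a geometry's vertices, then keep only the XY records that fall inside it. This is a conceptual illustration of what the four MBR values are used for, not a rendering of the UDFs themselves; the function names and sample data are hypothetical.

```python
def mbr(points):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def within_mbr(x, y, box):
    """True if (x, y) lies inside or on the edge of the bounding rectangle."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

# A triangle used as the filtering geometry, and an XY table as a list of rows.
triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
box = mbr(triangle)
rows = [{"id": 1, "x": 1.0, "y": 1.0}, {"id": 2, "x": 5.0, "y": 1.0}]
kept = [r for r in rows if within_mbr(r["x"], r["y"], box)]
```

An MBR filter is a coarse test: it can keep a record that lies inside the rectangle but outside the geometry itself, so it is typically followed by an exact predicate when precision matters.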
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT('...', 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
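The relationship between precision and cell size can be sketched in a few lines of Python. This is only an illustration that assumes the grid follows the widely published geohash scheme (5 bits per character, bits alternating between longitude and latitude, starting with longitude); the guide does not confirm that the product uses exactly this layout, and the function name cell_size_degrees is hypothetical.

```python
def cell_size_degrees(precision):
    """Return (width, height) in degrees of a standard geohash cell.

    Assumption: each character carries 5 bits, and bits alternate between
    longitude and latitude, with longitude taking the first bit.
    """
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    width = 360.0 / (2 ** lon_bits)
    height = 180.0 / (2 ** lat_bits)
    return (width, height)


# Each extra character shrinks the cell in both directions.
for p in (1, 2, 3):
    print(p, cell_size_degrees(p))
```

Under these assumptions, every additional character divides the cell area by 32, which is why longer keys identify smaller cells.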
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
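To make the ID and boundary pairing concrete, here is a minimal Python sketch of the publicly documented geohash algorithm. It is not the product's implementation: it assumes GeoHashID and GeoHashBoundary follow the standard scheme, and the names geohash_id and geohash_bounds are hypothetical helpers introduced only for this illustration.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of `precision` characters."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_bounds(hash_id):
    """Return (min_x, min_y, max_x, max_y) of the cell named by hash_id."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lon[0], lat[0], lon[1], lat[1])


cell = geohash_id(-73.750333, 42.736103, 3)
print(cell, geohash_bounds(cell))
```

Because a longer ID always starts with the shorter ID of its enclosing cell, sorting or grouping by ID keeps nearby points together, which is the property the GROUP BY example above relies on.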
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hexagon hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique square hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the pb.download.group Hive variable (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the pb.download.group Hive variable (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path are:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
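The shape of the result can be illustrated outside Spark with a small Python sketch. It is a conceptual stand-in only: it uses a brute-force loop and the haversine great-circle formula on a spherical earth, whereas the guide states that the product performs a geohash-based primary join and does not say how it measures distance. The function names, the row layout, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean earth radius; an assumption of this sketch
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m):
    """Pair each row of rows1 with every row of rows2 within max_distance_m.

    Rows are dicts with "lon" and "lat" keys. Each result is a tuple of
    (row from rows1, row from rows2, distance in meters), which mirrors the
    optional distance column of the join.
    """
    result = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["lon"], r1["lat"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                result.append((r1, r2, d))
    return result


customers = [{"id": "c1", "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"id": "near", "lon": -73.7520, "lat": 42.7370},
    {"id": "far", "lon": -73.9000, "lat": 42.8000},
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for customer, poi, meters in matches:
    print(customer["id"], poi["id"], round(meters, 1))
```

The brute-force loop compares every pair of rows, which is exactly the cost a geohash-keyed primary join is meant to avoid on large dataframes.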
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the LI jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
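The usage line above can also be assembled from a script. The following is a minimal Python sketch, assuming the PGDBuilder launcher is available on the PATH under that name; the helper build_pgd_command is hypothetical and is not part of the product.

```python
from typing import List, Optional


def build_pgd_command(tab_file: str, parallel: Optional[int] = None) -> List[str]:
    """Return the argument list for a PGDBuilder run (hypothetical helper)."""
    if not tab_file:
        raise ValueError("a TAB file path is required for -f")
    command = ["PGDBuilder", "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("-p must be a positive number of processes")
        command += ["-p", str(parallel)]
    return command


if __name__ == "__main__":
    # Print the command instead of running it; on a machine where the
    # utility is installed, the list could be passed to subprocess.run().
    print(" ".join(build_pgd_command("seamless.tab", parallel=4)))
```

Building the arguments as a list avoids shell quoting problems when the TAB path contains spaces.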
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have 775 permissions (Read, Write and Execute for the owner and group).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
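After changing the permissions, it can be useful to confirm that the owner and group really have Read, Write and Execute access on the download directory. The following Python sketch checks the mode bits only; the function name has_group_rwx is hypothetical, and the check does not verify which group owns the directory.

```python
import os
import stat


def has_group_rwx(path: str) -> bool:
    """Return True if both owner and group have rwx permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    required = stat.S_IRWXU | stat.S_IRWXG  # owner rwx + group rwx (0o770)
    return mode & required == required


if __name__ == "__main__":
    # The current directory is used only as a stand-in for the download
    # directory configured in your cluster.
    print(has_group_rwx("."))
```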
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
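The wildcard behavior described for the Like operator can be illustrated outside of MI SQL. The following Python sketch translates a Like pattern into a regular expression; it is an illustration of the wildcard rules only, the function name like is hypothetical, and case-sensitive matching is an assumption that this guide does not confirm.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Match value against a Like-style pattern (% and _ wildcards)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


if __name__ == "__main__":
    print(like("Troy", "T%"), like("Troy", "Tr_y"), like("Troy", "T_"))
```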
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain otherwise illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
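When queries are assembled in application code, the quoting rules above can be applied with small helpers. The following Python sketch is one possible approach; the function names are hypothetical, and because this guide does not describe how to escape a double quote inside an identifier, the sketch rejects such identifiers instead of guessing.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Embedded double quotes are rejected because the escaping rule for
    them is not covered in this guide.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    print("SELECT * FROM " + table + " WHERE Name = " + quote_literal("Joe's"))
```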
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Spatial
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
Double The perimeter of the geometry
Examples
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t
SELECT Perimeter(FromWkt(t.geometry, 'epsg:4267'), 'm') FROM hivetable t
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns an instance of WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer'
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnit String The desired return unit type. For valid values, see Linear Units.
resolution Number Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL')
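To see how the resolution parameter shapes the result, consider the buffer of a single point. The following Python sketch builds a planar (CARTESIAN-style) ring with one vertex per straight-line segment; it is not the product's algorithm, it ignores units and spherical computation, and the function name point_buffer is hypothetical.

```python
import math
from typing import List, Tuple


def point_buffer(x: float, y: float, offset: float,
                 resolution: int) -> List[Tuple[float, float]]:
    """Approximate a circle around (x, y) with `resolution` segments."""
    if resolution < 3:
        raise ValueError("at least 3 segments are needed for a polygon")
    ring = []
    for i in range(resolution):
        angle = 2.0 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle),
                     y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring, as polygon rings are closed
    return ring


if __name__ == "__main__":
    # A resolution of 8 gives an octagon: 8 distinct vertices plus the
    # repeated closing vertex.
    print(len(point_buffer(5.0, 6.0, 50.0, 8)))
```

Doubling the resolution doubles the number of vertices, which is why larger values cost more time and space.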
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull'
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'))
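For readers unfamiliar with convex hulls, the following Python sketch computes one for a set of planar points using Andrew's monotone chain algorithm. It illustrates the concept only; this guide does not state which algorithm the ConvexHull UDF uses, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```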
Intersection
Description
The Intersection function returns the geometry (point, line or curve) that is common to two geometry objects (such as lines, curves, planes and surfaces). The returned geometry consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection'
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform'
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t
SELECT Transform(ST_POINT(30, 20), 'epsg:3857')
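As a worked illustration of what a transformation between coordinate systems involves, the following Python sketch converts a longitude/latitude pair (as in epsg:4326) to Web Mercator coordinates (as in epsg:3857) using the standard spherical Mercator formulas. It is not the product's implementation, and the function and constant names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Project degrees of longitude/latitude to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


if __name__ == "__main__":
    print(lonlat_to_web_mercator(30.0, 20.0))
```

The formula is undefined at the poles, so latitudes of exactly 90 or -90 degrees should be avoided.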
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union'
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
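The idea of filtering XY records by the bounds of a geometry can be sketched in a few lines. The following Python example computes an MBR from a list of coordinates and tests whether a point falls inside it; it is a conceptual illustration rather than the UDF implementation, and the names mbr and within_box are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def mbr(coords: Iterable[Tuple[float, float]]) -> Optional[Box]:
    """Return the minimum bounding rectangle, or None for no coordinates."""
    pts = list(coords)
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def within_box(x: float, y: float, box: Box) -> bool:
    """True if (x, y) lies inside or on the edge of the box."""
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


if __name__ == "__main__":
    box = mbr([(1, 5), (4, 2), (3, 7)])
    print(box, within_box(2, 3, box))
```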
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X'
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax'
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin'
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y'
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax'
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin'
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary'
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable
SELECT GeoHashBoundary('ebvnk')
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable
SELECT GeoHashBoundary(-73.750333, 42.736103, 3)
SELECT GeoHashBoundary('-73.750333', '42.736103', 3)
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID'
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable
SELECT GeoHashID(-73.750333, 42.736103, 3)
SELECT GeoHashID('-73.750333', '42.736103', 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
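To show how a longitude, latitude and precision map to a cell ID and back to a cell boundary, the following Python sketch implements the standard public geohash encoding (base-32 alphabet, longitude bit first). It assumes that the "well-known" ID mentioned above is this standard geohash; this guide does not confirm that the UDF output matches for every input, and the function names are hypothetical.

```python
from typing import Tuple

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon: float, lat: float, precision: int) -> str:
    """Encode a point as a standard geohash of `precision` characters."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid          # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid          # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:        # every 5 bits become one character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bounds(cell_id: str) -> Tuple[float, float, float, float]:
    """Decode a geohash to (lon_min, lat_min, lon_max, lat_max)."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_rng if use_lon else lat_rng
            mid = (rng[0] + rng[1]) / 2.0
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lon_rng[0], lat_rng[0], lon_rng[1], lat_rng[1])


if __name__ == "__main__":
    cell = geohash_id(-73.750333, 42.736103, 3)
    print(cell, geohash_bounds(cell))
```

Because each additional character adds five bits, every extra unit of precision divides a cell into 32 smaller cells, which is why longer IDs mean smaller cells.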
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary'
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable
SELECT HexagonBoundary('PF625028642')
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable
SELECT HexagonBoundary(-73.750333, 42.736103, 3)
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID'
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable
SELECT HexagonID(-73.750333, 42.736103, 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary'
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable
SELECT SquareHashBoundary('03332')
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable
SELECT SquareHashBoundary(-73.750333, 42.736103, 3)
SELECT SquareHashBoundary('-73.750333', '42.736103', 3)
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID'
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable
SELECT SquareHashID(-73.750333, 42.736103, 3)
SELECT SquareHashID('-73.750333', '42.736103', 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon'
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
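The option and value pairs in the table are passed to the UDTF as one flat map(...) argument, so every key must be followed by its value. The following Python sketch shows one way to assemble that literal from a dictionary. The helper name hive_map_literal is hypothetical and not part of the SDK, and the option values are the illustrative ones from the table.

```python
def hive_map_literal(options):
    """Render a dict as a Hive map('k1', 'v1', 'k2', 'v2', ...) literal.

    Hypothetical helper for illustration only; it is not part of the SDK.
    """
    def quote(text):
        # Escape backslashes and single quotes for a single-quoted Hive string.
        return "'" + str(text).replace("\\", "\\\\").replace("'", "\\'") + "'"

    parts = []
    for key, value in options.items():
        parts.append(quote(key))
        parts.append(quote(value))
    return "map(" + ", ".join(parts) + ")"


options = {
    "remoteDataSourceLocation": "hdfs:///data/mydata.zip",
    "downloadLocation": "/pb/downloads",
    "downloadGroup": "pbdownloads",
}
print(hive_map_literal(options))
```

Building the literal from a dictionary keeps each option next to its value, which avoids the off-by-one mistakes that are easy to make when editing a long flat argument list by hand.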
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry on which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')
) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on the local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the values we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). The hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
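The claim that hexagons approximate circles more closely than rectangles can be checked with a little geometry. The following Python sketch compares how much of a circle's area is covered by a regular polygon inscribed in it. It is a planar illustration only, and the function name inscribed_polygon_coverage is a hypothetical helper, not part of the SDK.

```python
import math


def inscribed_polygon_coverage(sides):
    """Fraction of a circle's area covered by a regular polygon inscribed in it."""
    # A regular n-gon with circumradius r has area (n / 2) * r^2 * sin(2 * pi / n).
    # Dividing by the circle area pi * r^2 cancels r.
    return (sides / 2) * math.sin(2 * math.pi / sides) / math.pi


square = inscribed_polygon_coverage(4)
hexagon = inscribed_polygon_coverage(6)
print(f"square: {square:.1%} of the circle, hexagon: {hexagon:.1%} of the circle")
```

An inscribed square covers about 64% of the circle and an inscribed hexagon about 83%, which is why a hexagonal cell loses less of a circular signal footprint at its edges.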
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 within which to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._

val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}

val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
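To make the join semantics concrete, the following Python sketch pairs every point in one list with every point in another that lies within a maximum distance. It is a brute-force illustration under stated assumptions, not the SDK's implementation: it uses the haversine great-circle formula with a mean earth radius, it ignores the geohash pre-join entirely, and the names haversine_m and join_by_distance are hypothetical.

```python
import math

METERS_PER_MILE = 1609.344
EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left row with each right row within max_distance_m (brute force)."""
    joined = []
    for left_row in left:
        for right_row in right:
            d = haversine_m(left_row["longitude"], left_row["latitude"],
                            right_row["lon"], right_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and add the distance as an extra column.
                joined.append({**left_row, **right_row, "outputDistance": d})
    return joined


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.750333, "lat": 42.741103},  # roughly a third of a mile north
    {"poi": "far", "lon": -73.750333, "lat": 42.756103},   # well over a mile north
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([row["poi"] for row in result])
```

The nested loop compares every pair of points, which is the cost a geohash-based primary join is meant to avoid by comparing only points that fall in nearby grid cells.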
Location Intelligence Jar for Spatial Operations
To use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
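As a quick check of what the 775 mode grants, the following Python sketch decodes the octal value with the standard stat module. It only inspects the number and does not touch the file system; the helper name can_write is hypothetical.

```python
import stat

mode = 0o775  # the octal value passed to chmod

# stat.filemode renders the permission string that `ls -l` shows for a directory.
print(stat.filemode(stat.S_IFDIR | mode))


def can_write(mode, who):
    """Return True if the owner, group, or other class has the write bit set."""
    bit = {"owner": stat.S_IWUSR, "group": stat.S_IWGRP, "other": stat.S_IWOTH}[who]
    return bool(mode & bit)


for who in ("owner", "group", "other"):
    print(who, "can write:", can_write(mode, who))
```

With 775, the owner and every member of the common group can create and replace files in the download directory, while all other users can only read and traverse it.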
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
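The wildcard rules for Like can be illustrated by translating a pattern into a regular expression. The following Python sketch follows the description in the table (percent matches zero, one, or many characters; underscore matches exactly one). It is an illustration rather than the MI SQL implementation: the function names are hypothetical, and the sketch is case-sensitive, which may not match the product's behavior.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern with % and _ wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or many characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # ends with "Park"
print(like("Parkway", "%Park"))        # does not end with "Park"
print(like("Park", "P_rk"))            # one character between "P" and "rk"
```

Using fullmatch rather than search reflects that a Like pattern describes the entire value, so a pattern without a leading or trailing percent sign is anchored at that end.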
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, List item and function argument separator
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
When a single quote appears within a string literal or value, escape it by doubling it (two single-quote characters). The following example defines the string literal O'hara's:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
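When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is an illustration with hypothetical function names; it doubles embedded single quotes in literals, and it rejects identifiers that contain a double quote because this section does not say how such identifiers are escaped.

```python
def quote_literal(value):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Double-quote an identifier such as a table path or a name with spaces."""
    if '"' in name:
        # Escaping of embedded double quotes is not covered here, so refuse.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
country = quote_literal("Canada")
print(f"SELECT * FROM {table} WHERE Country = {country}")
print(quote_literal("O'Hara's"))
```

Routing every value through one quoting function keeps a stray apostrophe in the data from ending the literal early and breaking the filter expression.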
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union
Buffer
Description
The Buffer function returns a WritableGeometry containing a MultiPolygon geometry that represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Parameter Type Description
geometry WritableGeometry The geometry to buffer.
offset Number The distance from the input geometry.
linearUnit String The linear unit of the offset distance. For valid values, see Linear Units on page 40.
resolution Number Specifies how many straight-line segments are used to approximate a circle. For example, with a resolution of 8, the buffer of a point is an octagon. Buffers with larger resolution values take more time and space to compute.
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: valid type = SPHERICAL (default)
• For projected coordinate systems: valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: valid type = CARTESIAN (default)
CARTESIAN The geometry coordinates are interpreted using cartesian logic.
SPHERICAL The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
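For comparing offsets given in different units, the unit codes above can be mapped to meters. The following Python sketch uses the standard international definitions (for example, 1 mi = 1609.344 m and 1 US survey foot = 1200/3937 m). The table and the function name to_meters are illustrative and are not taken from the SDK.

```python
# Meters per unit, keyed by the unit codes listed above.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200 / 3937,
    "nmi": 1852.0,
}


def to_meters(value, unit):
    """Convert a length in the given unit code to meters."""
    return value * METERS_PER_UNIT[unit]


print(to_meters(50, "km"))
print(to_meters(0.5, "mi"))
```

An unknown unit code raises KeyError, which is usually preferable to silently treating the value as meters.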
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry.
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
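The effect of the resolution parameter can be pictured with a planar sketch: a point buffer is a ring of resolution vertices evenly spaced on a circle of radius offset. The following Python code is a Cartesian illustration only. The function name planar_point_buffer is hypothetical, and the vertex placement and the SPHERICAL computation used by the Buffer function may differ.

```python
import math


def planar_point_buffer(x, y, offset, resolution):
    """Approximate a circle of radius `offset` around (x, y) with `resolution` segments."""
    ring = []
    for i in range(resolution):
        angle = 2 * math.pi * i / resolution
        ring.append((x + offset * math.cos(angle), y + offset * math.sin(angle)))
    ring.append(ring[0])  # close the ring; WKT polygons repeat the first vertex
    return ring


octagon = planar_point_buffer(5, 6, 50, 8)
print(len(octagon) - 1, "segments")
print("POLYGON ((" + ", ".join(f"{px:.3f} {py:.3f}" for px, py in octagon) + "))")
```

Doubling the resolution doubles the number of vertices in every buffer, which is why larger values cost more time and space.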
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry.
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
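To show what a convex hull contains, the following Python sketch computes the hull of the vertices of the MULTIPOLYGON in the last example using Andrew's monotone chain algorithm in the plane. This is a standard textbook technique chosen for illustration; it is not necessarily the algorithm the ConvexHull function uses, and it treats the coordinates as Cartesian.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Vertices of the example MULTIPOLYGON, including the interior ring.
vertices = [
    (40, 40), (20, 45), (45, 30),
    (20, 35), (10, 30), (10, 10), (30, 5), (45, 20),
    (30, 20), (20, 15), (20, 25),
]
print(convex_hull(vertices))
```

Only seven of the eleven vertices survive: the hole's vertices and the inward corner at (20, 35) lie inside the hull and are discarded.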
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
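As an illustration of what a coordinate transformation does to a single point, the sketch below applies the standard spherical Web Mercator formulas to convert WGS 84 longitude/latitude (epsg:4326) to epsg:3857. It is a sketch under stated assumptions: the function name wgs84_to_web_mercator is hypothetical, and the Transform UDF supports many coordinate systems and is not claimed to use this code.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by epsg:3857


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# The point used in the second example above: longitude 30, latitude 20.
x, y = wgs84_to_web_mercator(30, 20)
```

The X ordinate scales linearly with longitude, while the Y ordinate grows faster than latitude toward the poles, which is why the projection is usually limited to about 85 degrees north and south.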
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed point.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
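The bounds-filtering use case described above can be sketched in plain Python. The helper names mbr and filter_by_bounds, and the sample data, are hypothetical and exist only for this illustration; in Hive the same values would come from ST_XMin, ST_YMin, ST_XMax, and ST_YMax.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def filter_by_bounds(records, bounds):
    """Keep the (x, y) records that fall inside the bounds, edges included."""
    xmin, ymin, xmax, ymax = bounds
    return [(x, y) for x, y in records if xmin <= x <= xmax and ymin <= y <= ymax]


# A triangle standing in for a geometry, and an XY table to filter.
triangle = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
xy_table = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.0), (-1.0, 2.0)]

bounds = mbr(triangle)
inside = filter_by_bounds(xy_table, bounds)
```

An MBR filter is only a coarse test: a record inside the rectangle may still lie outside the geometry itself, so it is typically followed by an exact spatial predicate.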
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
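To show how precision, cell ID, and cell boundary relate, here is a minimal Python sketch of the publicly documented geohash encoding, followed by the same count-per-cell aggregation as the last query above. It assumes that the UDFs follow the standard geohash scheme; the function names geohash_id and geohash_boundary and the sample coordinates are hypothetical stand-ins, not the product's implementation.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y (WGS 84) as a geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    chars, bits, nbits, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, value = (lon, x) if use_lon else (lat, y)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:          # upper half: emit a 1 bit
            bits = bits * 2 + 1
            rng[0] = mid
        else:                     # lower half: emit a 0 bit
            bits = bits * 2
            rng[1] = mid
        use_lon = not use_lon     # bits alternate longitude, latitude
        nbits += 1
        if nbits == 5:            # every 5 bits become one character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def geohash_boundary(hash_id):
    """Return (xmin, ymin, xmax, ymax) of the cell named by hash_id."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


# Count records per cell, as in the GROUP BY example.
coordinates = [(-73.750333, 42.736103), (-73.750334, 42.736104), (10.40744, 57.64911)]
counts = Counter(geohash_id(x, y, 5) for x, y in coordinates)
```

Each extra character adds five bits, so a longer ID always names a cell nested inside the cell named by its prefix. That prefix property is what makes the IDs sortable and searchable.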
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
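The exact encoding behind SquareHashID is not described in this guide. As an illustration only of how a grid can yield cells that look square in a Mercator display, the sketch below uses the well-known quadkey scheme: the Web Mercator square is split into four quadrants per level, and each level contributes one digit from 0 to 3. The assumption that square hash resembles a quadkey is ours, the function names lonlat_to_tile and tile_to_quadkey are hypothetical, and actual SquareHashID output may differ.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator extent


def lonlat_to_tile(lon, lat, level):
    """Return the (column, row) of the Mercator tile containing the point."""
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    n = 2 ** level
    col = min(n - 1, max(0, int(x * n)))
    row = min(n - 1, max(0, int(y * n)))
    return col, row


def tile_to_quadkey(col, row, level):
    """Interleave the column and row bits into one base-4 digit per level."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if col & mask:
            digit += 1
        if row & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


col, row = lonlat_to_tile(-73.750333, 42.736103, 3)
key = tile_to_quadkey(col, row, 3)
```

Because every level halves the cell in both Mercator axes, cells stay square on a Mercator map at any precision, whereas geohash cells alternate between wide and tall rectangles.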
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. The UDTF returns every geometry on which the search point lies; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the pb.download.group Hive variable (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which the input point resides. Every geometry on which the search point lies is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
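The core test behind a point-in-polygon search can be sketched with the classic ray casting (even-odd) rule. This is a textbook technique shown for illustration, not the product's implementation, which also handles TAB files, shapefiles, and other geometry types. The names point_in_polygon and polygons_containing, and the sample regions, are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd test: count how many ring edges a ray to the right crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the horizontal line through y can cross.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygons_containing(point, table):
    """Return the attributes of every polygon that contains the point."""
    x, y = point
    return [attrs for attrs, ring in table if point_in_polygon(x, y, ring)]


# A toy attribute table: (attributes, exterior ring) pairs.
regions = [
    ({"state": "A"}, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ({"state": "B"}, [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
hits = polygons_containing((5, 5), regions)
```

Testing every polygon for every point is linear in the table size, which is one reason a prepared index such as the PGD files mentioned in the tip can improve performance.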
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the pb.download.group Hive variable (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
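How the maxCandidates and maxDistance options shape a nearest search can be sketched as follows. The sketch measures great-circle distance with the haversine formula on a spherical earth, which is an assumption made for this illustration; the function names haversine_m and search_nearest and the sample features are hypothetical and do not reflect how the UDTF is implemented.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, features, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = [(name, haversine_m(lon, lat, flon, flat)) for name, flon, flat in features]
    if max_distance_m is not None:
        scored = [item for item in scored if item[1] <= max_distance_m]
    scored.sort(key=lambda item: item[1])
    return scored[:max_candidates]


# (name, longitude, latitude) rows standing in for a point data source.
features = [("far", 0.0, 1.0), ("near", 0.0, 0.01), ("mid", 0.0, 0.02)]
top_two = search_nearest((0.0, 0.0), features, max_candidates=2)
within_1500m = search_nearest((0.0, 0.0), features, max_candidates=3, max_distance_m=1500)
```

With the defaults of one candidate and no distance limit, the function returns only the single nearest feature, mirroring the default option values listed in the table above.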
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe
val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))
val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
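To make the join semantics concrete, the following is a minimal Python sketch of what a distance join computes. It uses a brute-force loop, not the geohash-based primary join that the SDK performs. The names haversine_miles and join_by_distance, the sample points, and the earth radius of 3958.8 miles are all assumptions made for illustration; none of them are part of the SDK.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean earth radius (illustrative)

def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_miles):
    """Pair each left row with every right row that lies within max_miles."""
    result = []
    for l in left:
        for r in right:
            d = haversine_miles(l["lon"], l["lat"], r["lon"], r["lat"])
            if d <= max_miles:
                result.append({**l, "poi": r["name"], "distance": d})
    return result

# Hypothetical sample data: one customer and two candidate POIs.
customers = [{"id": 1, "lon": -73.750333, "lat": 42.736103}]
pois = [
    {"name": "near", "lon": -73.751, "lat": 42.737},
    {"name": "far", "lon": -74.0, "lat": 42.0},
]
matches = join_by_distance(customers, pois, 0.5)
print([m["poi"] for m in matches])
```

A brute-force join compares every pair of rows, which is why a distributed implementation first narrows the candidates with a coarser key such as a geohash.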
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case:
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin
Hue notebook Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
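As a quick check on what the 775 mode grants, the following Python sketch decodes the permission bits using only the standard library. The helper names describe_mode and group_can_write are hypothetical, introduced only for this illustration.

```python
import stat

def describe_mode(octal_mode):
    """Return the ls-style permission string for a directory with these bits."""
    return stat.filemode(stat.S_IFDIR | octal_mode)

def group_can_write(octal_mode):
    """True when the group write bit is set, which shared downloads require."""
    return bool(octal_mode & stat.S_IWGRP)

print(describe_mode(0o775))    # owner rwx, group rwx, others r-x
print(group_can_write(0o755))  # a 755 directory would block group members from updating data
```

With 775, every member of the common group can create and replace files in the download directory, while other users can only read and traverse it.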
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
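The wildcard rules for Like and the inclusive behavior of Between can be sketched in Python as follows. This is only an illustration of the semantics described in the table, not the MI SQL implementation. The helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

def between(value, low, high):
    """Between is inclusive at both ends of the range."""
    return low <= value <= high

print(like("Troy", "T%"), like("Troy", "T_oy"), like("Troy", "T_y"))
print(between(5, 1, 5), between(6, 1, 5))
```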
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double-single-quote (two single-quote characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
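When query text is assembled programmatically, the quoting rules above can be applied with small helpers such as the following Python sketch. The function names are hypothetical. Rejecting identifiers that contain a double quote is a conservative assumption, because this guide does not describe how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling each embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier in double quotes so spaces and special characters parse."""
    if '"' in name:
        # Escaping rules for embedded double quotes are not covered here.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Street Names"), quote_literal("O'hara's")
)
print(query)
```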
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Parameter Type Description
computationType String Optional. Indicates the logic to be used to interpret geometry coordinates. The computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
CARTESIAN: The geometry coordinates are interpreted using Cartesian logic.
SPHERICAL: The geometry coordinates are interpreted using spherical logic.
Linear Units
Valid values for unit type
Value Description
mi miles
km kilometers
in inches
ft feet
yd yards
mm millimeters
cm centimeters
m meters
survey ft US Survey feet
nmi nautical miles
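The unit codes in the table above can be related to one another through their lengths in meters. The following Python sketch uses the standard definitions of each unit. The dictionary and the convert helper are hypothetical illustrations and are not part of the product.

```python
# Length of one unit in meters, keyed by the unit codes listed in the table.
METERS_PER_UNIT = {
    "mi": 1609.344,
    "km": 1000.0,
    "in": 0.0254,
    "ft": 0.3048,
    "yd": 0.9144,
    "mm": 0.001,
    "cm": 0.01,
    "m": 1.0,
    "survey ft": 1200.0 / 3937.0,  # US Survey foot
    "nmi": 1852.0,
}

def convert(value, from_unit, to_unit):
    """Convert a length between two of the supported unit codes."""
    return value * METERS_PER_UNIT[from_unit] / METERS_PER_UNIT[to_unit]

# A 50 km buffer distance expressed in miles.
print(round(convert(50, "km", "mi"), 3))
```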
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL');
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'));
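To illustrate what "smallest convex geometry" means, the following Python sketch computes the convex hull of a set of planar points with Andrew's monotone chain algorithm. It is a generic textbook technique, shown here as an assumption-free illustration of the concept. It is not the algorithm used by the ConvexHull UDF, which this guide does not describe.

```python
def cross(o, a, b):
    """Z component of the cross product OA x OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a right turn or lie on a straight line.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

# The interior point (2, 2) is not part of the hull.
print(convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2)]))
```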
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
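For intuition about what a transformation from epsg:4326 to epsg:3857 does to a point, the following Python sketch applies the standard spherical Web Mercator formulas. It is only an illustration of the mathematics. The function name is hypothetical, and the Transform UDF's own implementation is not described in this guide.

```python
import math

R = 6378137.0  # sphere radius in meters used by EPSG:3857 (WGS84 semi-major axis)

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a WGS84 longitude/latitude in degrees to EPSG:3857 meters."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

x, y = lonlat_to_web_mercator(30, 20)
print(round(x, 2), round(y, 2))
```

Longitude scales linearly, while latitude is stretched increasingly toward the poles, which is why the projection is undefined at latitudes of exactly 90 degrees north or south.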
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
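The MBR-based filtering described above can be sketched in Python as follows. The sketch computes the minimum bounding rectangle of a list of vertices and then keeps only the XY records that fall inside it. The helper names and the sample data are hypothetical and only illustrate the idea behind the ST_XMin, ST_XMax, ST_YMin, and ST_YMax values.

```python
def mbr(points):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def in_bounds(x, y, bounds):
    """True when the point lies inside or on the edge of the rectangle."""
    xmin, ymin, xmax, ymax = bounds
    return xmin <= x <= xmax and ymin <= y <= ymax

triangle = [(40, 40), (20, 45), (45, 30)]
bounds = mbr(triangle)
records = [(30, 35), (50, 35), (20, 45)]
print(bounds, [r for r in records if in_bounds(r[0], r[1], bounds)])
```

A bounds test of this kind is cheap, so it is typically used as a first-pass filter before a more precise geometric predicate.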
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and for interoperability with other systems.
Square hash is similar to geohash, but has the advantage that the cells appear as squares when displayed in Popular Mercator.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
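To show how a point and a precision map to a string ID, the following Python sketch implements the widely published geohash encoding: longitude and latitude bits are interleaved, and every five bits become one base-32 character. The function name is hypothetical, and the sketch assumes that the GeoHashID UDF follows this standard scheme, which this guide does not state explicitly.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lon, lat, precision):
    """Encode a lon/lat point (degrees) as a geohash string of the given length."""
    lon_rng, lat_rng = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, even = [], 0, 0, True
    while len(chars) < precision:
        # Even-numbered bits halve the longitude range, odd-numbered bits the latitude range.
        rng, coord = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid
        else:
            value <<= 1
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

print(geohash_encode(-73.750333, 42.736103, 3))
```

Because each extra character subdivides the previous cell, IDs that share a prefix identify nested cells, which is what makes the IDs useful for sorting and grouping.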
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
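The relationship between precision and cell size can be illustrated outside Hive. The sketch below is not the product's implementation: it assumes a quadkey-style scheme (digits 0 to 3 over a Web Mercator grid, as publicly documented for map tile systems), chosen only because it makes the "longer ID, smaller cell" behavior easy to see. The function names quadkey and cell_width_degrees are hypothetical, and whether SquareHashID produces the same IDs is not stated in this guide.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def quadkey(lon, lat, precision):
    """Return a quadkey-style cell ID with `precision` digits (assumed scheme)."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    # Normalise the point to the unit square of the Mercator map.
    x = (lon + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    # Each extra digit doubles the number of columns and rows.
    n = 1 << precision
    col = min(n - 1, max(0, int(x * n)))
    row = min(n - 1, max(0, int(y * n)))
    digits = []
    for bit in range(precision - 1, -1, -1):
        digit = ((col >> bit) & 1) + 2 * ((row >> bit) & 1)
        digits.append(str(digit))
    return "".join(digits)


def cell_width_degrees(precision):
    """Width of one cell in degrees of longitude at the given precision."""
    return 360.0 / (1 << precision)


coarse = quadkey(-73.750333, 42.736103, 3)
fine = quadkey(-73.750333, 42.736103, 10)
print(coarse, cell_width_degrees(3))
print(fine, cell_width_degrees(10))
```

In this scheme the ID of a fine cell always starts with the ID of the coarser cell that contains it, which is what makes such IDs sortable and convenient for GROUP BY aggregation.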
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group property (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
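To make the idea of "every geometry that contains the search point" concrete, here is a minimal sketch using the classic even-odd ray casting test on simple polygons. It is an illustration only: the product's search runs against TAB or shapefile data and its handling of points that lie exactly on a boundary is not reproduced here. The names point_in_polygon and polygons_containing, and the sample data, are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons that contain the point."""
    px, py = point
    return [name for name, ring in polygons.items()
            if point_in_polygon(px, py, ring)]


# Two overlapping squares: a point in the overlap matches both.
regions = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(5, 5), (15, 5), (15, 15), (5, 15)],
}
print(polygons_containing((7, 7), regions))
print(polygons_containing((2, 2), regions))
```

As with the UDTF, one input point can yield several result rows when polygons overlap, and none when the point falls outside every polygon.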
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries (contained in the specified TAB or shapefile) that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates: the maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance: the maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName: the name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset: the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive pb.download.group property (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter: search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
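The semantics of a distance join can be sketched without Spark. The Python below is a conceptual illustration only, not the product's algorithm: it compares every pair of rows with the haversine great-circle formula, whereas joinByDistance uses a geohash-based primary join to avoid that brute-force comparison. The function names, the row layout, and the sample coordinates are assumptions made for this example.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_mi(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_mi, distance_column=None):
    """Pair each left row with every right row within max_mi miles."""
    rows = []
    for l in left:
        for r in right:
            d = haversine_mi(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_mi:
                row = {**l, **r}
                if distance_column:
                    # Mirrors the idea of the DistanceColumnName option.
                    row[distance_column] = d
                rows.append(row)
    return rows


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.7504, "lat": 42.7390},
    {"poi": "far", "lon": -73.9, "lat": 42.9},
]
for row in join_by_distance(customers, pois, 0.5, "outputDistance"):
    print(row["id"], row["poi"], round(row["outputDistance"], 3))
```

Only the POI within half a mile of the customer is joined; the second POI is several miles away and is dropped.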
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common group and should grant Read, Write, and Execute permissions to the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
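As a quick check of what mode 775 grants, the following minimal sketch decodes the permission bits with Python's standard stat module. It does not touch the file system; the helper name describe_dir_mode is hypothetical.

```python
import stat


def describe_dir_mode(mode_bits):
    """Render permission bits for a directory in ls -l style."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


mode = 0o775
print(describe_dir_mode(mode))

# Owner and group may write, so any member of the shared group can
# update downloaded data; other users can only read and traverse.
group_can_write = bool(mode & stat.S_IWGRP)
others_can_write = bool(mode & stat.S_IWOTH)
print(group_can_write, others_can_write)
```

This is why every service user that downloads data needs to be a member of the common group: with 775, write access comes from owner or group membership only.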
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
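The wildcard rules for Like can be sketched by translating a pattern into a regular expression. This is an illustration of the matching rules described in the table, not the MI SQL engine itself; the function names are hypothetical, and case-sensitive matching is an assumption, since the guide does not say how Like treats case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % is any run of characters, _ is one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park%"))
print(like("Parks", "Park_"))
print(like("Park", "Park_"))
```

The percent sign may match nothing at all, while the underscore always consumes exactly one character, so "Park" matches "%Park%" but not "Park_".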
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
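When filter expressions are built programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch with hypothetical function names; it covers only the rules stated in this section (single quotes doubled inside literals, double quotes around identifiers) and assumes identifiers do not themselves contain double quotes.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes no embedded ")."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("Streets"), quote_literal("O'hara's")
)
print(query)
```

Quoting the identifier here is optional, since Streets has no spaces or special characters; it is shown only to contrast the two delimiter styles.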
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Return Values
Return Type Description
WritableGeometry A geometry consisting of all the points within the offset distance of the input geometry
Examples
SELECT Buffer(FromWKT(t.geometry, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL') FROM hivetable t
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 50, 'km', 12, 'SPHERICAL')
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull'
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable
SELECT ToWKT(ConvexHull(FromWKT(table.geometry, 'epsg:4267'))) AS result FROM hivetable
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))', 'epsg:4267'))
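To make the definition of a convex hull concrete, here is a minimal sketch in plain Python using Andrew's monotone chain algorithm on 2D points. It illustrates the concept only; it is not the algorithm used by the ConvexHull UDF, and the function name is hypothetical.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# A square with two interior points: only the four corners survive.
hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
print(hull)
```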
Intersection
Description
The Intersection function returns the geometry (such as a point, line, or curve) that two geometry objects (such as lines, curves, planes, and surfaces) have in common. The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection'
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform'
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t
SELECT Transform(ST_POINT(30, 20), 'epsg:3857')
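The examples above go from a longitude/latitude system to EPSG:3857. As a rough illustration of what such a transformation computes for a single point, the following sketch applies the standard spherical Web Mercator formulas. It is a hand-written approximation for illustration, not the Transform UDF, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, as used by EPSG:3857


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


x, y = wgs84_to_web_mercator(30.0, 20.0)
print(round(x, 2), round(y, 2))
```

The formula is only defined away from the poles; latitudes of exactly +90 or -90 degrees have no finite Web Mercator value.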
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union'
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed from one coordinate system to another directly. The ST_X and ST_Y UDFs make this possible: build a point from each row's X and Y values, transform it, and then read the ordinates of the transformed point.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
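The idea of filtering an XY table by the bounds of a geometry can be sketched in plain Python. The helper names below are hypothetical; they mimic what the ST_XMin, ST_YMin, ST_XMax, and ST_YMax values are used for and do not call the UDFs themselves.

```python
def mbr(vertices):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def within_bounds(x, y, bounds):
    """True when the point lies inside or on the bounding rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


polygon = [(10.0, 10.0), (30.0, 5.0), (45.0, 20.0), (20.0, 35.0)]
bounds = mbr(polygon)

# Rows of an XY table; only rows inside the bounds are kept.
xy_rows = [(12.0, 8.0), (50.0, 20.0), (20.0, 36.0)]
inside = [row for row in xy_rows if within_bounds(row[0], row[1], bounds)]
print(bounds, inside)
```

A bounds test like this is only a coarse filter: a point inside the MBR may still lie outside the geometry itself.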
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X'
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src
ST_XMax
Description
The ST_XMax function returns the maximum X value of a geometry, or null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax'
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum X value of the input geometry, or null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src
ST_XMin
Description
The ST_XMin function returns the minimum X value of a geometry, or null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin'
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum X value of the input geometry, or null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y'
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src
ST_YMax
Description
The ST_YMax function returns the maximum Y value of a geometry, or null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax'
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The maximum Y value of the input geometry, or null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin'
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary'
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable
SELECT GeoHashBoundary('ebvnk')
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable
SELECT GeoHashBoundary(-73.750333, 42.736103, 3)
SELECT GeoHashBoundary('-73.750333', '42.736103', 3)
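For readers unfamiliar with geohashes, the following sketch decodes a geohash string into the rectangle it covers, using the public geohash scheme (base-32 characters, bits alternating between longitude and latitude). It assumes the product follows that standard scheme; the function name is hypothetical and the code does not call the GeoHashBoundary UDF.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash):
    """Return (lon_min, lat_min, lon_max, lat_max) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # the first bit refines longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


cell = geohash_bounds("ezs42")
print(cell)
```

Each extra character adds five bits, so the cell shrinks by a factor of 32 in area, which is why longer IDs mean smaller cells.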
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID'
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable
SELECT GeoHashID(-73.750333, 42.736103, 3)
SELECT GeoHashID('-73.750333', '42.736103', 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
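The last query above groups points by cell ID and counts them. The same idea can be sketched outside Hive with the public geohash encoding. This assumes the product uses the standard geohash scheme; the function name is hypothetical and the sample coordinates are made up for illustration.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon = True  # the first bit encodes longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        bits += 1
        is_lon = not is_lon
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Count points per cell, mirroring the GROUP BY query.
coordinates = [(-5.603, 42.605), (-5.600, 42.610), (10.0, 50.0)]
counts = Counter(geohash_id(x, y, 5) for x, y in coordinates)
print(counts)
```

Because nearby points share a prefix, sorting by ID also tends to place nearby cells next to each other, which is what makes the ID useful as a sort and search key.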
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary'
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable
SELECT HexagonBoundary('PF625028642')
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable
SELECT HexagonBoundary(-73.750333, 42.736103, 3)
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID'
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable
SELECT HexagonID(-73.750333, 42.736103, 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary'
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable
SELECT SquareHashBoundary('03332')
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable
SELECT SquareHashBoundary(-73.750333, 42.736103, 3)
SELECT SquareHashBoundary('-73.750333', '42.736103', 3)
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID'
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable
SELECT SquareHashID(-73.750333, 42.736103, 3)
SELECT SquareHashID('-73.750333', '42.736103', 3)
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned. That is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon'
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format
Options
Option: shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'
Option: shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'
Option: remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
Option: downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Option: downloadGroup
Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned. That is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
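As background for what a point-in-polygon search decides for each candidate polygon, here is a minimal ray-casting sketch in plain Python. It is a conceptual illustration only: the region data and function names are hypothetical, holes and points exactly on an edge are not handled, and it does not reflect how the UDTF or a PGD index is implemented.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Two hypothetical adjacent regions, each a simple ring of (x, y) vertices.
regions = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def find_regions(x, y):
    """Return the names of all regions that contain the point."""
    return [name for name, ring in regions.items() if point_in_polygon(x, y, ring)]


print(find_regions(5, 5), find_regions(15, 5), find_regions(25, 5))
```

Testing every polygon for every point, as this sketch does, is what a spatial index avoids by narrowing the candidates first.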
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest'
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format
Options
Option: maxCandidates
Description: the maximum number of results to return (if not set, the default value is 1)
Example: 'maxCandidates', '3'
Option: maxDistance
Description: the maximum distance to search for results (if not set, the default value is no limit)
Example: 'maxDistance', '25'
Option: distanceUnit
Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'
Option: returnDistanceColumnName
Description: the name of the column to use for returning the distance
Example: 'returnDistanceColumnName', 'Miles'
Option: shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'
Option: shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'
Option: remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
Option: downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
Option: downloadGroup
Description: the operating system group which should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Option: queryFilter
Description: search on a subset of the data based on an attribute
Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
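The effect of the maxCandidates and maxDistance options can be illustrated with a brute-force nearest search over a few points. This is a conceptual sketch, not the UDTF: the function names, the haversine distance on a mean-radius sphere, and the approximate city coordinates are all assumptions made for the example. The defaults mirror the documented ones (one result, no distance limit, meters).

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius, assumed for the sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (c_lon, c_lat) in candidates.items():
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Approximate coordinates, for illustration only.
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Boston": (-71.0589, 42.3601),
    "Hartford": (-72.6734, 41.7658),
}
search_point = (-73.6918, 42.7284)

top3 = search_nearest(search_point, capitals, max_candidates=3)
nearby = search_nearest(search_point, capitals, max_candidates=3, max_distance_m=50_000)
print([name for name, _ in top3], [name for name, _ in nearby])
```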
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, which represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
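To make the semantics of a distance join concrete, the following is a minimal plain-Python sketch. It is not the product API: it assumes a spherical earth (haversine formula) and a brute-force comparison of every pair, whereas joinByDistance uses a geohash-based primary join. The function names, record layout, and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters (spherical assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left record with every right record within max_distance_m.

    Records are (name, lon, lat) tuples. Brute force: O(len(left) * len(right)).
    """
    joined = []
    for lname, llon, llat in left:
        for rname, rlon, rlat in right:
            d = haversine_m(llon, llat, rlon, rlat)
            if d <= max_distance_m:
                joined.append((lname, rname, d))
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [("customer_a", -73.750333, 42.736103)]
pois = [
    ("poi_near", -73.752000, 42.737000),
    ("poi_far", -73.900000, 42.800000),
]
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

Each tuple in matches carries the computed distance, which plays the same role as the optional distance column in the options example.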
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin
Hue notebook Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
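After changing the group and mode, it can be useful to confirm that the directory really carries the expected permission bits. The following is a minimal sketch using only the Python standard library on a POSIX system. The helper names are hypothetical, and the demonstration uses a temporary directory rather than the real download location.

```python
import os
import stat
import tempfile


def has_mode(path, expected=0o775):
    """Return True if the permission bits of path equal expected (for example 775)."""
    return stat.S_IMODE(os.stat(path).st_mode) == expected


def group_has_rwx(path):
    """Return True if the owning group can read, write, and traverse path."""
    needed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return (os.stat(path).st_mode & needed) == needed


# Demonstration on a throwaway directory (stands in for the download directory).
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o775)
ok_after_chmod = has_mode(demo_dir) and group_has_rwx(demo_dir)

# A directory without group write access fails the check.
os.chmod(demo_dir, 0o755)
ok_without_group_write = group_has_rwx(demo_dir)
```

The check covers only the mode bits. Group ownership (the chgrp step) would need a separate comparison against the group ID of the common group.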
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
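The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a Like pattern into a regular expression. This is only an illustration of the underscore and percent rules stated above. It assumes case-sensitive matching and no escape character, neither of which is confirmed by this guide, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, like("Troy", "T_oy") and like("Stamford", "%ford") are both true, while like("Troy", "T_") is false because the underscore stands for exactly one character.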
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers, or illegal characters (where they normally would not be allowed, such as /), are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
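When queries are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only: the helper names, table name, column name, and value are hypothetical, and the identifier helper assumes the name itself contains no double quotation marks, a case this guide does not cover.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes name contains no double quote)."""
    if '"' in name:
        raise ValueError("identifiers containing double quotes are not handled here")
    return '"' + name + '"'


# Hypothetical table, column, and value, used only to show the result.
query = "SELECT * FROM {} WHERE {} = {}".format(
    quote_identifier("Street Names"),
    quote_identifier("Business"),
    quote_literal("Tom's Diner"),
)
```

The resulting query text is SELECT * FROM "Street Names" WHERE "Business" = 'Tom''s Diner'.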
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
WritableGeometry The convex hull of the geometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(t.geometry, 'epsg:4267'))) AS result FROM hivetable t;
SELECT ConvexHull(FromWKT("MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))", 'epsg:4267'));
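For readers unfamiliar with the concept, the following is a minimal Python sketch of a convex hull computed over a set of planar points using Andrew's monotone chain algorithm. It is a stand-in illustration only: this guide does not state which algorithm the ConvexHull UDF uses, and the function names and sample points are hypothetical.

```python
def cross(o, a, b):
    """Z component of the cross product OA x OB; positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order, starting at the leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so omit it.
    return lower[:-1] + upper[:-1]


# A square with one interior point and one point on an edge.
sample = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 0)]
hull = convex_hull(sample)
```

Here hull contains only the four corners of the square; the interior point (2, 2) and the edge point (2, 0) are discarded.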
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
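To show what a coordinate transformation does to a single point, here is a minimal Python sketch of the standard spherical Web Mercator formula that maps longitude/latitude in degrees (epsg:4326) to meters (epsg:3857). It handles only this one pair of coordinate systems and is not the product's implementation; the Transform UDF accepts arbitrary coordinate systems. The function name is hypothetical.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # radius used by the epsg:3857 spherical Mercator projection


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to epsg:3857 x/y in meters."""
    x = SPHERE_RADIUS_M * math.radians(lon)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


origin = lonlat_to_web_mercator(0.0, 0.0)
east_edge = lonlat_to_web_mercator(180.0, 0.0)
```

The origin maps to approximately (0, 0), and longitude 180 maps to an x value of roughly 20,037,508 meters, which is half the circumference of the projection sphere.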
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make it possible to transform an XY table from one coordinate system to another by extracting the ordinates of the transformed point.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR (minimum bounding rectangle) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
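The relationship between a geometry and its MBR values can be sketched in a few lines of Python. This is an illustration only, with a geometry modeled as a plain list of (x, y) vertices. The function names and sample data are hypothetical and are not part of the product.

```python
def mbr(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """Return True if the point lies inside or on the edge of the bounds."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


# A triangle and three XY records to filter by its bounds.
triangle = [(40, 40), (20, 45), (45, 30)]
bounds = mbr(triangle)
records = [(30, 35), (10, 10), (45, 45)]
kept = [(x, y) for x, y in records if in_bounds(x, y, bounds)]
```

The four values in bounds correspond to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax return for the same geometry. Note that a bounds filter is coarser than a true containment test: the record (45, 45) is kept even though it lies outside the triangle itself.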
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
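The idea behind a geohash ID, and the group-by-ID aggregation in the last query, can be sketched in plain Python. The encoder below follows the widely published geohash scheme (interleaved longitude/latitude bits, five bits per base-32 character). Whether the GeoHashID UDF returns exactly the same strings is an assumption, and the function name and sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point in degrees as a geohash string of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Count points per grid cell, mirroring the GROUP BY query above.
points = [(-5.6, 42.6), (-5.6001, 42.6001), (10.0, 10.0)]
counts = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
```

The first two points fall in the same precision-5 cell and are counted together, while the third lands in a different cell. Longer IDs share a prefix with the shorter IDs of the cells that contain them, which is what makes the IDs sortable and searchable.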
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), '/STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
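The containment test that a point-in-polygon search relies on can be sketched in a few lines of Python. This is a generic ray-casting illustration, not the SDK's implementation: the names point_in_polygon and local_point_in_polygon, the (attributes, ring) data layout, and the handling of points exactly on an edge are all assumptions made for the example.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def local_point_in_polygon(x, y, polygons):
    """Return the attributes of every polygon that contains the point.

    `polygons` is a hypothetical list of (attributes, ring) pairs.
    """
    return [attrs for attrs, ring in polygons if point_in_polygon(x, y, ring)]


# Two made-up rectangular "states" for demonstration.
states = [
    ({"state": "A", "capital": "Alpha"}, [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ({"state": "B", "capital": "Beta"}, [(4, 0), (8, 0), (8, 4), (4, 4)]),
]
print(local_point_in_polygon(2, 2, states))
```

Like the UDTF, the sketch returns one result per containing geometry, so a point covered by overlapping polygons yields several rows.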
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), '/STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'"))
nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, it always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
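To show what a distance join computes, here is a small, self-contained Python sketch. It is a conceptual illustration, not the Spark API: it compares every pair of points with the haversine formula, whereas the product method uses a geohash precision for its primary join. The function names, the dictionary-based rows, and the mean earth radius constant are assumptions chosen for the example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius in meters (assumed for this sketch)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m):
    """Brute-force join: pair each row of rows1 with every row of rows2 in range.

    Returns (row1, row2, distance_m) tuples. Hypothetical helper, not the SDK method.
    """
    result = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["lon"], r1["lat"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                result.append((r1, r2, d))
    return result


customers = [{"id": 1, "lon": -73.750, "lat": 42.730}]
pois = [
    {"name": "near", "lon": -73.751, "lat": 42.731},
    {"name": "far", "lon": -73.000, "lat": 42.000},
]
for cust, poi, dist in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
    print(cust["id"], poi["name"], round(dist))
```

The pairwise loop grows with the product of the two input sizes, which is the cost that a grid-based primary join is meant to avoid on large dataframes.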
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
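As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard library. It only illustrates the permission bits; the helper name describe_mode is hypothetical, and nothing here touches the file system.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_mode(mode))  # drwxrwxr-x

# Owner and group can read, write, and traverse; others can only read and traverse.
print("group can write: ", bool(mode & stat.S_IWGRP))
print("others can write:", bool(mode & stat.S_IWOTH))
```

The group write bit is what lets every service user in the common group update data that another member downloaded.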
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
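The wildcard semantics of the Like operator described in the table above can be sketched by translating a pattern into a regular expression. This Python example is an illustration only: the function names are hypothetical, and case-sensitive matching is an assumption, since this guide does not state how MI SQL treats case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    %  -> zero, one, or multiple characters
    _  -> exactly one character
    Everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park%"))  # True
print(like("Park", "_ark"))            # True
print(like("Spark", "_ark"))           # False
```

Using fullmatch mirrors the fact that a pattern without a leading or trailing percent sign must account for the entire value.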
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separator
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
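When filter expressions are assembled in code, the single-quote doubling rule is easy to apply with a small helper. The Python sketch below is illustrative only; the function name sql_string_literal is hypothetical and is not part of the product.

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Build a filter expression for a business name that contains an apostrophe.
name = "O'haras"
expression = "Business = " + sql_string_literal(name)
print(expression)  # Business = 'O''haras'
```

Escaping at the point where the literal is built keeps a stray apostrophe in the data from terminating the string early.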
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc. All rights reserved.
Intersection
Description
The Intersection function returns the geometry (point, line, or curve) that is common to two geometry objects (such as lines, curves, planes, and surfaces). The result consists of the direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry formed from the direct positions that are common to both input geometries
Examples
SELECT Intersection(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_Point(30, 20), 'epsg:3857');
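To illustrate what a coordinate transformation involves, the following Python sketch converts a WGS 84 longitude/latitude pair (epsg:4326) to Web Mercator (epsg:3857) using the standard spherical Mercator formulas. It is not the product's implementation, the function name lonlat_to_web_mercator is hypothetical, and the Transform UDF handles far more coordinate systems than this one pair.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by epsg:3857 (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Project degrees (epsg:4326) to meters (epsg:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Assumption for this sketch: 30 is the longitude and 20 the latitude.
print(lonlat_to_web_mercator(30.0, 20.0))
```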
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the ordinates of the transformed point.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) of a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
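The bounds-filtering idea can be sketched outside Hive as follows. This Python example computes the MBR of a geometry's coordinates (the four values that ST_XMin, ST_YMin, ST_XMax, and ST_YMax expose) and keeps only the XY rows that fall inside it. The helper names mbr and in_mbr and the sample data are hypothetical.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True if the point lies inside or on the edge of the bounding rectangle."""
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
box = mbr(polygon)
xy_rows = [(1.0, 1.0), (5.0, 1.0), (2.0, 3.5)]
print([row for row in xy_rows if in_mbr(row[0], row[1], box)])
```

An MBR test is only a coarse filter: a point inside the rectangle is not necessarily inside the geometry itself.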
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the maximum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the minimum X ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the maximum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the minimum Y ordinate of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and for interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
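The encode/decode relationship between a cell ID and its boundary can be sketched with the widely published geohash algorithm. The Python code below is an illustration only: the function names geohash_id and geohash_bounds are hypothetical, and it is an assumption that the product's GeoHashID and GeoHashBoundary UDFs produce the same strings and rectangles.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a longitude/latitude pair as a geohash string of the given length."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value, use_lon = [], 0, 0, True
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1  # upper half: emit bit 1
            rng[0] = mid
        else:
            value = value * 2      # lower half: emit bit 0
            rng[1] = mid
        use_lon = not use_lon      # bits alternate between longitude and latitude
        bits += 1
        if bits == 5:              # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def geohash_bounds(hash_id):
    """Decode a geohash string to (lon_min, lat_min, lon_max, lat_max)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in hash_id:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


cell = geohash_id(-5.6, 42.6, 5)
print(cell, geohash_bounds(cell))
```

A longer ID always starts with the shorter ID for the same point, which is why a higher precision means a smaller cell nested inside the coarser one.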
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell that contains the point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
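The core test behind a point-in-polygon search can be sketched with the classic ray-casting algorithm. This Python example is illustrative only: the function names and sample regions are hypothetical, it handles simple polygons without holes, and the product's actual search (including its TAB and shapefile handling) may work differently.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def search_polygons(x, y, regions):
    """Return the names of all regions that contain the point."""
    return [name for name, ring in regions if point_in_polygon(x, y, ring)]


regions = [
    ("west", [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("east", [(4, 0), (8, 0), (8, 4), (4, 4)]),
    ("all", [(0, 0), (8, 0), (8, 4), (0, 4)]),
]
print(search_polygons(2, 2, regions))
```

As with the UDTF, one input point can yield several rows when regions overlap.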
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
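The effect of limiting a nearest search by candidate count and distance can be sketched as follows. This Python example is illustrative only: search_nearest and its sample features are hypothetical, it uses planar distance between points as a simplifying assumption, and its defaults (one result, no distance limit) mirror the defaults documented for the maxCandidates and maxDistance options.

```python
import heapq
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    px, py = point
    scored = []
    for name, (fx, fy) in features:
        distance = math.hypot(fx - px, fy - py)  # planar distance (assumption)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    return heapq.nsmallest(max_candidates, scored)


features = [("a", (0.0, 1.0)), ("b", (3.0, 4.0)), ("c", (10.0, 0.0))]
print(search_nearest((0.0, 0.0), features))
print(search_nearest((0.0, 0.0), features, max_candidates=3, max_distance=6.0))
```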
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, there is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large datasets, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. Each hexagon is generated with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
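To make the join semantics concrete, here is a minimal pure-Python sketch of a distance join. It does not use the SDK. The haversine formula, the assumed mean earth radius of 3958.8 miles, the function names, and the sample rows are all illustrative assumptions, and the SDK's own distance calculation and geohash-based prefiltering may well differ.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean earth radius, for illustration only


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Brute-force join: pair each left row with every right row within max_miles."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"],
                                r_row["lon"], r_row["lat"])
            if d <= max_miles:
                # Mirror the optional distance column from the options example.
                result.append({**l_row, **r_row, "outputDistance": d})
    return result


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "A", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.752000, "lat": 42.737000},
    {"poi": "far", "lon": -74.000000, "lat": 42.000000},
]
joined = join_by_distance(customers, pois, 0.5)
```

The brute-force loop compares every pair of rows, which is why a real implementation would first narrow the candidates, for example by grid cell, before measuring distances.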
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions (775) for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
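As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard library. It only interprets the number and does not touch any directory; the variable names are illustrative.

```python
import stat

DIRECTORY_MODE = 0o775  # the octal mode applied by "chmod 775"

# stat.filemode expects the file-type bits as well, so mark the mode as a directory.
symbolic = stat.filemode(stat.S_IFDIR | DIRECTORY_MODE)

# Owner and group each hold read, write, and execute.
owner_has_rwx = (DIRECTORY_MODE & stat.S_IRWXU) == stat.S_IRWXU
group_has_rwx = (DIRECTORY_MODE & stat.S_IRWXG) == stat.S_IRWXG

# Everyone else can read and traverse the directory but cannot write to it.
others_can_write = bool(DIRECTORY_MODE & stat.S_IWOTH)

print(symbolic, owner_has_rwx, group_has_rwx, others_can_write)
```

Group write access is what lets every service user in the common group update the downloaded data.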
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
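The wildcard behavior of Like can be sketched in Python by translating a pattern to a regular expression. This is an illustration of the two wildcards only, not the MI SQL implementation; the function names are hypothetical, and the sketch assumes case-sensitive matching with no escape character.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Canada", "Can%"))    # prefix match using percent
print(like("Canada", "C_nada"))  # single-character match using underscore
print(like("Canada", "C_n"))     # too short: the whole value must match
```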
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. NOTE: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
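When queries are assembled programmatically, the quoting rules above can be applied with small helpers. The following Python sketch is an assumption-laden illustration: the helper names are hypothetical, and it does not handle identifiers that themselves contain double quotes.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are not handled)."""
    return '"' + name + '"'


# Build a query from a quoted table name and a quoted literal.
table = quote_identifier("/Samples/NamedTables/USA")
country = quote_literal("Canada")
query = "SELECT * FROM " + table + " WHERE Country = " + country

print(query)
print(quote_literal("O'hara's"))
```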
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Parameter Type Description
geometry WritableGeometry The source input geometry
CRS String The destination coordinate system for the geometry
Return Values
Return Type Description
WritableGeometry The geometry transformed to the destination coordinate system
Examples
SELECT Transform(FromWKT(t.geometry, 'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20), 'epsg:3857');
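For intuition about what a conversion from epsg:4326 to epsg:3857 computes, the following Python sketch applies the standard spherical Web Mercator formulas to a single point. It is not the Transform UDF's implementation; the function name is hypothetical, and the sketch covers only this one pair of coordinate systems.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)


def lonlat_to_web_mercator(lon, lat):
    """Convert WGS 84 longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# The same point used in the ST_POINT example above.
x, y = lonlat_to_web_mercator(30, 20)
print(x, y)
```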
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writable geometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
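The bounds-based filtering described above can be sketched in plain Python: compute the minimum bounding rectangle (MBR) of a geometry's vertices, then keep only the XY records that fall inside it. The function names and sample coordinates are hypothetical, and the sketch treats the geometry as a simple list of vertices rather than a WritableGeometry.

```python
def mbr(points):
    """Return (x_min, y_min, x_max, y_max) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """True when the point lies inside or on the edge of the bounding rectangle."""
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


# Hypothetical triangle and XY records.
triangle = [(-74.0, 42.0), (-73.0, 42.0), (-73.5, 43.0)]
box = mbr(triangle)
records = [(-73.75, 42.7), (-72.0, 42.5), (-73.9, 43.5)]
inside = [(x, y) for x, y in records if in_mbr(x, y, box)]
```

An MBR test is only a coarse filter: a record can fall inside the rectangle and still lie outside the geometry itself.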
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
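To show how a cell ID maps back to a rectangular boundary, the following Python sketch decodes an ID with the widely used public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude). Whether the UDF's IDs follow exactly this scheme is an assumption, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(cell_id):
    """Decode a geohash string into (lon_min, lat_min, lon_max, lat_max)."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


bounds = geohash_bounds("dre")
print(bounds)
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.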
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
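The hash-then-group pattern in the last query can be sketched outside Hive. The following Python example encodes points with the widely used public geohash scheme and counts records per cell. It assumes, without confirmation from this guide, that the UDF produces IDs of that scheme; the function name and sample points are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point as a geohash string of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = value * 2 + 1, mid
            else:
                value, lon_hi = value * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = value * 2 + 1, mid
            else:
                value, lat_hi = value * 2, mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


# Hypothetical points: two close together, one far away.
points = [(-73.750333, 42.736103), (-73.750400, 42.736200), (10.40744, 57.64911)]
counts = Counter(geohash_id(lon, lat, 5) for lon, lat in points)
print(counts)
```

Nearby points share a cell ID at a given precision, so grouping by the ID aggregates them, which is what the GROUP BY in the query does.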
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within or on; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within or on is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin
Hue notebook Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
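When a directory holds several TAB files, the single-file invocation shown above can be wrapped in a loop. The following is a minimal sketch rather than anything shipped with the product: the function name pgd_commands and the dry-run approach are assumptions made for illustration. It only prints PGDBuilder command lines, using the -f and -p parameters documented above, so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of the product): print one PGDBuilder
# command line for each TAB file found directly under a directory.
# $1 = directory containing TAB files
# $2 = optional value to pass to -p (maximum concurrent processes)
pgd_commands() {
    dir=$1
    parallel=$2
    # The bracket pattern matches .tab and .TAB (any letter case)
    for tab in "$dir"/*.[tT][aA][bB]; do
        # Skip the literal pattern when nothing matched
        [ -f "$tab" ] || continue
        if [ -n "$parallel" ]; then
            printf 'PGDBuilder -f "%s" -p %s\n' "$tab" "$parallel"
        else
            printf 'PGDBuilder -f "%s"\n' "$tab"
        fi
    done
}
```

For example, pgd_commands /data/tabs 4 lists the commands for every TAB file in the (hypothetical) /data/tabs directory; piping that output to sh would run them, assuming PGDBuilder is on the PATH.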
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
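After completing the steps above, it can be useful to confirm that the download directory really has the intended group and mode. The following is a minimal sketch, not part of the product: the function name check_download_dir is hypothetical, and it assumes GNU coreutils stat (the -c option), as found on most Linux distributions.

```shell
#!/bin/sh
# Hypothetical check (not part of the product): succeed only if the
# directory belongs to the expected group and has mode 775.
# Assumes GNU coreutils stat, which supports the -c format option.
# $1 = download directory, $2 = expected group name
check_download_dir() {
    dir=$1
    expected_group=$2
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$dir")    # octal permission bits, for example 775
    group=$(stat -c '%G' "$dir")   # group name of the directory
    if [ "$mode" != "775" ]; then
        echo "mode is $mode, expected 775" >&2
        return 1
    fi
    if [ "$group" != "$expected_group" ]; then
        echo "group is $group, expected $expected_group" >&2
        return 1
    fi
    return 0
}
```

With the group and directory used in the steps above, the call would be check_download_dir /pb/downloads pbdownloads; a non-zero exit status indicates that the chgrp or chmod step needs to be repeated.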
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or sub query.
Like Returns true if the value can be compared to similar values using wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
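The wildcard behavior described for the Like operator can be illustrated outside of MI SQL. The following sketch is only an analogy, not the product's implementation: it maps the percent sign to the shell pattern * and the underscore to ?, which have the same "zero or more characters" and "exactly one character" meanings. The function name like_match is hypothetical, and the sketch assumes case-sensitive matching and a pattern that contains no shell pattern characters (*, ?, [) of its own.

```shell
#!/bin/sh
# Illustration only: emulate the two Like wildcards with shell patterns.
#   %  (zero, one, or multiple characters)  ->  *
#   _  (a single character)                 ->  ?
# Assumes case-sensitive matching and no *, ?, or [ in the Like pattern.
# $1 = value to test, $2 = Like pattern; returns 0 on match, 1 otherwise
like_match() {
    value=$1
    glob=$(printf '%s' "$2" | tr '%_' '*?')
    case $value in
        $glob) return 0 ;;   # unquoted, so $glob is treated as a pattern
        *) return 1 ;;
    esac
}
```

With this sketch, like_match "Central Park" "%Park" succeeds because the percent sign absorbs the leading text, while like_match "Parks" "%Park" fails because nothing in the pattern accounts for the trailing character.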
Arithmetic Operators
Operator Definition
+ Addition; also concatenation operator. NOTE: String concatenation also uses &
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
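When filter values come from user input or from another file, the double-single-quote rule can be applied mechanically before the value is placed in an expression. The following is a minimal sketch under that assumption; the function name mi_sql_literal is hypothetical and is not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: turn a raw value into a single-quoted string
# literal, doubling every single quote inside it as the rule above requires.
# $1 = raw value; prints the quoted literal on standard output
mi_sql_literal() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'\n" "$escaped"
}
```

For instance, the literal for a business name could be produced with mi_sql_literal "$name" and appended to a filter expression such as Business = followed by the result.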
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Union
Description
The Union function returns a geometry object that represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Parameter Type Description
geometry1 WritableGeometry The first instance of a WritableGeometry
geometry2 WritableGeometry The second instance of a WritableGeometry
Return Values
Return Type Description
WritableGeometry The geometry that represents the union of the input geometries
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry, 'epsg:4267'), FromWKT(t2.geometry, 'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means it cannot be applied directly to an XY table to transform it from one coordinate system to another. The ST_X and ST_Y UDFs make this possible by extracting the X and Y values from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
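As a rough illustration of what the bounds functions compute, the following Python sketch derives a minimum bounding rectangle from a list of coordinates and uses it to filter XY records. It is not part of the product: the helper names mbr and in_bounds and the sample coordinates are assumptions made for this example only.

```python
def mbr(coords):
    """Return (x_min, y_min, x_max, y_max) for a sequence of (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounds(x, y, bounds):
    """True when the point (x, y) lies inside or on the edge of the rectangle."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


# Hypothetical polygon ring (longitude, latitude) and XY records.
polygon = [(-74.0, 42.5), (-73.5, 42.5), (-73.5, 43.0), (-74.0, 43.0), (-74.0, 42.5)]
records = [(-73.750333, 42.736103), (-75.0, 40.0)]

bounds = mbr(polygon)
inside = [r for r in records if in_bounds(r[0], r[1], bounds)]
```

In Hive, the same filter would typically compare the x and y columns of the XY table against the values returned by ST_XMin, ST_XMax, ST_YMin, and ST_YMax.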
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(..., 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(..., 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(..., 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregation.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
Three types of UDFs are provided, one for each grid cell shape:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
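The encoding side of hashing can be sketched in a few lines of Python. This sketch assumes the rectangular hash follows the widely used public geohash scheme (alternating longitude and latitude bisection, five bits per base-32 character), which this guide does not confirm. The function geohash_id is a hypothetical helper written for illustration, not the GeoHashID UDF.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash string of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every five bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


cell = geohash_id(-73.750333, 42.736103, 3)
```

Because each extra character subdivides the previous cell, longer IDs always share a prefix with the shorter IDs of the cells that contain them, which is what makes the IDs convenient for sorting and grouping.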
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
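To show how a cell boundary can be recovered from an ID alone, the following Python sketch decodes a hash string into its bounding rectangle. As before, it assumes the public geohash scheme rather than the product's internal implementation, and geohash_bounds is a hypothetical helper name.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_bounds(hash_id):
    """Return (lon_min, lat_min, lon_max, lat_max) of the cell named by `hash_id`."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate longitude, latitude, longitude, ...
    for ch in hash_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


lon_min, lat_min, lon_max, lat_max = geohash_bounds("ezs42")
```

The four corner coordinates are all that is needed to build the rectangular polygon that a boundary function would return for the cell.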
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries on or within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries on or within which the search point lies are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP.TAB file that we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
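The containment test behind a point-in-polygon search can be illustrated with the classic ray-casting algorithm. The Python sketch below is an illustration under simple assumptions (planar coordinates, polygons without holes, features held in memory). It does not reflect how LocalPointInPolygon reads TAB files or shapefiles, and every name and coordinate in it is hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many ring edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, features):
    """Return the attributes of every feature whose ring contains the point."""
    x, y = point
    return [f["attrs"] for f in features if point_in_ring(x, y, f["ring"])]


# Two made-up rectangular "state" polygons with attribute rows.
features = [
    {"attrs": {"state": "A", "capital": "Alpha"},
     "ring": [(-75.0, 42.0), (-73.0, 42.0), (-73.0, 44.0), (-75.0, 44.0)]},
    {"attrs": {"state": "B", "capital": "Beta"},
     "ring": [(-73.0, 42.0), (-71.0, 42.0), (-71.0, 44.0), (-73.0, 44.0)]},
]

matches = polygons_containing((-73.750333, 42.736103), features)
```

Returning a list rather than a single row mirrors the behaviour described above, where one input point can produce several output rows when it falls in more than one geometry.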
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', 'Name Like Park'
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state FROM search_points LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', 'Name Like Park')) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP.TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
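The effect of the maxCandidates and maxDistance options can be illustrated with a small Python sketch. It assumes point features only and great-circle (haversine) distance on a spherical earth, which may differ from the distance calculation the UDTF actually uses. The function names, the feature list, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, features, max_candidates=1, max_distance_m=None):
    """Return up to `max_candidates` (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = []
    for f in features:
        d = haversine_m(lon, lat, f["lon"], f["lat"])
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, f["name"]))
    scored.sort()
    return scored[:max_candidates]


# Approximate city locations, used only as sample data.
features = [
    {"name": "Boston", "lon": -71.0589, "lat": 42.3601},
    {"name": "Albany", "lon": -73.7562, "lat": 42.6526},
    {"name": "Hartford", "lon": -72.6734, "lat": 41.7658},
]
search_point = (-73.750333, 42.736103)

top3 = search_nearest(search_point, features, max_candidates=3)
within_50km = search_nearest(search_point, features, max_candidates=3, max_distance_m=50_000)
```

Here top3 holds all three cities ordered by distance, while within_50km keeps only those inside the distance limit, which is the same narrowing that setting both options produces per search point.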
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
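The semantics of a distance join can be shown without Spark. The Python sketch below pairs every record in one list with every record in another that lies within a maximum distance and attaches the computed distance, much as the DistanceColumnName option adds a distance column. It uses a naive nested loop and haversine distance, so it omits the geohash-based primary join that the geohashPrecision parameter controls. All names and sample records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; an assumption of this sketch
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column="outputDistance"):
    """Return merged rows for every left/right pair within `max_distance_m`."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                joined.append({**l, **r, distance_column: d})
    return joined


customers = [{"customer": "c1", "longitude": -73.75, "latitude": 42.73}]
pois = [
    {"poi": "near", "lon": -73.755, "lat": 42.73},  # a few hundred meters away
    {"poi": "far", "lon": -73.70, "lat": 42.73},    # several kilometers away
]

result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

A production join over large dataframes would avoid comparing every pair; bucketing both sides by a coarse grid cell first, then applying the exact distance test only within matching or neighbouring cells, is the usual way to do that.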
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
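As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard stat module. It only interprets the number; it does not touch any directory, and the variable names are illustrative.

```python
import stat

mode = 0o775  # the permission bits applied by "chmod 775"

# stat.filemode also expects the file-type bits; S_IFDIR marks a directory.
symbolic = stat.filemode(stat.S_IFDIR | mode)

owner_has_rwx = (mode & stat.S_IRWXU) == stat.S_IRWXU  # owner: read, write, execute
group_has_rwx = (mode & stat.S_IRWXG) == stat.S_IRWXG  # group: read, write, execute
others_can_write = bool(mode & stat.S_IWOTH)           # others: no write access

print(symbolic)
```

The owner and the common group get full access, while all other users can only read and traverse the download directory.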
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
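The wildcard rules for Like can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is not the MI SQL implementation; the function names are hypothetical, and the sketch assumes case-sensitive matching and no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


matches = [like("Canada", "Can%"), like("Canada", "C_nada"), like("Canada", "C_da")]
```

The first two patterns match "Canada"; the third does not, because each underscore stands for exactly one character.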
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces in their names or other special characters.
Examples
Identifiers with illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
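When building MI SQL text in a program, the quote rules above can be applied with two small helpers. The following Python sketch is illustrative only: quote_literal and quote_identifier are hypothetical names, and the identifier helper assumes the name contains no double quotes, since this guide does not describe how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (assumes the name contains no double quote)."""
    if '"' in name:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return '"' + name + '"'


table_query = ("SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
               + " WHERE Country = " + quote_literal("Canada"))
street_query = "SELECT * FROM Streets WHERE Business = " + quote_literal("O'hara's")
```

Both resulting strings have the same form as the examples in this section.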
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the Transform UDF accepts and returns a geometry, which means an XY table cannot be transformed directly from one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table from one coordinate system to another by extracting the ordinates from the transformed geometry.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the minimum bounding rectangle (MBR) for a WritableGeometry.
The following Observer functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin
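The following Python sketch shows the idea behind the MBR values and how they can filter the rows of an XY table. It is a conceptual illustration only: geometries are plain lists of (x, y) tuples, and the mbr and in_mbr helpers are hypothetical names, not SDK functions.

```python
def mbr(coords):
    """Return (xmin, ymin, xmax, ymax), the minimum bounding rectangle of the coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def in_mbr(x, y, box):
    """Return True if the point lies inside the rectangle or on its edge."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


# A triangle standing in for a geometry, and an XY table as a list of rows.
triangle = [(-74.0, 42.0), (-73.0, 42.5), (-73.5, 43.0)]
box = mbr(triangle)
xy_rows = [(-73.750333, 42.736103), (-75.0, 42.5)]
kept = [row for row in xy_rows if in_mbr(row[0], row[1], box)]
```

The four values in box correspond to what ST_XMin, ST_YMin, ST_XMax, and ST_YMax would return for the same geometry. Note that an MBR test is coarser than a true containment test: a point can fall inside the rectangle but outside the geometry.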
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
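To show how a geohash ID maps back to a rectangular cell, the following Python sketch decodes an ID into its bounding box using the publicly documented geohash scheme. It assumes that GeoHashBoundary follows that standard scheme, which this guide does not state explicitly; geohash_bounds is a hypothetical helper, not an SDK function.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Decode a geohash string into (xmin, ymin, xmax, ymax) in degrees."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # each character carries 5 bits
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


xmin, ymin, xmax, ymax = geohash_bounds("ezs42")
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.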
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
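The aggregation pattern in the last query (hash each point, then count points per cell) can be sketched in Python as follows. The encoder implements the publicly documented geohash scheme; whether GeoHashID produces identical strings is an assumption, and geohash_id is a hypothetical helper rather than an SDK function.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point (degrees) as a geohash string of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, value, is_lon = 0, 0, True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


points = [(-5.6, 42.6), (-5.61, 42.61), (10.40744, 57.64911)]
counts = Counter(geohash_id(x, y, 5) for x, y in points)
```

The first two points fall into the same cell at precision 5, so that cell gets a count of 2, mirroring the GROUP BY in the Hive query.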
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries in which the search point lies are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. 'downloadLocation', '/pb/downloads'
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries nearest to an input point, chosen from those contained in the specified TAB or shapefile.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like '%Park%'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'"))
nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 within which to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
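To make the semantics of a distance join concrete, the following is a minimal Python sketch, not the product's implementation. It pairs every row of one point set with each row of another that lies within a maximum distance and adds a distance column. The haversine formula, the brute-force nested loop, the sample coordinates, and all function names are assumptions chosen for illustration; the sketch also omits the geohash-based primary join that the geohashPrecision parameter controls.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (assumed spherical model)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column="outputDistance"):
    """Pair each left row with every right row no farther away than max_distance_m."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and record the computed distance in an extra column.
                joined.append({**l_row, **r_row, distance_column: d})
    return joined


# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.6918, "latitude": 42.7284}]
pois = [
    {"poi": "near", "lon": -73.6950, "lat": 42.7300},
    {"poi": "far", "lon": -73.7562, "lat": 42.6526},
]

matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for row in matches:
    print(row["id"], row["poi"], round(row["outputDistance"], 1))
```

A nested loop compares every pair of rows, which is only practical for small inputs; bucketing both sides by a grid key first, as the geohash precision parameter suggests, is the usual way to reduce the number of comparisons.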
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download the data and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, and the directory should grant Read, Write, and Execute permissions to the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
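After applying the group and permission changes, it can be useful to confirm that the download directory ended up in the expected state. The following Python sketch is one way to check this; the function name and the demonstration directory are hypothetical and are not part of the product. It only inspects the directory and does not change anything.

```python
import os
import stat
import tempfile


def check_download_dir(path, expected_group=None):
    """Return a list of problems found with the download directory (empty if none)."""
    problems = []
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        problems.append("not a directory")
    mode = stat.S_IMODE(info.st_mode)
    # Owner and group both need read, write, and execute (the 77x part of 775).
    if mode & 0o770 != 0o770:
        problems.append("owner/group lack rwx (mode is %o)" % mode)
    if expected_group is not None:
        import grp  # available on POSIX systems only
        actual = grp.getgrgid(info.st_gid).gr_name
        if actual != expected_group:
            problems.append("group is %s, expected %s" % (actual, expected_group))
    return problems


# Demonstration on a throwaway directory rather than a real download location.
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o775)
print(check_download_dir(demo_dir))
```

On a cluster node, the same function could be called with the real download directory and the common group name, for example check_download_dir("/pb/downloads", "pbdownloads"), assuming those are the values used in your environment.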
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
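The wildcard behavior described for the Like operator can be illustrated with a small Python sketch that translates a Like pattern into a regular expression. This is only an illustration of the matching rules stated above, not the MI SQL implementation; the function names are hypothetical, and details such as case sensitivity and escaping of literal wildcard characters are assumptions that may differ in the product.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))    # pattern must cover the whole value
print(like("Parkway", "%Park"))
print(like("Hyde Park West", "%Park%"))
print(like("cat", "c_t"))
```

Matching is anchored to the whole value, so a pattern without a leading or trailing percent sign only matches values that begin or end exactly as written.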
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
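When filter expressions are built programmatically, the quoting rules above are easy to get wrong. The following Python sketch shows one way to apply them: single quotes around string literals with embedded single quotes doubled, and double quotes around identifiers. The helper names and the sample table name are hypothetical, and rejecting identifiers that contain a double quote is an assumption made here because the rules above do not say how such identifiers are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Assumption: escaping of embedded double quotes is not covered, so refuse.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


# Hypothetical table and column names, for illustration only.
query = (
    "SELECT * FROM " + quote_identifier("Street Names")
    + " WHERE Business = " + quote_literal("O'hara's")
)
print(query)
```

Keeping the two helpers separate mirrors the distinction the rules draw between values and identifiers, which use different quote characters.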
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
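As a rough illustration of what the six observer functions return, the following Python sketch computes the same values for geometries represented as GeoJSON-like dictionaries. The representation and the function names are assumptions made for this sketch; the Hive functions themselves operate on WritableGeometry values.

```python
def _positions(coords):
    """Yield (x, y) pairs from a nested GeoJSON-style coordinate array."""
    if isinstance(coords[0], (int, float)):
        yield (coords[0], coords[1])
    else:
        for part in coords:
            yield from _positions(part)


def st_x(geom):
    """X ordinate of a point, or None if the input is not a point or is None."""
    if geom is None or geom["type"] != "Point":
        return None
    return geom["coordinates"][0]


def st_y(geom):
    """Y ordinate of a point, or None if the input is not a point or is None."""
    if geom is None or geom["type"] != "Point":
        return None
    return geom["coordinates"][1]


def envelope(geom):
    """Return (xmin, ymin, xmax, ymax) of a geometry, or None if the input is None."""
    if geom is None:
        return None
    xs, ys = zip(*_positions(geom["coordinates"]))
    return (min(xs), min(ys), max(xs), max(ys))


point = {"type": "Point", "coordinates": [-73.75, 42.73]}
polygon = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]],
}

print(st_x(point), st_y(point))
print(st_x(polygon))      # not a point, so no single X ordinate
print(envelope(polygon))  # ST_XMin, ST_YMin, ST_XMax, ST_YMax in one tuple
```

For a point, the envelope collapses to the point itself, so the minimum and maximum values equal its X and Y ordinates.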
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
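To make the relationship between a cell ID and its boundary concrete, here is a minimal Python sketch that decodes a geohash string into its bounding rectangle. It assumes the conventional public geohash scheme (base-32 alphabet, alternating longitude and latitude bits). The function name geohash_bounds is hypothetical, and the GeoHashBoundary UDF itself may differ in detail.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # conventional geohash alphabet

def geohash_bounds(cell_id):
    """Return (min_x, min_y, max_x, max_y) for a conventional geohash string."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            is_lon = not is_lon
    return (lon[0], lat[0], lon[1], lat[1])

print(geohash_bounds("dre"))  # (-74.53125, 42.1875, -73.125, 43.59375)
```

Each extra character halves the cell several more times, which is why a longer ID corresponds to a smaller cell.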
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
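The grouping query above can be mimicked outside Hive. This is a hedged Python sketch: geohash_id is a hypothetical stand-in that implements the conventional public geohash encoding, which is not confirmed to match the GeoHashID UDF character for character, and the sample points are made up for illustration.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # conventional geohash alphabet

def geohash_id(x, y, precision):
    """Encode longitude x and latitude y as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars, bits, bit_count = [], 0, 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if x >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if y >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

# Illustrative points (longitude, latitude); two fall in the same cell.
points = [(-73.750333, 42.736103), (-73.70, 42.70), (10.40744, 57.64911)]
counts = Counter(geohash_id(x, y, 3) for x, y in points)
print(counts)  # dre: 2, u4p: 1
```

Because a shorter ID is always a prefix of a longer one for the same point, sorting by ID keeps nearby cells close together.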
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hexagon identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique square hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
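The idea of cells that are square in a Mercator projection can be illustrated with the widely used quadkey tiling scheme. This Python sketch is an assumption for illustration only: the guide does not state how SquareHashID encodes cells, and the function name square_cell_id is hypothetical.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the square Web Mercator world

def square_cell_id(x, y, precision):
    """Quadkey-style ID of the Mercator tile containing longitude x, latitude y."""
    lat = max(min(y, MAX_LAT), -MAX_LAT)
    n = 1 << precision  # number of tiles along each axis at this precision
    fx = (x + 180.0) / 360.0
    s = math.sin(math.radians(lat))
    fy = 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)
    col = min(n - 1, max(0, int(fx * n)))
    row = min(n - 1, max(0, int(fy * n)))
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        # Each digit (0-3) picks one quadrant of the parent tile.
        digit = (1 if col & mask else 0) + (2 if row & mask else 0)
        digits.append(str(digit))
    return "".join(digits)

print(square_cell_id(-73.750333, 42.736103, 3))  # 030
```

Every added digit splits the parent tile into four equal squares in projected space, so the ID length plays the same role as the precision parameter described above.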
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
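For readers unfamiliar with the underlying test, the following Python sketch shows one textbook way to decide which polygons contain a point (the even-odd ray-casting rule). It is an illustration under stated assumptions, not the algorithm used by LocalPointInPolygon; the polygon table and the names point_in_polygon and containing_polygons are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: count edge crossings of a ray cast to the right of (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical polygon table: two adjacent square "states".
polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

def containing_polygons(x, y):
    """Return the names of all polygons that contain the point."""
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]

print(containing_polygons(1, 1))  # ['A']
print(containing_polygons(5, 2))  # ['B']
print(containing_polygons(9, 9))  # []
```

Like the UDTF, the helper returns every match rather than only the first, so a point can yield zero, one, or several rows.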
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
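The effect of the maxCandidates and maxDistance options can be sketched in plain Python. This is a brute-force illustration under stated assumptions: it uses point features and haversine (great-circle) distance in meters, the coordinates are approximate, and the names haversine_m and search_nearest are hypothetical rather than part of the product.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere of mean earth radius."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, nearest first."""
    lon, lat = point
    scored = [(haversine_m(lon, lat, f_lon, f_lat), name)
              for name, (f_lon, f_lat) in features.items()]
    if max_distance is not None:
        scored = [s for s in scored if s[0] <= max_distance]
    scored.sort()
    return scored[:max_candidates]

# Approximate, illustrative coordinates (longitude, latitude).
capitals = {
    "Albany": (-73.7562, 42.6526),
    "Hartford": (-72.6734, 41.7658),
    "Boston": (-71.0589, 42.3601),
}
search_point = (-73.750333, 42.736103)

print([n for _, n in search_nearest(search_point, capitals, max_candidates=3)])
print([n for _, n in search_nearest(search_point, capitals, 3, max_distance=50000)])
```

With no options, only the single nearest feature comes back, which matches the documented default of 1 for maxCandidates.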
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator: This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The product generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values (one set from each dataframe) that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
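To show what a distance join produces, here is a small Python sketch that compares every pair of records and keeps those within the maximum distance. It is deliberately brute force and is only an illustration: the guide says the real method uses a geohash for the primary join, which is not reproduced here, and the names join_by_distance and haversine_m and the sample records are hypothetical.

```python
import math

METERS_PER_MILE = 1609.344

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a sphere of mean earth radius."""
    r = 6371008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def join_by_distance(left, right, max_distance_m, distance_column="outputDistance"):
    """Pair each left record with every right record within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                row = {**l, **r}
                row[distance_column] = d  # analogous to the DistanceColumnName option
                joined.append(row)
    return joined

customers = [{"customer": "c1", "longitude": -73.7500, "latitude": 42.7360}]
pois = [
    {"poi": "cafe", "lon": -73.7510, "lat": 42.7365},  # roughly 100 m away
    {"poi": "park", "lon": -73.7000, "lat": 42.7360},  # roughly 4 km away
]

result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([row["poi"] for row in result])  # ['cafe']
```

Comparing all pairs grows with the product of the two table sizes, which is the cost a geohash-based primary join is meant to avoid.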
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
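After the last step it can be useful to confirm that the directory really carries the 775 permission bits. The following Python sketch shows one way to check this on a Linux node. The function name download_dir_mode_ok is hypothetical, and the check covers only the permission bits, not group ownership.

```python
import os
import stat


def download_dir_mode_ok(path: str, expected_mode: int = 0o775) -> bool:
    """Return True if the permission bits of `path` equal expected_mode."""
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected_mode
```

Calling download_dir_mode_ok with the path of your download directory returns False if, for example, the group lacks write permission.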
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
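The wildcard behavior of the Like operator can be illustrated outside of MI SQL. The sketch below translates a Like pattern into a regular expression in Python. It is an illustration only: the function names are hypothetical, and it assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern":
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # True
print(like("Park Avenue", "%Park"))    # False
```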
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List item and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
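When filter expressions are built from program data, the single-quote doubling rule is easy to apply mechanically. The following Python sketch shows the idea. The helper name quote_literal is hypothetical, and it handles only the single-quote rule described above.

```python
def quote_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Builds the same WHERE clause as the last example above.
query = "SELECT * FROM Streets WHERE Business = " + quote_literal("O'hara's")
print(query)
```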
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Spatial
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMax(FromWKT('…', 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The X minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_XMin(FromWKT('…', 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT('…', 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT('…', 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash, but has the advantage that when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
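To make the relationship between a geohash string and its rectangular cell concrete, the following Python sketch decodes an ID into a bounding box using the publicly documented geohash scheme (base-32 characters, with bits alternating between longitude and latitude). It is an illustration under that assumption, not the product's implementation. The function name geohash_bounds is hypothetical, and the UDF returns a WritableGeometry rather than a tuple.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_bounds(geohash: str):
    """Return (min_lon, min_lat, max_lon, max_lat) for a standard geohash."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon_bit = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon_bit:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon_bit = not is_lon_bit
    return (lon_lo, lat_lo, lon_hi, lat_hi)


# Each extra character splits the cell further, so longer IDs mean smaller cells.
print(geohash_bounds("dre"))
```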
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
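The last query groups points by cell ID and counts them. The Python sketch below mirrors that pattern with the publicly documented geohash encoding, so the effect of the precision argument can be explored without a cluster. It is an assumption that the UDF follows the same standard scheme, and the function names are hypothetical.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a longitude/latitude pair as a standard geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value, bits = 0, 0
    is_lon_bit = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon_bit:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        bits += 1
        is_lon_bit = not is_lon_bit
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, bits = 0, 0
    return "".join(chars)


def count_by_cell(points, precision: int) -> dict:
    """Count (lon, lat) points per geohash cell, like GROUP BY hashID."""
    return dict(Counter(geohash_encode(lon, lat, precision) for lon, lat in points))


points = [(-73.750333, 42.736103), (-73.75, 42.74), (10.40744, 57.64911)]
print(count_by_cell(points, 3))
```

Raising the precision splits the points across more, smaller cells; lowering it merges them into fewer, larger cells.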
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')
) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched for. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
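For readers unfamiliar with the underlying test, the following Python sketch shows a classic ray-casting point-in-polygon check and a lookup that returns every polygon containing a point. It only illustrates the concept: the function names and the dictionary of rings are hypothetical, the polygons are simple rings without holes, and the product's actual search (which reads TAB files or shapefiles and can use PGD indexes) is not shown.

```python
def point_in_polygon(x: float, y: float, ring) -> bool:
    """Ray casting: count how many ring edges a ray to the right of (x, y) crosses."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
        j = i
    return inside


def polygons_containing(point, polygons: dict) -> list:
    """Return the names of all polygons (name -> ring) that contain the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_polygon(x, y, ring)]


regions = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
print(polygons_containing((2, 2), regions))
```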
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', "Name Like 'Park'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'")
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
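The interaction of the maxCandidates and maxDistance options can be pictured with a small brute-force search. The Python sketch below ranks candidate points by great-circle (haversine) distance, drops those beyond a distance limit, and keeps the closest few. It is an illustration only: the function names are hypothetical, distances are in meters on a spherical earth, and the product's own distance calculation and indexing are not shown.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2) -> float:
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates: dict, max_candidates: int = 1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    scored = []
    for name, (lon, lat) in candidates.items():
        d = haversine_m(point[0], point[1], lon, lat)
        if max_distance_m is None or d <= max_distance_m:  # no limit by default
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


places = {"a": (0.0, 0.1), "b": (0.0, 0.2), "c": (0.0, 5.0)}
print(search_nearest((0.0, 0.0), places, max_candidates=2))
```

As in the options table, the default of one candidate returns only the single nearest record, and the distance limit is applied before the candidate count.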
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using one pair of longitude and latitude columns from each dataframe; these columns represent the locations of the records to be joined. Use it to enrich a CSV containing point data with attributes of points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
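To make the semantics of a distance join concrete, here is a minimal Python sketch that pairs every record in one list with the records of another list that fall within a maximum distance. It is not the SDK's implementation: the SDK uses a geohash-based primary join (the geohashPrecision parameter), whereas this sketch compares every pair, and it assumes a spherical haversine distance, which may differ from the distance the SDK computes. The function names, the dictionary keys, and the sample points are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Return (left_row, right_row, distance_m) for every pair within max_distance_m."""
    matches = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                matches.append((l_row, r_row, d))
    return matches


# Hypothetical sample data: one customer and two points of interest.
customers = [{"id": "c1", "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7503, "lat": 42.7400},
    {"name": "far", "lon": -73.7500, "lat": 42.8000},
]

result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
for customer, poi, distance_m in result:
    print(customer["id"], poi["name"], round(distance_m, 1))
```

Only the "near" point is returned, because the "far" point lies several kilometers outside the half-mile buffer. The third element of each tuple plays the role of the optional distance column.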
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases in which multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. All the service users who need to download the data should belong to a common operating system group. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.
4. During a window in which no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
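As a quick check of what mode 775 grants, the following Python sketch uses only the standard library to render a numeric mode the way a directory listing would show it. The helper name describe_mode is hypothetical; nothing here touches the file system or the product.

```python
import stat


def describe_mode(mode):
    """Render a numeric permission mode as it would appear for a directory in `ls -l`."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the owning group may create or replace files in the directory."""
    return bool(mode & stat.S_IWGRP)


print(describe_mode(0o775))
print(group_can_write(0o775), group_can_write(0o755))
```

With 775, the owner and the group have read, write, and execute permissions, while other users have read and execute only. That is why every service user that downloads data needs to be a member of the common group.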
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
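The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the matching rules in the table, not the MI SQL engine: the function names are hypothetical, and the sketch assumes case-sensitive matching, which the table does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches zero or more characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high):
    """Between is inclusive at both ends."""
    return low <= value <= high


print(like("Canada", "Can%"), like("Canada", "C_nada"), like("Canada", "Can_"))
print(between(5, 1, 5))
```

Because the whole value must match, "Can_" does not match "Canada": the underscore stands for exactly one character, while "Can%" matches any value that starts with "Can".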
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic cannot otherwise parse them correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain otherwise illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
When a single quote occurs within a string literal or value, use a double-single-quote (two single-quote characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
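When queries are assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is one way to do that; the helper names, the table name, and the sample value are hypothetical. It covers only the rules stated in this section (single quotes around literals, an embedded single quote doubled, double quotes around identifiers) and does not handle a double quote inside an identifier, which this section does not describe.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier (for example, a name containing spaces) in double quotes."""
    return '"' + name + '"'


# Hypothetical table, column, and value used only for illustration.
query = "SELECT * FROM {table} WHERE {column} = {value}".format(
    table=quote_identifier("My Streets"),
    column=quote_identifier("Business Name"),
    value=quote_literal("Joe's Diner"),
)
print(query)
```

Doubling the embedded quote keeps the literal intact instead of ending it early at the apostrophe.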
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The X minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_XMin(FromWKT('…', 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMax(FromWKT('…', 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry.
Return Values
Return Type Description
Double The Y minima of the input geometry, or Null if the specified value is not a geometry.
Examples
SELECT ST_YMin(FromWKT('…', 'epsg:4326')) FROM src;
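To illustrate what the minima and maxima observers report, the following Python sketch computes the X and Y extremes of a geometry from its WKT text. It is not how the UDFs are implemented: the envelope function and its regex-based parsing are hypothetical, and it assumes plain 2D coordinates. Returning None for input that is not a geometry string mirrors the Null result described above.

```python
import re

# A decimal number such as -73.750333, 42 or 1e-3.
_NUM = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
# Two numbers separated by whitespace form one "x y" coordinate pair.
_PAIR = re.compile(rf"({_NUM})\s+({_NUM})")


def envelope(wkt):
    """Return the x/y minima and maxima of a 2D WKT geometry, or None if there are no coordinates."""
    if not isinstance(wkt, str):
        return None
    pairs = [(float(x), float(y)) for x, y in _PAIR.findall(wkt)]
    if not pairs:
        return None
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return {"xmin": min(xs), "xmax": max(xs), "ymin": min(ys), "ymax": max(ys)}


print(envelope("POLYGON ((0 0, 4 0, 4 3, 0 3, 0 0))"))
print(envelope(None))
```

For a point, the minimum and maximum coincide with the point's own X and Y ordinates.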
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding a grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
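To show how a geohash ID maps back to a rectangular cell, the following Python sketch decodes an ID into its bounding box and renders it as WKT. It assumes the product uses the standard public geohash encoding (base-32 characters, bits alternating between longitude and latitude, longitude first); this section does not confirm that, and the function names are hypothetical rather than part of the SDK.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of the cell named by a geohash ID."""
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, most significant first
            bit = (value >> shift) & 1
            rng = lon if use_lon else lat
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            use_lon = not use_lon
    return lon[0], lat[0], lon[1], lat[1]


def bounds_to_wkt(min_x, min_y, max_x, max_y):
    """Render a bounding box as a closed WKT polygon ring."""
    ring = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


print(bounds_to_wkt(*geohash_bounds("dre")))
```

Each additional character adds five bits, which is why longer IDs describe smaller cells.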
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
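The idea behind hashing a point to a cell ID and then grouping by that ID can be sketched in Python. The sketch assumes the standard public geohash encoding, which this section does not confirm for the product; the function names and sample points are hypothetical. Like the UDF, geohash_encode accepts coordinates as numbers or strings.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Return the geohash ID, `precision` characters long, of the cell containing the point."""
    lon, lat = float(lon), float(lat)
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, coord = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid  # point is in the upper half
        else:
            rng[1] = mid  # point is in the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every five bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


def count_by_cell(points, precision):
    """Count points per cell, similar in spirit to GROUP BY on the hash ID."""
    return Counter(geohash_encode(x, y, precision) for x, y in points)


# Hypothetical sample points as (longitude, latitude).
points = [(-73.750333, 42.736103), (-73.75, 42.74), (2.35, 48.85)]
print(count_by_cell(points, 3))
```

Nearby points share an ID prefix, so sorting or grouping by the ID clusters records spatially; a higher precision splits the same points into smaller cells.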
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a hexagon cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique identifier of a square hash cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/pb/pip/download',
      'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with Level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
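To make the semantics of a distance join concrete, the following is a minimal brute-force sketch in plain Python. It is not the product's implementation: the guide states that joinByDistance uses a geohash for the primary join, whereas this sketch compares every pair. The haversine formula, the Earth radius constant, and the names haversine_miles and join_by_distance are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Return (left_id, right_id, distance) for every pair within max_miles.

    Each input row is a hypothetical (id, longitude, latitude) tuple.
    """
    pairs = []
    for l_id, l_lon, l_lat in left:
        for r_id, r_lon, r_lat in right:
            distance = haversine_miles(l_lon, l_lat, r_lon, r_lat)
            if distance <= max_miles:
                pairs.append((l_id, r_id, distance))
    return pairs


# Hypothetical sample rows: one customer and two points of interest.
customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.750333, 42.740103), ("far", -73.750333, 42.806103)]
print(join_by_distance(customers, pois, 0.5))
```

Only the pair within half a mile survives the join, and the computed distance plays the role that the DistanceColumnName option plays in the result dataframe.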
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder 81
• Download Permissions 83
• Operators and Syntax Delimiters 84
• Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
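As a quick check of what mode 775 grants, the sketch below decodes it with Python's standard stat module. It only interprets the permission bits and does not touch the file system; the helper name describe_mode is hypothetical.

```python
import stat


def describe_mode(mode):
    """Return the ls-style permission string for a directory with this mode."""
    return stat.filemode(stat.S_IFDIR | mode)


mode = 0o775
print(describe_mode(mode))            # owner rwx, group rwx, others r-x
print(bool(mode & stat.S_IWGRP))      # group members may write
print(bool(mode & stat.S_IWOTH))      # other users may not write
```

Because the group has write access while other users do not, every service user in the common group can update the downloaded data, and users outside the group can only read it.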
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
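The wildcard rules for Like can be illustrated with a small Python sketch that translates a pattern into a regular expression. This is only a model of the rules stated in the table, not the MI SQL implementation; in particular, case-sensitive matching is an assumption, and the names like_to_regex and like are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")          # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))   # ends with Park
print(like("Parking", "%Park"))        # extra characters after Park
print(like("Park", "P_rk"))            # underscore stands for one character
```

Note that a pattern with no wildcard only matches the identical string, so a filter intended to find names containing a word needs a percent sign on each side of it.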
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
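When filter expressions are assembled in code, the quote-doubling rule can be applied with a small helper. The sketch below is an illustration of the rule only; quote_literal is a hypothetical name and is not part of the product, and the sample value is made up.

```python
def quote_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical attribute value containing an apostrophe.
business = "Pat's Diner"
query = "SELECT * FROM Streets WHERE Business = " + quote_literal(business)
print(query)
```

Doubling the embedded quote keeps the parser from treating it as the end of the string literal.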
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y ordinate of the geometry if the geometry is a point or Null if the geometry is not a point or is null
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to geohash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
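The idea of encoding a location as a hash string can be illustrated with the publicly documented geohash algorithm. The Python sketch below is not the product's implementation, and it is an assumption that the GeoHashID UDF follows the same scheme; geohash_encode is a hypothetical helper name. It bisects the longitude and latitude ranges alternately and emits one base-32 character for every five bits.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a WGS 84 longitude/latitude as a standard geohash string."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(-73.750333, 42.736103, 3))  # prints "dre"
```

Because each extra character only refines the previous cell, a longer hash always starts with the shorter hash of the same point, which is what makes such IDs convenient to sort and to group by prefix.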
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
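To show what "the boundary of a cell" means for a rectangular hash, the following Python sketch decodes a standard geohash string back into its bounding rectangle and formats it as WKT. It assumes the public geohash scheme and is not the product's implementation; geohash_bounds and bounds_to_wkt are hypothetical helper names.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash: str):
    """Return (min_lon, min_lat, max_lon, max_lat) of a standard geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True  # bits alternate between axes, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)  # raises ValueError for an invalid character
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


def bounds_to_wkt(bounds) -> str:
    """Format a bounding rectangle as a closed WKT polygon ring."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return (
        f"POLYGON (({min_lon} {min_lat}, {max_lon} {min_lat}, "
        f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
    )


# Cell for the hash string used in the example above
print(bounds_to_wkt(geohash_bounds("ebvnk")))
```

Every point that hashes to a given ID falls inside the rectangle returned for that ID, so the boundary can stand in for the group of points when results are mapped.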
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID('-73.750333', '42.736103', 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
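The relationship between precision and cell size can be made concrete for the standard geohash scheme, where each character adds five bits split between longitude and latitude. The Python sketch below assumes that public scheme (the product's cells may differ), and geohash_cell_size is a hypothetical helper name.

```python
def geohash_cell_size(precision: int):
    """Return (width_degrees, height_degrees) of a standard geohash cell."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


# Print the cell dimensions for every precision from 1 to 12
for precision in range(1, 13):
    width, height = geohash_cell_size(precision)
    print(precision, width, height)
```

Each additional character divides the cell area by 32, so a precision of 10, as used in the queries above, groups points into far smaller cells than a precision of 3.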
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique hexagon hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique square hash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID('-73.750333', '42.736103', 3);
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns all the geometries that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
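The semantics of a point-in-polygon lookup, where every polygon that contains the point contributes its attributes to the result, can be sketched in plain Python with the classic ray-casting test. This is only an illustration of the concept, not how the UDTF is implemented; the function names, the sample rectangles, and their attributes are all hypothetical.

```python
def point_in_ring(x, y, ring):
    """Ray casting: toggle 'inside' each time a ray going right crosses an edge."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def local_point_in_polygon(point, records):
    """Return the attributes of every polygon record that contains the point."""
    x, y = point
    return [attrs for ring, attrs in records if point_in_ring(x, y, ring)]


# Hypothetical data source: two overlapping rectangles with attributes
records = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"state": "A"}),
    ([(3, 3), (6, 3), (6, 6), (3, 6)], {"state": "B"}),
]

print(local_point_in_polygon((3.5, 3.5), records))  # both polygons match
print(local_point_in_polygon((1.0, 1.0), records))  # only the first matches
```

A point in the overlap yields two result rows, which mirrors how a LATERAL VIEW can produce more than one output row for a single input point.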
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates: The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance: The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit: The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName: The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset: The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs: The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup: The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter: Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
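How the maxCandidates and maxDistance options shape a nearest search can be sketched in plain Python. The sketch assumes great-circle (haversine) distance between points, which may not match how the UDTF measures distance, and it searches point features only. The function names, parameter names, and the approximate city coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first."""
    lon, lat = point
    scored = []
    for name, (c_lon, c_lat) in candidates:
        distance = haversine_m(lon, lat, c_lon, c_lat)
        # Drop candidates beyond the optional distance limit
        if max_distance_m is None or distance <= max_distance_m:
            scored.append((distance, name))
    scored.sort()
    return [(name, distance) for distance, name in scored[:max_candidates]]


# Approximate coordinates, used only for illustration
capitals = [
    ("Boston", (-71.0589, 42.3601)),
    ("Albany", (-73.7562, 42.6526)),
    ("Hartford", (-72.6734, 41.7658)),
]
search_point = (-73.750333, 42.736103)

print(search_nearest(search_point, capitals))
print(search_nearest(search_point, capitals, max_candidates=3, max_distance_m=150_000))
```

With the defaults only the single nearest record is returned; raising max_candidates widens the result, while max_distance_m can shrink it again by excluding distant records.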
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path are:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
// Search radius and unit used to build the maximum join distance
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
// Join joinDF to baseDF using a geohash precision of 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The DistanceColumnName option adds the calculated distance as the outputDistance column
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and update the downloaded data when required. Create a common operating system group that contains every service user who needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should belong to a common operating system group. The download directory must be assigned to this common group and have 775 permissions: read, write, and execute for the owner and group, and read and execute for other users.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
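To make the meaning of the 775 mode concrete, the following Python sketch decodes an octal mode into the familiar rwx string and checks whether group members can write. It uses only the standard library; the helper names are illustrative and are not part of the product.

```python
import stat

def describe_dir_mode(mode):
    """Render a directory's permission bits the way `ls -l` would."""
    return stat.filemode(stat.S_IFDIR | mode)

def group_can_write(mode):
    """True if members of the directory's group may create or replace files."""
    return bool(mode & stat.S_IWGRP)

# 775: owner and group have full access, other users can only read and traverse.
print(describe_dir_mode(0o775), group_can_write(0o775))
# 755: a common default that would block other group members from updating data.
print(describe_dir_mode(0o755), group_can_write(0o755))
```

With 755, only the owning service could refresh the downloaded data, which is why the shared download directory needs group write access.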
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or the second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
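The Like wildcard rules can be illustrated with a small Python sketch that translates a pattern into a regular expression. This is only a model of the rules described in the table; it assumes case-sensitive matching and no escape character, neither of which is stated in this guide.

```python
import re

def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

print(like("Central Park", "%Park"))   # percent absorbs the leading text
print(like("Park", "P_rk"))            # underscore stands for one character
print(like("Parkway", "%Park"))        # trailing text is not covered by the pattern
```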
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
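When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The Python sketch below is illustrative only: the function names are hypothetical, and because this guide does not describe how to escape a double quote inside an identifier, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def quote_identifier(name):
    """Wrap an identifier (for example, one containing spaces or /) in double quotes."""
    if '"' in name:
        # Escaping embedded double quotes is not covered by the guide, so refuse.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'

# Hypothetical table and value, used only to show the quoting.
query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"), quote_literal("Joe's Diner")
)
print(query)
```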
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y maxima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry or Null if the specified value is not a geometry
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The Y minima of the input geometry or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
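As a rough illustration of what ST_YMax and ST_YMin return, the following Python sketch computes the Y extent of a geometry that is represented, purely for illustration, as a list of (x, y) pairs. The list representation and the function names are assumptions; the Hive UDFs themselves operate on WritableGeometry values.

```python
def _y_values(geometry):
    """Return the Y ordinates of a list of (x, y) pairs, or None if the input is not one."""
    if not isinstance(geometry, (list, tuple)) or not geometry:
        return None
    try:
        return [float(y) for _x, y in geometry]
    except (TypeError, ValueError):
        return None

def y_max(geometry):
    """Largest Y ordinate, or None when the value is not a geometry."""
    ys = _y_values(geometry)
    return max(ys) if ys is not None else None

def y_min(geometry):
    """Smallest Y ordinate, or None when the value is not a geometry."""
    ys = _y_values(geometry)
    return min(ys) if ys is not None else None

ring = [(0, 1), (2, 5), (3, -4), (0, 1)]
print(y_min(ring), y_max(ring))
```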
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is rectangular
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
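To show how a precision value relates to cell size and how IDs support aggregation, here is a minimal Python sketch of the publicly documented geohash scheme (interleaved longitude/latitude bisection, base-32 alphabet). It assumes that GeoHashID and GeoHashBoundary follow that standard scheme, which this guide does not confirm; the function names are illustrative.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_id(lon, lat, precision):
    """Encode a WGS 84 point as a standard geohash string of the given length."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, use_lon = [], True
    while len(chars) < precision:
        value = 0
        for _ in range(5):  # each character carries five bisection bits
            rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
            mid = (rng[0] + rng[1]) / 2
            if coord >= mid:
                value = (value << 1) | 1
                rng[0] = mid
            else:
                value <<= 1
                rng[1] = mid
            use_lon = not use_lon
        chars.append(BASE32[value])
    return "".join(chars)

def geohash_boundary(cell_id):
    """Decode a geohash into (min_lon, min_lat, max_lon, max_lat)."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lon_range[0], lat_range[0], lon_range[1], lat_range[1])

def count_by_cell(points, precision):
    """Count points per cell, mirroring the GROUP BY hashID query."""
    return Counter(geohash_id(lon, lat, precision) for lon, lat in points)

points = [(-73.750333, 42.736103), (-73.750334, 42.736104), (2.35, 48.85)]
for cell, quantity in count_by_cell(points, 3).items():
    print(cell, geohash_boundary(cell), quantity)
```

Because each additional character subdivides the previous cell, a shorter ID is always a prefix of the longer ID for the same point, which is what makes the IDs sortable and searchable by prefix.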
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision The shape of the cell is a hexagon
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location It also can return the boundary of the cell that contains the given point at the specified precision Square hash cells appear square when displayed on a Popular Mercator map
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group to apply to downloaded data on the local file system. The default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String the location of the input TAB or shapefile The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, the default is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group to apply to downloaded data on the local file system. The default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search with. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
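The way maxCandidates and maxDistance combine can be sketched in Python, with candidates given as (name, distance) pairs whose distances are already computed. The function name, the sample data, and the choice to treat the distance limit as inclusive are assumptions made for illustration; only the two defaults (one result, no distance limit) come from the options table.

```python
def select_nearest(candidates, max_candidates=1, max_distance=None):
    """Keep the closest candidates, optionally dropping those beyond max_distance.

    candidates: iterable of (name, distance) pairs in a single distance unit.
    """
    in_range = [
        c for c in candidates
        if max_distance is None or c[1] <= max_distance  # inclusive limit is an assumption
    ]
    in_range.sort(key=lambda c: c[1])  # nearest first
    return in_range[:max_candidates]

# Hypothetical search results for one input point, distances in miles.
found = [("Albany", 12.0), ("Hartford", 85.5), ("Boston", 140.2), ("Trenton", 160.9)]
print(select_nearest(found))                                      # defaults: the single nearest
print(select_nearest(found, max_candidates=3))                    # three nearest, no limit
print(select_nearest(found, max_candidates=3, max_distance=100))  # limit applied first
```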
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
// Build the search distance (0.5 miles) from a radius and a MapInfo unit code
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
// Join joinDF to baseDF using geohash precision 7 for the primary join
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// DistanceColumnName adds the calculated distance as the column outputDistance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
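To make the join semantics concrete, the following is a minimal Python sketch of what a distance join computes. It is not the product's implementation: it compares every pair of points by brute force using a haversine great-circle distance, whereas joinByDistance uses a geohash-based primary join. The function names, the sample records, and the Earth-radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation (assumption)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles):
    """Pair each left record with every right record within max_miles of it."""
    result = []
    for left_row in left:
        for right_row in right:
            dist = haversine_miles(left_row["longitude"], left_row["latitude"],
                                   right_row["lon"], right_row["lat"])
            if dist <= max_miles:
                # Merge both records and attach the distance, like DistanceColumnName
                result.append({**left_row, **right_row, "outputDistance": dist})
    return result


# Hypothetical sample data: one customer and two points of interest
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7520, "lat": 42.7380},
    {"name": "far", "lon": -73.9000, "lat": 42.9000},
]
matches = join_by_distance(customers, pois, 0.5)
for row in matches:
    print(row["id"], row["name"], round(row["outputDistance"], 3))
```

Only the POI labeled "near" falls inside the half-mile buffer, so the result contains a single joined row.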
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin
Hue notebook Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.sparkapp.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
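If you build PGD files for many TAB files, you may want to script the calls. The following Python sketch only assembles the argument list from the usage line above. The helper name build_pgd_command and the sample paths are hypothetical, and the sketch assumes the PGDBuilder executable is reachable under that name on your system.

```python
def build_pgd_command(tab_path, parallel=None):
    """Return the PGDBuilder argument list for one TAB file.

    tab_path maps to the required -f parameter; parallel maps to the
    optional -p parameter (maximum number of concurrent processes).
    """
    if not tab_path:
        raise ValueError("tab_path is required (-f)")
    command = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be a positive integer")
        command += ["-p", str(parallel)]
    return command


# Illustrative paths only
single = build_pgd_command(r"C:\data\uk.tab")
seamless = build_pgd_command(r"C:\seamless.tab", parallel=4)
print(single)
print(seamless)
# To actually run a command: subprocess.run(single, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a TAB path contains spaces.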
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
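After applying the steps above, you may want to confirm that the download directory has the expected mode. The following Python sketch checks the permission bits using the standard library. The function name and the temporary demo directory are assumptions for illustration; on a real cluster you would pass your own download directory instead.

```python
import os
import stat
import tempfile


def has_mode_775(path):
    """Return True if the rwx permission bits of path are exactly 775."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Mask off setuid/setgid/sticky so only owner, group, and other bits are compared
    return (mode & 0o777) == 0o775


# Demo on a throwaway directory standing in for the download location
demo_dir = tempfile.mkdtemp()
os.chmod(demo_dir, 0o700)
before = has_mode_775(demo_dir)
os.chmod(demo_dir, 0o775)
after = has_mode_775(demo_dir)
print(before, after)
os.rmdir(demo_dir)
```

A similar check on the directory's group (for example with os.stat(path).st_gid) could confirm that the chgrp step took effect.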
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
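The wildcard behavior described for the Like operator can be illustrated with a short Python sketch that translates a pattern into a regular expression. This is only a model of the documented semantics (underscore for exactly one character, percent for zero or more), not the MI SQL implementation. The helper names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Troy", "T_oy"))      # single-character wildcard
print(like("Troy", "T%"))        # multi-character wildcard
print(like("Albany", "%ban_"))   # both wildcards combined
print(like("Troy", "T_"))        # too short a pattern, no match
```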
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules.
" " Quoted identifier delimiters
% _ Wildcard symbols; % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
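When queries are assembled in code, the quoting rules above can be applied with small helpers. The following Python sketch doubles embedded single quotes in string literals and wraps identifiers in double quotes. The helper names are hypothetical, and rejecting identifiers that contain a double quote is a conservative assumption, since the guide does not describe how to escape them.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping rules for embedded double quotes are not covered here
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query_country = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
query_business = "SELECT * FROM Streets WHERE Business = " + quote_literal("O'hara's")
print(query_country)
print(query_business)
```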
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
ST_YMin
Description
The ST_YMin function returns the minimum Y value of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Parameter Type Description
geometry WritableGeometry The input geometry
Return Values
Return Type Description
Double The minimum Y value of the input geometry, or Null if the specified value is not a geometry
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions because they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
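To show how an ID can be turned back into a rectangular cell, here is a minimal Python sketch of the publicly documented geohash decoding scheme (base-32 characters, bits alternating between longitude and latitude). It assumes the product's geohash IDs follow this standard scheme, which the guide does not state explicitly. The function name is hypothetical, and the result is a plain bounding box rather than a WritableGeometry.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


bounds = geohash_bounds("dre")
print(bounds)
```

Each additional character halves the cell several more times, which is why longer IDs describe smaller cells.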
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
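The grouping pattern in the last query (hash each point, then count points per cell) can be sketched outside Hive as well. The Python below implements the standard public geohash encoding and counts sample points per cell. It assumes GeoHashID follows that standard scheme, which this guide does not state explicitly, so the IDs printed here are not guaranteed to match the UDF's output. The function name and sample coordinates are hypothetical.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point as a standard geohash string of the given length."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    bits = 0
    bit_count = 0
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if is_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        is_lon = not is_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Hypothetical sample points: two near Albany, NY and one in London
points = [(-73.750333, 42.736103), (-73.7520, 42.7380), (-0.1276, 51.5072)]
counts = Counter(geohash_id(x, y, 3) for x, y in points)
for cell, quantity in sorted(counts.items()):
    print(cell, quantity)
```

At precision 3 the two nearby points share a cell; a higher precision yields smaller cells and would eventually separate them.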
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset the charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs the coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. Example: 'downloadLocation', '/pb/downloads'
downloadGroup the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state FROM pip_points LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the fields that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
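For readers who want to see the underlying idea of a point-in-polygon search, the following Python sketch uses the classic ray-casting test and returns the attributes of every region that contains a point. It is a generic illustration, not the algorithm used by LocalPointInPolygon. The function names, the region records, and their coordinates are hypothetical, and the sketch ignores holes, multipolygons, and points that lie exactly on an edge.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the closed ring of (x, y) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            # X coordinate where the edge crosses that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
        j = i
    return inside


def find_containing(x, y, regions):
    """Return the attribute records of all regions whose ring contains the point."""
    return [
        {k: v for k, v in region.items() if k != "ring"}
        for region in regions
        if point_in_ring(x, y, region["ring"])
    ]


# Hypothetical regions standing in for rows of a TAB file
regions = [
    {"state": "A", "capital": "Alpha", "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"state": "B", "capital": "Beta", "ring": [(4, 0), (8, 0), (8, 4), (4, 4)]},
]
hits = find_containing(2, 2, regions)
misses = find_containing(9, 2, regions)
print(hits, misses)
```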
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | map | Optional. Options, in <String, String> format, that allow you to return more than one value, return additional information, or set other return criteria.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on the local file system. The default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"
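The options are passed to the UDTF as a Hive map of alternating string keys and values. As a rough illustration only, the Python sketch below assembles such a map(...) expression from a dictionary. The helper names hive_string and hive_options_map are hypothetical and not part of the product, and the backslash escaping of embedded single quotes is an assumption about how the literal will be parsed by Hive.

```python
def hive_string(value):
    """Render a value as a single-quoted Hive string literal.

    Assumption: backslashes and single quotes are escaped with a backslash.
    """
    text = str(value)
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def hive_options_map(options):
    """Build a map('key', 'value', ...) expression from a dict of options."""
    parts = []
    for key, value in options.items():
        parts.append(hive_string(key))    # option name
        parts.append(hive_string(value))  # option value, always as a string
    return "map(" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(hive_options_map({"maxCandidates": 3, "distanceUnit": "mi"}))
```

Every value is rendered as a string, which matches the <String, String> format described for the options parameter.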
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  '/STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
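To make the effect of maxCandidates and maxDistance concrete, here is a minimal Python sketch of a nearest search over a handful of points. It is not the product's algorithm: it uses plain planar (Euclidean) distance and a linear scan, and the function name nearest and the sample data are made up for illustration.

```python
import math


def nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (distance, name) pairs, closest first.

    max_candidates defaults to 1 and max_distance to "no limit",
    mirroring the defaults described for the options above.
    """
    px, py = point
    scored = []
    for name, (x, y) in candidates.items():
        distance = math.hypot(x - px, y - py)  # planar distance (assumption)
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    scored.sort()  # nearest first
    return scored[:max_candidates]


if __name__ == "__main__":
    capitals = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (10.0, 0.0)}
    print(nearest((0.0, 0.0), capitals, max_candidates=2))
```

With max_candidates=2 the sketch returns the two closest candidates; adding max_distance drops any candidate beyond that radius before the limit is applied.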
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon (or its ID) for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar file name and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
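For readers who want to see what a distance join computes, the following Python sketch pairs every record in one list with every record in another that lies within a maximum distance, and adds the computed distance as an extra field. It is an illustration under stated assumptions, not the Spark implementation: it uses a brute-force nested loop rather than a geohash-based primary join, a haversine great-circle distance on a sphere with an assumed mean radius, and made-up names (haversine_m, join_by_distance, outputDistance).

```python
import math

EARTH_RADIUS_M = 6371008.8   # assumed mean Earth radius, in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Brute-force join: keep every left/right pair within max_distance_m."""
    joined = []
    for l in left:
        for r in right:
            d = haversine_m(l["longitude"], l["latitude"], r["lon"], r["lat"])
            if d <= max_distance_m:
                # Merge both records and attach the computed distance.
                joined.append({**l, **r, "outputDistance": d})
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
    pois = [{"name": "near", "lon": -73.7510, "lat": 42.7365},
            {"name": "far", "lon": -74.0, "lat": 42.0}]
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row["id"], row["name"], round(row["outputDistance"], 1))
```

A nested loop compares every pair, which is exactly the cost a geohash-based primary join is meant to avoid on large data sets by only comparing records in nearby cells.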
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: the path to the LI jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following three lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the master node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
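If you script index builds, it can help to assemble the command line in one place. The Python sketch below only builds the argument list from the usage line above; it does not run the tool. The function name pgd_builder_args is hypothetical, and the executable name and its location on your system are assumptions you would need to adjust.

```python
def pgd_builder_args(tab_file, parallel=None, executable="PGDBuilder"):
    """Build the argument list for a PGD Builder run.

    tab_file   -- path to the TAB file (the -f parameter)
    parallel   -- optional cap on concurrent processes (the -p parameter)
    executable -- assumed name or path of the PGD Builder launcher
    """
    if not tab_file.lower().endswith(".tab"):
        raise ValueError("expected a path to a .tab file")
    args = [executable, "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        args += ["-p", str(parallel)]
    return args


if __name__ == "__main__":
    print(pgd_builder_args(r"C:\seamless.tab", parallel=4))
```

The resulting list can be handed to subprocess.run once the executable path has been confirmed for your installation.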
Download Permissions
Setting the download permissions allows multiple services to download the data and to update it when required. Create a common operating system group that contains every service user that needs to download the data. For example, if Hive and YARN jobs both download data and use the same download location, both the Hive and YARN operating system users should belong to the common group. The download directory should be assigned to this common group and have 775 permissions, which grant Read, Write, and Execute access to the owner and the group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window in which no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
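As a small illustration of what mode 775 means, the Python sketch below applies it to a scratch directory and prints the resulting permission string. It assumes a POSIX file system, uses a temporary directory rather than a real download location, and does not change group ownership (that still requires chgrp with suitable privileges). The function name set_download_mode is made up for this example.

```python
import os
import stat
import tempfile


def set_download_mode(path):
    """Apply mode 775 to path and return the permissions in ls -l style."""
    # 7 = rwx for the owner, 7 = rwx for the group, 5 = r-x for others
    os.chmod(path, 0o775)
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(set_download_mode(scratch))
```

The owner and group triplets both read rwx, which is what lets every member of the common group write to and update the downloaded data.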
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
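The wildcard rules for Like can be sketched in a few lines of Python by translating a pattern into a regular expression: percent becomes "any run of characters" and underscore becomes "exactly one character". This is only an illustration of the matching rules described in the table; the function names are made up, and case-sensitive matching is an assumption, since the text above does not say how MI SQL treats case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    for name in ["Central Park", "Park Avenue", "Parkside"]:
        print(name, like(name, "%Park"), like(name, "%Park%"))
```

Note that the pattern has to match the whole value, so "%Park" matches only values that end in Park, while "%Park%" matches values that contain it anywhere.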
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character.
, | List item and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse them correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
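When filter expressions are assembled in code, the quoting rules above are easy to apply with two small helpers. The Python sketch below is a minimal illustration: quote_literal doubles any embedded single quote, and quote_identifier wraps a name in double quotes. Both function names are hypothetical, and the identifier helper rejects names that contain a double quote because the rules above do not say how such names are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Assumption: identifiers containing a double quote are not supported here.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    query = "SELECT * FROM {} WHERE Business = {}".format(
        quote_identifier("/Samples/NamedTables/USA"),
        quote_literal("O'hara's"),
    )
    print(query)
```

Doubling the quote inside the literal keeps the parser from treating the apostrophe as the end of the string.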
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between. This makes grids very useful for spatial indexing and aggregating.
Spectrum™ Location Intelligence for Big Data provides Hive user-defined functions (UDFs) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that, when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions, as they approximate circles while covering the surface of the earth without gaps.
The following grid index functions are available:
• GeoHashBoundary
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary('ebvnk');
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
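To show how a geohash string maps back to a rectangular cell, the Python sketch below decodes an ID into its bounding box using the widely published public geohash scheme (base-32 alphabet, bits alternating between longitude and latitude). Whether the product's geohash IDs follow exactly this scheme is an assumption, and the function name geohash_bounds is made up for this illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_bounds(geohash):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in geohash:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            bit = (index >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


if __name__ == "__main__":
    print(geohash_bounds("ezs42"))
```

Each additional character halves the cell five more times, which is why longer IDs correspond to smaller cells.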
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The geohash ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
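The relationship between a point, a precision, and an ID can be sketched with the public geohash encoding: the longitude and latitude ranges are halved alternately, and every five bits become one base-32 character. The Python below is an illustration of that public scheme only; it is an assumption that the product's GeoHashID output matches it, and the function name geohash_id is made up here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_id(lon, lat, precision):
    """Encode a lon/lat point (degrees) as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    value = 0      # bits collected for the current character
    bits = 0       # number of bits collected so far
    is_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value = value << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value = value << 1
                lat_hi = mid
        is_lon = not is_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bits = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_id(-5.6, 42.6, 5))
```

Because a shorter ID is always a prefix of a longer ID for the same point, sorting or grouping by ID keeps points from the same coarse cell together, which is what makes the GROUP BY aggregation in the examples above work.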
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 61
Spatial
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
- LocalPointInPolygon
- LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
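The containment test that a point-in-polygon search relies on can be illustrated outside Hive. The following is a minimal Python sketch using the classic ray-casting test; it is not the product's implementation, the function and variable names are hypothetical, and it does not reproduce the UDTF's handling of points that lie exactly on a boundary, line, or point geometry.

```python
def point_in_ring(x, y, ring):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_containing(x, y, polygons):
    """Return the names of all polygons (name -> list of (x, y) vertices) containing the point."""
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


# Hypothetical data: two overlapping squares.
polygons = {
    "big": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "small": [(1, 1), (3, 1), (3, 3), (1, 3)],
}
print(find_containing(2, 2, polygons))
print(find_containing(3.5, 3.5, polygons))
```

As with the UDTF, one input point can match several geometries, so the sketch returns a list rather than a single result.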
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
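How the maxCandidates and maxDistance options interact can be shown with a small standalone sketch. The Python below is an illustration under stated assumptions, not the UDTF's implementation: it uses plain Euclidean distance on hypothetical planar coordinates, and search_nearest and the sample feature names are invented for the example.

```python
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    features maps a name to an (x, y) location. Features farther away than
    max_distance are dropped; None means no distance limit.
    """
    scored = []
    for name, (fx, fy) in features.items():
        d = math.hypot(fx - point[0], fy - point[1])
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    scored.sort()  # nearest first
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical features at increasing distance from the origin.
features = {"A": (0, 1), "B": (0, 3), "C": (0, 10), "D": (0, 2)}
print(search_nearest((0, 0), features))
print(search_nearest((0, 0), features, max_candidates=3))
print(search_nearest((0, 0), features, max_candidates=3, max_distance=2.5))
```

The defaults mirror the documented ones: a single result and no distance limit unless the options are set.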
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude and latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
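The semantics of a distance join can be checked on a handful of rows without a cluster. The Python sketch below is a brute-force illustration only: it compares every pair of records with the haversine great-circle formula, whereas the product uses the geohash precision for its primary join, which this sketch does not attempt to reproduce. The function names, the sample records, and the Earth radius constant (3958.8 miles) are assumptions made for the example.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS 84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles, distance_column="outputDistance"):
    """Pair each left record with every right record within max_miles of it."""
    joined = []
    for a in left:
        for b in right:
            d = haversine_miles(a["longitude"], a["latitude"], b["lon"], b["lat"])
            if d <= max_miles:
                joined.append({**a, **b, distance_column: d})
    return joined


# Hypothetical records: one customer and two points of interest.
customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"name": "near", "lon": -73.7503, "lat": 42.7390},
    {"name": "far", "lon": -74.0, "lat": 42.0},
]
for row in join_by_distance(customers, pois, 0.5):
    print(row["id"], row["name"], round(row["outputDistance"], 3))
```

The extra distance column plays the role that the DistanceColumnName option plays in the Spark API.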
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
- For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table and proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
- --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
- PGD Builder
- Download Permissions
- Operators and Syntax Delimiters
- Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
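To see what mode 775 grants, the octal value can be decoded with Python's standard stat module. This is only an illustrative sketch; the helper names are hypothetical and nothing here touches the actual download directory.

```python
import stat


def describe_mode(mode):
    """Render permission bits the way `ls -l` shows them for a directory."""
    # stat.filemode expects the file-type bits as well; S_IFDIR marks a directory.
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the owning group may write."""
    return bool(mode & stat.S_IWGRP)


def others_can_write(mode):
    """True if users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)


mode = 0o775
print(describe_mode(mode))
print("group can write:", group_can_write(mode))
print("others can write:", others_can_write(mode))
```

With 775, the owner and every member of the common group get read, write, and execute access, while all other users get read and execute only.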
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
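The wildcard rules for the Like operator can be mirrored with a regular expression. The Python sketch below is an illustration of the two documented wildcards only; the helper names are hypothetical, and case-sensitive matching is an assumption rather than documented MI SQL behavior.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")   # underscore: exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))
print(like("Parks", "%Park"))
print(like("cat", "c_t"))
print(like("cart", "c_t"))
```

Combining the symbols works the same way, for example "_a%" for any value whose second character is a.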
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers containing illegal characters (characters that normally would not be allowed, such as a slash) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
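When filter text is assembled in code, the quote-doubling rule is easy to apply mechanically. The Python sketch below shows one way to do it; the helper names are hypothetical, and it covers only the rules stated above (single quotes around literals with embedded single quotes doubled, double quotes around identifiers).

```python
def sql_string_literal(value):
    """Wrap a value in single quotes, doubling every embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def sql_identifier(name):
    """Wrap an identifier in double quotes, for names with spaces or special characters."""
    return '"' + name + '"'


query = (
    "SELECT * FROM " + sql_identifier("Streets")
    + " WHERE Business = " + sql_string_literal("O'hara's")
)
print(query)
```

Building the literal through a helper keeps a stray apostrophe in the data from ending the string early.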
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
- Table of Contents
- Welcome
- What is Spectrum™ Location Intelligence for Big Data
- Spectrum™ Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
- Spatial
- Installing the SDK
- Hive User-Defined Spatial Functions
- Setup
- WritableGeometry
- Geometry Functions
- Constructor Functions
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
- Persistence Functions
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
- Predicate Functions
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
- Measurement Functions
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
- Processing Functions
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
- Observer Functions
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
- Grid Functions
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
- Search Functions
- LocalPointInPolygon
- LocalSearchNearest
- Spark
- Spark Jobs
- Hexagon Generator
- Hexagons
- Spark API
- JoinByDistance
- Location Intelligence Jar for Spatial Operations
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
- Appendix
- PGD Builder
- Building an Index with the PGD Builder
- Download Permissions
- Operators and Syntax Delimiters
- Quote Rules
Spatial
GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
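To show how a precision value relates to cell size, the following minimal sketch implements the public geohash scheme (base-32 characters, alternating longitude and latitude bisection). It assumes that the "well-known" ID mentioned above follows that public scheme; the product's own implementation is not shown in this guide and may differ. The function names are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # public geohash alphabet


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Return the geohash string of the given length for a lon/lat point."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bounds(cell_id: str):
    """Return (min_lon, min_lat, max_lon, max_lat) of a geohash cell."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    use_lon = True
    for ch in cell_id:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if use_lon:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            use_lon = not use_lon
    return lon_lo, lat_lo, lon_hi, lat_hi


if __name__ == "__main__":
    cell = geohash_encode(-73.750333, 42.736103, 3)
    print(cell, geohash_bounds(cell))
```

Each additional character adds five bisection steps, so a longer ID always describes a smaller rectangle nested inside the cell of its shorter prefix. That nesting is why the IDs sort and group well.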
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'
shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'
downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates
Description: the maximum number of results to return (if not set, the default value is 1)
Example: 'maxCandidates', '3'
maxDistance
Description: the maximum distance to search for results (if not set, the default value is no limit)
Example: 'maxDistance', '25'
distanceUnit
Description: the distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
Example: 'distanceUnit', 'mi'
returnDistanceColumnName
Description: the name of the column to use for returning the distance
Example: 'returnDistanceColumnName', 'Miles'
shpCharset
Description: the charset to use when reading a shapefile
Example: 'shpCharset', 'utf-8'
shpCrs
Description: the coordinate reference system to use when reading a shapefile
Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
Description: the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
Description: the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'
downloadGroup
Description: the operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
Example: 'downloadGroup', 'pbdownloads'
queryFilter
Description: search on a subset of the data based on an attribute
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Example: 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
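The result that a distance join is expected to produce can be sketched in plain Python. This is not the product's implementation. It is a brute-force comparison of every pair using the haversine great-circle formula, it does not model the geohashPrecision pre-join at all, and the function names, sample rows, and Earth radius constant are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points (WGS 84 degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles, distance_column=None):
    """Pair every left row with every right row that lies within max_miles.

    Left rows carry 'longitude'/'latitude' keys and right rows carry
    'lon'/'lat' keys, mirroring the column names in the first example above.
    """
    joined = []
    for left_row in left:
        for right_row in right:
            distance = haversine_miles(
                left_row["longitude"], left_row["latitude"],
                right_row["lon"], right_row["lat"],
            )
            if distance <= max_miles:
                row = {**left_row, **right_row}
                if distance_column is not None:
                    row[distance_column] = distance  # optional extra column
                joined.append(row)
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.75, "latitude": 42.73}]
    pois = [
        {"poi": "near", "lon": -73.752, "lat": 42.731},
        {"poi": "far", "lon": -74.0, "lat": 43.0},
    ]
    for row in join_by_distance(customers, pois, 0.5, "outputDistance"):
        print(row)
```

Comparing every pair scales with the product of the two row counts, which is impractical at cluster scale. The geohashPrecision parameter, described above as the precision for the primary join, presumably exists to avoid that cost.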
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter      Required  Description
-f <file>      yes       Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
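When several TAB files need an index, the documented command can be generated once per file. The following is a minimal sketch and is not part of the product: pgd_commands is a hypothetical helper name, and it only prints commands built from the documented -f and -p parameters so that they can be reviewed before being run.

```shell
# pgd_commands DIR [PARALLEL]
# Hypothetical helper (not shipped with the product): prints one PGDBuilder
# command per TAB file found directly in DIR. Nothing is executed.
pgd_commands() {
    dir=$1
    parallel=${2:-}
    find "$dir" -maxdepth 1 -type f -iname '*.tab' | sort |
    while IFS= read -r tab; do
        if [ -n "$parallel" ]; then
            # Cap concurrency with the documented -p parameter
            printf 'PGDBuilder -f %s -p %s\n' "$tab" "$parallel"
        else
            # Let PGDBuilder use its default (number of CPUs available)
            printf 'PGDBuilder -f %s\n' "$tab"
        fi
    done
}
```

For example, pgd_commands /data/tabs 4 would list one command per TAB file in /data/tabs (an assumed directory), each limited to 4 concurrent processes. Paths that contain spaces would need quoting before the printed commands are run.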
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
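To confirm what mode 775 grants, the following sketch applies it to a scratch directory and prints the resulting permission string. The scratch directory and the show_mode helper are illustrative only; they stand in for the real download directory and are not part of the product.

```shell
# show_mode PATH: print the 10-character type/permission string for PATH.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

scratch=$(mktemp -d)   # throwaway directory standing in for the download directory
chmod 775 "$scratch"
show_mode "$scratch"   # prints drwxrwxr-x: owner rwx, group rwx, others r-x
rm -rf "$scratch"
```

Note that 775 also leaves read and execute access for users outside the group. Running the same helper against the real download directory after step 5 is one way to check the result.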
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
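The wildcard behavior of the Like operator can be tried outside the product. The following is a minimal sketch in POSIX shell, not MI SQL itself: like_match is a hypothetical helper that maps _ to the shell glob ? and % to the shell glob *. Shell matching is case-sensitive; this guide does not state whether Like is.

```shell
# like_match VALUE PATTERN
# Returns success when VALUE matches the Like-style PATTERN.
# Sketch only: _ becomes the glob ?, % becomes the glob *; characters that are
# already special to shell globs (*, ?, [) in PATTERN are not escaped.
like_match() {
    glob=$(printf '%s' "$2" | sed 's/_/?/g; s/%/*/g')
    case $1 in
        $glob) return 0 ;;
        *)     return 1 ;;
    esac
}

like_match "Central Park" "%Park" && echo "match"      # % absorbs "Central "
like_match "Park" "Park_" || echo "no match"           # _ needs exactly one more character
```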
Arithmetic Operators
Operator Definition
+  Addition; also the concatenation operator. Note: String concatenation also uses &.
-  Subtraction
*  Multiplication
/  Division
^  Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' '  String constant delimiters. See Quote Rules.
" "  Quoted identifier delimiters
% _  Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,    List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
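When a filter expression is assembled in a script, the doubling rule can be applied mechanically. The following is a minimal sketch; sql_literal is a hypothetical helper name and is not part of the product.

```shell
# sql_literal TEXT
# Hypothetical helper: wraps TEXT in single quotes and doubles any embedded
# single quote, following the quote rules described above.
sql_literal() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

sql_literal "Joe's Diner"   # prints 'Joe''s Diner'
```

This covers string literals only. Identifiers that need quoting use double quotation marks instead.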
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Parameter  Type              Description
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
SELECT GeoHashBoundary('-73.750333', '42.736103', 3);
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The geohash ID of the grid cell at the specified precision that contains the point.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
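This guide does not describe how the ID is computed. To illustrate how precision relates to cell size, the following sketch implements the widely published public geohash encoding, on the assumption that GeoHashID follows the same scheme. The geohash function below is a hypothetical shell helper built on awk; it is not the product's code.

```shell
# geohash LON LAT PRECISION
# Sketch of the publicly documented geohash encoding (not the product's code).
# Bits alternate between longitude and latitude, each bit halving the current
# range; every 5 bits select one character of the base-32 alphabet.
geohash() {
    awk -v lon="$1" -v lat="$2" -v n="$3" 'BEGIN {
        base32 = "0123456789bcdefghjkmnpqrstuvwxyz"
        lon_min = -180; lon_max = 180
        lat_min = -90;  lat_max = 90
        use_lon = 1; bits = 0; idx = 0; hash = ""
        while (length(hash) < n) {
            if (use_lon) {
                mid = (lon_min + lon_max) / 2
                if (lon >= mid) { idx = idx * 2 + 1; lon_min = mid }
                else            { idx = idx * 2;     lon_max = mid }
            } else {
                mid = (lat_min + lat_max) / 2
                if (lat >= mid) { idx = idx * 2 + 1; lat_min = mid }
                else            { idx = idx * 2;     lat_max = mid }
            }
            use_lon = !use_lon
            bits++
            if (bits == 5) {
                hash = hash substr(base32, idx + 1, 1)
                bits = 0; idx = 0
            }
        }
        print hash
    }'
}
```

Under this sketch, geohash -73.750333 42.736103 3 prints dre. Each additional character adds five more halvings, which is why longer IDs correspond to smaller cells.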
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter  Type    Description
UNIQUE_ID  String  The unique geohash identifier of a cell in a grid.
Return Values
Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter  Type    Description
UNIQUE_ID  String  The unique geohash identifier of a cell in a grid.
Return Values
Return Type       Description
WritableGeometry  A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type       Description
WritableGeometry  The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter  Type              Description
X          Number or String  The longitude value of the point.
Y          Number or String  The latitude value of the point.
precision  Number            The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type  Description
String       The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. This UDTF returns all the geometries that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing a point.
dataSourcePath  String            The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options         Map               Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option  Description  Example
shpCharset  The charset to use when reading a shapefile.  'shpCharset', 'utf-8'
shpCrs  The coordinate reference system to use when reading a shapefile.  'shpCrs', 'epsg:4326'
remoteDataSourceLocation  The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).  'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation  The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.  'downloadLocation', '/pb/downloads'
downloadGroup  The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.  'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry  The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which is the table that supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter       Type              Description
inputPoint      WritableGeometry  A WritableGeometry representing the point to search near.
dataSourcePath  String            The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options         Map               Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option  Description  Example
maxCandidates  The maximum number of results to return (if not set, the default value is 1).  'maxCandidates', '3'
maxDistance  The maximum distance to search for results (if not set, the default value is no limit).  'maxDistance', '25'
distanceUnit  The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units.  'distanceUnit', 'mi'
returnDistanceColumnName  The name of the column to use for returning the distance.  'returnDistanceColumnName', 'Miles'
shpCharset  The charset to use when reading a shapefile.  'shpCharset', 'utf-8'
shpCrs  The coordinate reference system to use when reading a shapefile.  'shpCrs', 'epsg:4326'
remoteDataSourceLocation  The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).  'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation  The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.  'downloadLocation', '/pb/downloads'
downloadGroup  The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.  'downloadGroup', 'pbdownloads'
queryFilter  Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.  'queryFilter', "Name Like 'Park'"
Return Values
Return Type  Description
geometry     The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because they closely approximate circles, hexagons include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. It will generate each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter         Type       Description
df2               DataFrame  The dataframe to join to.
df1Longitude      Column     The longitude value from the first dataframe.
df1Latitude       Column     The latitude value from the first dataframe.
df2Longitude      Column     The longitude value from the second dataframe.
df2Latitude       Column     The latitude value from the second dataframe.
maxDistance       Length     The buffer length around point 1 to search for point 2.
geohashPrecision  Integer    The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options           Map        Optional. Options that add extra attributes to the result of the join.
Options
Key                 Type    Description
DistanceColumnName  String  Adds a column to the result dataframe that contains the distance calculated.
Return Values
Return Type  Description
DataFrame    The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// Adds an outputDistance column that holds the calculated distance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
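To get a feel for which pairs of points a given maxDistance would keep, distances can be checked by hand. This guide does not state how the product measures distance, so the following sketch uses the haversine great-circle formula as an assumed stand-in. haversine_miles and within_miles are hypothetical shell helpers built on awk; they are not part of the Spark API.

```shell
# haversine_miles LON1 LAT1 LON2 LAT2
# Great-circle distance in miles on a sphere of radius 3958.8 mi (an assumed
# mean Earth radius); the product's own distance calculation may differ.
haversine_miles() {
    awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
        pi = atan2(0, -1)
        rad = pi / 180
        dlat = (lat2 - lat1) * rad
        dlon = (lon2 - lon1) * rad
        a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
        printf "%.3f\n", 3958.8 * 2 * atan2(sqrt(a), sqrt(1 - a))
    }'
}

# within_miles LON1 LAT1 LON2 LAT2 MAX
# Succeeds when the two points are at most MAX miles apart.
within_miles() {
    d=$(haversine_miles "$1" "$2" "$3" "$4")
    awk -v d="$d" -v max="$5" 'BEGIN { exit !(d <= max) }'
}
```

For example, within_miles -73.750333 42.736103 -73.755 42.74 0.5 tests whether two nearby points (coordinates chosen for illustration) would fall inside the 0.5 mile radius used in the example above.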
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application        Do this
Spark job          Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook  Integrating the Location Intelligence Jar with Zeppelin
Hue notebook       Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine.
When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
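To illustrate what the 775 mode grants, the following Python sketch applies it to a throwaway temporary directory and reads the permission bits back. It is a sketch for checking a setup, not part of the product: the helper names permission_string and group_has_full_access are hypothetical, and the code never touches your actual download location.

```python
import os
import stat
import tempfile


def permission_string(path):
    """Return the nine permission characters for path, e.g. 'rwxrwxr-x'."""
    # stat.filemode() returns e.g. 'drwxrwxr-x'; drop the leading file-type character.
    return stat.filemode(os.stat(path).st_mode)[1:]


def group_has_full_access(path):
    """True if the group has read, write, and execute permission on path."""
    mode = os.stat(path).st_mode
    wanted = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return mode & wanted == wanted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        os.chmod(scratch, 0o775)          # same mode as 'chmod 775'
        print(permission_string(scratch), group_has_full_access(scratch))
```

Mode 775 gives the owner and the group read, write, and execute access, while other users can only read and traverse the directory. That is why every service user that downloads data must belong to the common group.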
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first object's centroid is within the second object
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object
In (List) Returns true if the value equals at least one of the values in the literal list or subquery
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
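The wildcard behavior of the Like operator can be illustrated with a short Python sketch that translates a Like pattern into a regular expression: the underscore (_) matches exactly one character and the percent sign (%) matches zero or more characters. This is an illustration of the matching rules only, not the MI SQL parser; the function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")            # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")             # exactly one character
        else:
            parts.append(re.escape(ch))   # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For instance, like("Parkway", "Park%") is true because % absorbs the trailing characters, while like("Parks", "Par_") is false because _ stands for exactly one character.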
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
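When queries are assembled programmatically, the quote rules above can be applied with two small helpers. The following Python sketch is an assumption-laden illustration, not a product API: quote_literal doubles every embedded single quote and wraps the value in single quotes, and quote_identifier wraps a name in double quotes. Both function names are hypothetical, and the table and column names are sample values.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes; embedded double quotes are rejected."""
    if '"' in name:
        # How to escape a double quote inside an identifier is not covered here.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


def build_query(table, column, value):
    """Assemble a simple filtered SELECT using the quoting helpers."""
    return "SELECT * FROM {} WHERE {} = {}".format(
        quote_identifier(table), column, quote_literal(value)
    )
```

Calling build_query("/Samples/NamedTables/USA", "Country", "Canada") produces a statement in which the table name is double-quoted and the value is single-quoted, and a value such as Joe's Diner is emitted with its apostrophe doubled.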
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
- Table of Contents
- Welcome
- What is Spectrum™ Location Intelligence for Big Data
- Spectrum™ Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
- Spatial
- Installing the SDK
- Hive User-Defined Spatial Functions
- Setup
- WritableGeometry
- Geometry Functions
- Constructor Functions
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
- Persistence Functions
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
- Predicate Functions
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
- Measurement Functions
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
- Processing Functions
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
- Observer Functions
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
- Grid Functions
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
- Search Functions
- LocalPointInPolygon
- LocalSearchNearest
- Spark
- Spark Jobs
- Hexagon Generator
- Hexagons
- Spark API
- JoinByDistance
- Location Intelligence Jar for Spatial Operations
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
- Appendix
- PGD Builder
- Building an Index with the PGD Builder
- Download Permissions
- Operators and Syntax Delimiters
- Quote Rules
Spatial
GeoHashID
Description
The GeoHashID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The geohash ID of the grid cell at the specified precision that contains the point
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
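To show how a longitude, a latitude, and a precision map to a grid cell ID, the following Python sketch implements the publicly documented geohash encoding. It is offered as an illustration under the assumption that GeoHashID follows the standard algorithm; the product's implementation is not shown in this guide and may differ in detail. The function name geohash_id is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # standard geohash alphabet


def geohash_id(x, y, precision):
    """Encode longitude x and latitude y (WGS 84 degrees) as a geohash of the given length."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True                     # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if x >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid           # keep the upper half of the interval
            else:
                bits <<= 1
                lon_hi = mid           # keep the lower half of the interval
        else:
            mid = (lat_lo + lat_hi) / 2
            if y >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:             # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Each additional character halves the cell several more times, so a longer ID always starts with the shorter ID of the enclosing cell. That prefix property is what makes the IDs sortable and searchable.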
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is then sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are from the STATECAP TAB file that we want in our query result.
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default value is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadGroup', 'pbdownloads'
For more information, see Download Permissions on page 83.
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
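The effect of the maxCandidates and maxDistance options can be sketched in a few lines of Python: compute the distance from the search point to each candidate, drop candidates beyond the maximum distance, sort by distance, and keep the first few. This is a simplified illustration, not the UDTF's implementation: it handles point candidates only, uses a spherical haversine distance in meters (an assumption), and the names search_nearest and haversine_m are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius; spherical model is an assumption


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(lon, lat, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (distance, candidate) pairs, nearest first."""
    scored = []
    for c in candidates:
        d = haversine_m(lon, lat, c["lon"], c["lat"])
        if max_distance_m is None or d <= max_distance_m:   # no limit by default
            scored.append((d, c))
    scored.sort(key=lambda pair: pair[0])
    return scored[:max_candidates]
```

With the defaults, only the single nearest candidate is returned, which mirrors the documented default of 1 for maxCandidates and no limit for maxDistance.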
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, each hexagon is generated with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
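To make the semantics of a distance join concrete, the following is a minimal Python sketch of the idea, not the product's implementation. It assumes a brute-force comparison of every pair and a haversine (spherical) distance; the real joinByDistance uses a geohash-based primary join and its own distance calculation. The names haversine_m and join_by_distance, the dictionary row format, and the Earth radius constant are all hypothetical choices made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8   # assumed mean Earth radius in metres (spherical model)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two WGS 84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Pair every left row with each right row that lies within max_distance_m.

    Left rows carry 'longitude'/'latitude' keys, right rows carry 'lon'/'lat'
    keys (hypothetical names mirroring the Scala example). If distance_column
    is given, the computed distance is added under that key.
    """
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                row = {**l_row, **r_row}
                if distance_column is not None:
                    row[distance_column] = d
                result.append(row)
    return result


customers = [{"id": 1, "longitude": 0.0, "latitude": 0.0}]
pois = [
    {"name": "near", "lon": 0.005, "lat": 0.0},  # roughly 556 m away
    {"name": "far", "lon": 0.01, "lat": 0.0},    # roughly 1112 m away
]
rows = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
```

With a half-mile radius (about 805 m), only the "near" POI is joined to the customer. A brute-force join like this compares every pair, which is why a coarse spatial key such as a geohash is normally used first to limit the candidate pairs.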
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder 81
• Download Permissions 83
• Operators and Syntax Delimiters 84
• Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.
4. During a window in which no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
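As a quick check of what mode 775 grants, the following Python sketch decodes the permission bits using only the standard library. It is an illustration of the octal mode itself, not part of the product; the helper names describe_mode and group_can_download are hypothetical.

```python
import stat


def describe_mode(mode):
    """Render a directory's permission bits the way `ls -l` would show them."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_download(mode):
    """True if the group has read, write, and execute on the directory."""
    needed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return (mode & needed) == needed


listing = describe_mode(0o775)          # owner rwx, group rwx, others r-x
shared_ok = group_can_download(0o775)   # group members can create and update files
default_ok = group_can_download(0o755)  # a common default: group lacks write
```

Mode 775 differs from the common 755 default only in the group write bit, which is the bit that lets every service user in the shared group update the downloaded data.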
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
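The wildcard rules for the Like operator can be illustrated with a short Python sketch that translates a Like pattern into a regular expression. This is only a model of the rules described in the table above: the function names are hypothetical, the match is assumed to cover the whole value, and case-sensitive comparison is an assumption, since the guide does not state how MI SQL treats case.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


ends_with_park = like("Central Park", "%Park")  # '%' absorbs "Central "
one_char = like("Park", "_ark")                 # '_' stands for the single "P"
too_long = like("Spark", "_ark")                # '_' cannot stand for "Sp"
combined = like("Park Avenue", "_ark%")         # both wildcards in one pattern
```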
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a doubled single quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
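When a query is assembled programmatically, the quoting rules above can be applied with two small helpers. The following Python sketch is an illustration only: quote_literal and quote_identifier are hypothetical names, and rejecting identifiers that contain a double quote is an assumption, because the guide does not say how such identifiers are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Assumption: identifiers containing a double quote are rejected rather
    than escaped, since the escaping rule is not documented here.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote: " + name)
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Country = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("Canada"),
)
business = quote_literal("Ohara's")
```

Here query holds the same statement as the second example above, and business holds the literal with its embedded quote doubled.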
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc. All rights reserved.
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID(-73.750333, 42.736103, 3);
SELECT GeoHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
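To show how a precision value turns a point into a grid cell ID, and how those IDs support the GROUP BY aggregation in the last query, here is a minimal Python sketch of the standard, publicly documented geohash encoding. Whether GeoHashID returns exactly the same strings is an assumption; the function names geohash_encode and count_per_cell are hypothetical.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lon, lat, precision):
    """Encode a WGS 84 longitude/latitude as a geohash of `precision` characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between axes, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def count_per_cell(points, precision):
    """Count (lon, lat) points per geohash cell, like GROUP BY on the cell ID."""
    return Counter(geohash_encode(lon, lat, precision) for lon, lat in points)


cell = geohash_encode(-73.750333, 42.736103, 3)
counts = count_per_cell(
    [(-73.750333, 42.736103), (-73.76, 42.74), (10.40744, 57.64911)], 3
)
```

Each extra character adds five bits of subdivision, so longer IDs identify smaller cells, and a shorter ID is always a prefix of the longer ID for the same point.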
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary("03332");
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.
Examples
Using HDFS:
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
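For readers unfamiliar with the underlying test, the following Python sketch shows one common way to decide whether a point lies inside a polygon: the ray-casting (even-odd) rule. It is a conceptual illustration under simplifying assumptions (a single ring, planar coordinates, no holes, and undefined behavior for points exactly on an edge) and is not the algorithm used by LocalPointInPolygon. The names point_in_ring and find_containing are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd test: True if (x, y) is strictly inside the closed ring.

    `ring` is a list of (x, y) vertices; the last vertex connects back to
    the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def find_containing(point, polygons):
    """Return the attributes of every polygon that contains the point."""
    x, y = point
    return [attrs for ring, attrs in polygons if point_in_ring(x, y, ring)]


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
polygons = [(square, {"state": "A"})]
hits = find_containing((2.0, 2.0), polygons)    # inside the square
misses = find_containing((5.0, 2.0), polygons)  # outside the square
```

Running this test against every polygon for every point is expensive, which is the kind of cost that a prepared index such as a PGD file is intended to reduce.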
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
FromWKT(search_points.geometry, search_points.crs),
'/STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
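The example above combines maxCandidates with a filter. A query that also limits the search radius and returns the distance might look like the following sketch, written as a small shell helper that saves the HiveQL to a file for use with hive -f. The local TAB path /pb/data/STATECAP.TAB, the output file name, and the idea that the distance column can be selected as nearestresult.Miles are illustrative assumptions, not values taken from this guide.

```shell
#!/bin/sh
# Sketch: write a HiveQL script that combines several LocalSearchNearest options.
# Assumptions (hypothetical): the local TAB path, the output file name, and that
# the column named by returnDistanceColumnName is exposed as nearestresult.Miles.
write_nearest_query() {
  cat > "$1" <<'EOF'
SELECT search_points.id, nearestresult.capital, nearestresult.Miles
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  '/pb/data/STATECAP.TAB',
  map('maxCandidates', '3',
      'maxDistance', '25',
      'distanceUnit', 'mi',
      'returnDistanceColumnName', 'Miles')
) nearestresult;
EOF
}

query_file="${TMPDIR:-/tmp}/nearest_capitals.hql"   # hypothetical file name
write_nearest_query "$query_file"
echo "Wrote $query_file; run it with: hive -f $query_file"
```

Keeping the query in a file makes it easy to review the option map before running it against the cluster.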
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
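When the job is submitted repeatedly with different settings, it can help to assemble the command from variables and print it for review first. The following is a minimal dry-run sketch; the application name, the configuration file name, and the directory placeholders are assumptions to be replaced with your own values.

```shell
#!/bin/sh
# Sketch: assemble the Hexagon Generator spark-submit command and print it (dry run).
# All values below are placeholders (assumptions); replace them for your cluster.
APP_NAME="hexgen-demo"                       # hypothetical application name
HEXGEN_JAR="/dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar"
OUTPUT_DIR="/dir/on/hdfs/output"
CONFIG_PATH="/dir/on/server/hexgen.conf"     # hypothetical configuration file name

build_hexgen_command() {
  # Print every argument followed by a space, then end the line.
  printf '%s ' spark-submit \
    --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
    --master yarn --deploy-mode cluster \
    --name "$APP_NAME" \
    "$HEXGEN_JAR" \
    -output "$OUTPUT_DIR" -conf "$CONFIG_PATH" -overwrite
  printf '\n'
}

# Print the command for review before running it by hand.
build_hexgen_command
```

The printed line is not shell-quoted, so this sketch assumes that none of the values contain spaces.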
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes by location. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
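To get a feel for what the geohashPrecision value implies, the sketch below counts the cells in a geohash grid at a given precision. It assumes the standard base-32 geohash encoding, in which every character adds 5 bits and the grid therefore has 32^precision cells; this guide does not state the encoding the product uses, so treat the numbers as an illustration of scale only.

```shell
#!/bin/sh
# Number of cells in a standard base-32 geohash grid at a given precision.
# Assumption: each geohash character encodes 5 bits, so there are 32^precision cells.
geohash_cell_count() {
  precision=$1
  count=1
  i=0
  while [ "$i" -lt "$precision" ]; do
    count=$((count * 32))
    i=$((i + 1))
  done
  printf '%s\n' "$count"
}

# Show how quickly the grid becomes finer as the precision grows.
for p in 1 5 7 12; do
  printf 'precision %2d -> %s cells\n' "$p" "$(geohash_cell_count "$p")"
done
```

Each additional character multiplies the number of cells by 32, so a high precision paired with a large maxDistance means each search area spans many cells.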
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
// col() comes from Spark SQL; SpatialImplicits adds joinByDistance to DataFrame
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
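To sanity-check a few result rows outside Spark, the distance between two longitude/latitude pairs can be approximated on the command line. The sketch below uses the haversine great-circle formula with a mean earth radius of 6,371,000 m; this is a swapped-in approximation chosen for illustration, not the product's own distance computation, and the function names are hypothetical.

```shell
#!/bin/sh
# Great-circle (haversine) distance in meters between two lon/lat points.
# Illustrative only: the product's own distance computation may differ.
haversine_m() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    pi = atan2(0, -1); r = 6371000            # mean earth radius in meters
    dlat = (lat2 - lat1) * pi / 180
    dlon = (lon2 - lon1) * pi / 180
    a = sin(dlat / 2) ^ 2 + cos(lat1 * pi / 180) * cos(lat2 * pi / 180) * sin(dlon / 2) ^ 2
    printf "%.1f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Succeeds (exit status 0) when the two points are within max_miles of each other.
# Arguments: lon1 lat1 lon2 lat2 max_miles
within_miles() {
  d=$(haversine_m "$1" "$2" "$3" "$4")
  awk -v d="$d" -v mi="$5" 'BEGIN { exit !(d <= mi * 1609.344) }'
}

# Two made-up points a few hundred meters apart, checked against a 0.5 mile radius.
if within_miles -73.750333 42.736103 -73.752000 42.739000 0.5; then
  echo "within 0.5 mi"
else
  echo "farther than 0.5 mi"
fi
```

Arguments are passed in longitude, latitude order to match the parameter order of joinByDistance.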
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property: Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
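Because the tool indexes one TAB file per invocation, indexing a whole directory means calling it once per file. The following dry-run sketch only prints the commands it would run; it assumes a Unix-like shell, that PGDBuilder is on the PATH, and that TAB files end in .tab or .TAB. The function name is hypothetical.

```shell
#!/bin/sh
# Dry run: print one PGDBuilder command per TAB file found under a directory.
# Assumptions: PGDBuilder is on the PATH and TAB files end in .tab or .TAB.
# Arguments: data_dir parallel
print_pgd_commands() {
  data_dir=$1
  parallel=$2
  find "$data_dir" -type f \( -name '*.tab' -o -name '*.TAB' \) | sort |
  while IFS= read -r tab_file; do
    printf 'PGDBuilder -f "%s" -p %s\n' "$tab_file" "$parallel"
  done
}

# Usage: review the printed commands, then run the ones you want.
#   print_pgd_commands /path/to/tab/files 4
```

For a seamless TAB, run the tool on the seamless TAB itself rather than on each sub-TAB, since the PGD Builder creates the sub-TAB indexes on its own.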
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have 775 permissions, which give Read, Write, and Execute access to the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
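After changing the permissions, it can be worth confirming that the directory really ended up with mode 775. The sketch below checks the mode string reported by ls -ld; the function name is hypothetical, and the demo runs against a throwaway temporary directory rather than the real download location.

```shell
#!/bin/sh
# Check that a directory has rwxrwxr-x (775) permissions.
# Assumption: `ls -ld` prints the file type and mode in its first 10 characters.
has_mode_775() {
  mode=$(ls -ld "$1" | cut -c1-10)
  [ "$mode" = "drwxrwxr-x" ]
}

# Demo on a throwaway directory instead of the real download location.
demo_dir=$(mktemp -d)
chmod 775 "$demo_dir"
if has_mode_775 "$demo_dir"; then
  echo "ok: $demo_dir is 775"
else
  echo "warning: $demo_dir is not 775"
fi
rmdir "$demo_dir"
```

To check the real location, call has_mode_775 with the directory configured as your download location.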
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
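The wildcard behavior of the Like operator can be tried out locally by translating a pattern into a regular expression. This is a minimal sketch with a hypothetical function name; it assumes case-sensitive matching and patterns made up only of letters, digits, spaces, and the two wildcards, neither of which is stated in this guide.

```shell
#!/bin/sh
# Emulate Like wildcards: % matches zero or more characters, _ matches exactly one.
# Assumptions: case-sensitive matching; the pattern holds only letters, digits,
# spaces, % and _ (no regular-expression metacharacters).
# Arguments: value pattern
like_match() {
  regex=$(printf '%s' "$2" | sed -e 's/%/.*/g' -e 's/_/./g')
  printf '%s\n' "$1" | grep -Eq "^${regex}\$"
}

# Try a few value/pattern pairs.
for pair in "Central Park|%Park" "Parkway|%Park" "Park|P_rk"; do
  value=${pair%%|*}
  pattern=${pair##*|}
  if like_match "$value" "$pattern"; then
    echo "'$value' LIKE '$pattern': true"
  else
    echo "'$value' LIKE '$pattern': false"
  fi
done
```

The pattern is anchored at both ends, so a trailing % is needed to match values that continue after the literal text.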
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
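When filter expressions are generated from a script, the doubling of single quotes described above is easy to automate. The following sketch, with a hypothetical function name, wraps a value in single quotes and doubles any single quote inside it.

```shell
#!/bin/sh
# Turn a raw value into a single-quoted string literal, doubling any
# single quote inside it as described in the quote rules.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Build a WHERE clause for a business name that contains single quotes.
name="O'hara's"
printf 'SELECT * FROM Streets WHERE Business = %s\n' "$(sql_quote "$name")"
```

The helper only handles string literals; identifiers that need quoting still take double quotation marks.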
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary('PF625028642');
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. All geometries that the search point lies on are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point.
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option Description Example
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields are the fields from the STATECAP TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near.
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option Description Example
maxCandidates The maximum number of results to return (if not set, the default value is 1). Example: 'maxCandidates', '3'
maxDistance The maximum distance to search for results (if not set, the default is no limit). Example: 'maxDistance', '25'
distanceUnit The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. Example: 'distanceUnit', 'mi'
returnDistanceColumnName The name of the column to use for returning the distance. Example: 'returnDistanceColumnName', 'Miles'
shpCharset The charset to use when reading a shapefile. Example: 'shpCharset', 'utf-8'
shpCrs The coordinate reference system to use when reading a shapefile. Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Example: 'downloadLocation', '/pb/downloads'
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
downloadGroup The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive pb.download.group setting (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. Example: 'downloadGroup', 'pbdownloads'
queryFilter Search on a subset of the data based on an attribute. Example: 'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'"))
nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
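The way the maxCandidates and maxDistance options shape the result can be illustrated with a small Python sketch. It is not the UDTF's implementation: it assumes planar point coordinates and straight-line distance, and the search_nearest function and its sample data are hypothetical.

```python
import math


def search_nearest(point, candidates, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    point          -- (x, y) search point (planar coordinates, an assumption)
    candidates     -- dict mapping a record name to its (x, y) location
    max_candidates -- maximum number of results (defaults to 1)
    max_distance   -- search limit; None means no limit
    """
    scored = []
    for name, (x, y) in candidates.items():
        distance = math.hypot(x - point[0], y - point[1])
        if max_distance is None or distance <= max_distance:
            scored.append((distance, name))
    scored.sort()  # nearest first; ties are broken by name
    return [(name, distance) for distance, name in scored[:max_candidates]]


capitals = {"A": (0, 1), "B": (0, 2), "C": (0, 5), "D": (0, 30)}
nearest_three = search_nearest((0, 0), capitals, max_candidates=3, max_distance=25)
```

With no options, only the single nearest record is returned, which mirrors the documented defaults of 1 for maxCandidates and no limit for maxDistance.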
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the API generates each hexagon with a unique and consistent ID for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to.
df1Longitude Column The longitude value from the first dataframe.
df1Latitude Column The latitude value from the first dataframe.
df2Longitude Column The longitude value from the second dataframe.
df2Latitude Column The latitude value from the second dataframe.
maxDistance Length The buffer length around point 1 to search for point 2.
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
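To make the result of a distance join concrete, here is a minimal Python sketch of the semantics only. It is not how joinByDistance is implemented (per the parameter table, the product uses a geohash-based primary join); this sketch compares every pair of rows and assumes a spherical Earth with haversine distance. The function names, column names, and sample rows are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in meters (spherical assumption)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS84 lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m, distance_column="outputDistance"):
    """Pair every row of rows1 with each row of rows2 within max_distance_m."""
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["longitude"], r1["latitude"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                # Merge both rows and add the computed distance as an extra column.
                joined.append({**r1, **r2, distance_column: d})
    return joined


customers = [{"id": 1, "longitude": -73.750333, "latitude": 42.736103}]
pois = [
    {"poi": "near", "lon": -73.7503, "lat": 42.7390},
    {"poi": "far", "lon": -73.7500, "lat": 42.8000},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
```

Only the "near" POI falls inside the half-mile buffer, so result holds a single joined row that carries the customer fields, the POI fields, and the distance column.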
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here.
   a) On the master node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (mode 775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
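As a quick check of what mode 775 grants, the following Python sketch decodes the permission bits using only the standard library. The helper names are hypothetical and the sketch does not touch the file system.

```python
import stat


def describe_directory_mode(mode):
    """Render permission bits for a directory in ls -l style."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True if members of the directory's group may create or update files."""
    return bool(mode & stat.S_IWGRP)


def others_can_write(mode):
    """True if users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)


download_dir_mode = 0o775
summary = describe_directory_mode(download_dir_mode)  # 'drwxrwxr-x'
```

The owner and the group get read, write, and execute access, while all other users get read and execute only. This is why every service user that must update the downloaded data needs to be a member of the common group.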
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or subquery.
Like Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
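The wildcard rules for the Like operator in the table above can be sketched in Python by translating a pattern into a regular expression. This is an illustration only, not the MI SQL engine: the function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


matches = [name for name in ["Central Park", "Parkway", "Park"] if like(name, "%Park")]
```

Here matches contains "Central Park" and "Park" but not "Parkway", because the pattern has no trailing wildcard.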
Arithmetic Operators
Operator Definition
+ Addition; also the string concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers with illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
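When filter expressions are built programmatically, the quoting rules above can be applied with small helpers such as the following Python sketch. The helper names are hypothetical. Because the guide does not say how a double quote inside an identifier should be escaped, the identifier helper rejects such names instead of guessing.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Escaping of embedded double quotes is not described in the guide,
    so this sketch refuses such names rather than inventing a rule.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


query = (
    "SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
    + " WHERE Country = " + quote_literal("Canada")
)
business_filter = "Business = " + quote_literal("O'hara's")
```

For production use, prefer whatever parameter binding the calling API offers over string concatenation; the sketch only demonstrates the quoting rules themselves.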
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
Parameter Type Description
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique, well-known string ID for the grid cell that corresponds to the specified X, Y, and precision. The ID is sortable and searchable.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell, at the specified precision, that contains the point.
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid when given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid.
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell, at the given precision, that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell, at the specified precision, that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point.
Y Number or String The latitude value of the point.
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell, at the specified precision, that contains the point.
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 64
Spatial
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
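To make the relationship between precision and cell size concrete, the following Python sketch builds a quadtree-style cell ID over a Web Mercator projection and counts points per cell, in the spirit of the GROUP BY example above. It is only an illustration under stated assumptions: the function names square_cell_id and count_by_cell are hypothetical, and the encoding is a generic quadtree key, not necessarily the algorithm that SquareHashID uses.

```python
import math
from collections import Counter


def square_cell_id(lon, lat, precision):
    """Hypothetical quadtree-style cell ID (NOT the product's SquareHashID algorithm)."""
    # Clamp latitude to the usual Web Mercator limit to avoid division by zero at the poles.
    lat = min(max(lat, -85.0511), 85.0511)
    # Normalize the point to [0, 1) x [0, 1) in Web Mercator space,
    # so that cells appear square on a Mercator map.
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1.0 + sin_lat) / (1.0 - sin_lat)) / (4.0 * math.pi)
    x = min(max(x, 0.0), 1.0 - 1e-12)
    y = min(max(y, 0.0), 1.0 - 1e-12)

    digits = []
    for _ in range(precision):
        # Each extra character splits the current cell into four quadrants,
        # so longer IDs describe smaller cells.
        x, y = x * 2.0, y * 2.0
        col, row = int(x), int(y)
        digits.append(str(col + 2 * row))
        x, y = x - col, y - row
    return "".join(digits)


def count_by_cell(points, precision):
    """Count (lon, lat) points per cell ID, similar to the GROUP BY query."""
    return Counter(square_cell_id(lon, lat, precision) for lon, lat in points)
```

In this sketch an ID at a higher precision always starts with the ID of the enclosing lower-precision cell, which is why sorting and prefix searches on such keys group nearby points together.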
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP.TAB file and are the fields we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
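For readers who want to see what a point-in-polygon search does conceptually, here is a small Python sketch that uses the classic ray-casting test and returns the attributes of every polygon that contains a point. The names point_in_ring and polygons_containing, and the (ring, attributes) feature layout, are assumptions made for illustration; the UDTF's actual implementation and its handling of lines, points, and multipolygons are not shown here.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) is inside the closed ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, features):
    """Return the attribute dicts of all (ring, attributes) features containing the point."""
    x, y = point
    return [attrs for ring, attrs in features if point_in_ring(x, y, ring)]
```

A call such as polygons_containing((5, 5), features) plays the role of the LATERAL VIEW in the query above: each input point yields zero or more result rows, one per matching polygon.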
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
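The effect of the maxCandidates and maxDistance options can be pictured with a short Python sketch that ranks candidate points by great-circle distance. This is a simplified stand-in, not the UDTF's implementation: it handles point candidates only, the names haversine_m and search_nearest are hypothetical, and distances use the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (WGS 84 degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first.

    candidates is a list of (name, lon, lat) tuples; max_distance_m=None means no limit.
    """
    lon, lat = point
    scored = []
    for name, c_lon, c_lat in candidates:
        d = haversine_m(lon, lat, c_lon, c_lat)
        if max_distance_m is None or d <= max_distance_m:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]
```

As with the documented defaults, the sketch returns a single nearest result unless max_candidates is raised, and applies no distance cutoff unless max_distance_m is given.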
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
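The result of a distance join can be pictured with the following Python sketch, which pairs every record from one list with every record from another that lies within a maximum distance and optionally adds a distance column. It is a brute-force illustration only: the names join_by_distance and haversine_m and the dictionary field names are assumptions, and the sketch compares all pairs rather than using a geohash-based primary join as the Spark method does.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (WGS 84 degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column=None):
    """Brute-force distance join over lists of dicts.

    left rows need 'longitude' and 'latitude'; right rows need 'lon' and 'lat'.
    When distance_column is given, the distance in meters is added under that key.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                row = {**l_row, **r_row}
                if distance_column is not None:
                    row[distance_column] = d
                joined.append(row)
    return joined
```

For example, join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance") corresponds to the half-mile search radius used in the Scala examples, with the distance reported in meters in this sketch.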
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group, and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
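After applying the steps above, it can be useful to confirm that the download directory really has the expected group and mode. The following Python sketch is one way to check this on a Unix-like system; the function names describe_download_dir and is_shared_download_dir are hypothetical, and the directory path and group name you pass in are whatever you chose during setup.

```python
import grp
import os
import stat


def describe_download_dir(path):
    """Return (group_name, permission_bits) for a directory on a Unix-like system."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no name on this host; fall back to the numeric id.
        group = str(st.st_gid)
    return group, stat.S_IMODE(st.st_mode)


def is_shared_download_dir(path, expected_group, expected_mode=0o775):
    """True if the directory is owned by the expected group and has the expected mode."""
    group, mode = describe_download_dir(path)
    return group == expected_group and mode == expected_mode
```

A check such as is_shared_download_dir('/pb/downloads', 'pbdownloads') would then report whether the chgrp and chmod commands took effect.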
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
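The Like wildcard rules in the table can be illustrated with a short Python sketch that translates a Like pattern into a regular expression. The function names like_to_regex and like are hypothetical, and the sketch assumes case-sensitive matching of the whole value; whether MI SQL compares case-sensitively is not stated here.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression.

    '%' matches zero, one, or multiple characters; '_' matches exactly one character.
    All other characters are matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern (case-sensitive in this sketch)."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For instance, like('Central Park', '%Park') is true in this sketch, while like('Parkway', '%Park') is not, because the pattern requires the value to end in Park.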
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List item and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
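When filter expressions are assembled programmatically, the quoting rules above can be applied with small helpers like the following Python sketch. The helper names quote_literal and quote_identifier are hypothetical. The literal helper doubles embedded single quotes as described above; the identifier helper simply rejects names that contain a double quote, because this section does not describe how to escape one.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes.

    Names containing a double quote are rejected because this sketch does not
    assume any escaping rule for them.
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


def build_filter_query(table, column, value):
    """Build a simple SELECT with a quoted table name and a quoted string literal."""
    return "SELECT * FROM {} WHERE {} = {}".format(
        quote_identifier(table), column, quote_literal(value)
    )
```

Calling build_filter_query('/Samples/NamedTables/USA', 'Country', 'Canada') reproduces the second example query above.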
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
X
Spatial
HexagonID
Description
The HexagonID function returns a unique well-known string ID for the grid cell The ID then is sortable and searchable that corresponds to the specified X Y and precision
Function Registration
create function HexagonID as compbbigdataspatialhivegridHexagonID
Syntax
HexagonID(Number|String X Number|String Y Number precision)
Parameters
Parameter Type Description
Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells)
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 60
Spatial
Examples
SELECT HexagonID(x y precision) FROM hivetable
SELECT HexagonID(-73750333 42736103 3)
CREATE TEMPORARY TABLE tmptbl ASSELECT (HexagonID(x y 10)) AS hashIDFROM coordinates ORDER BY hashID
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x y 10)) AS hashIDFROM coordinates ORDER BY hashID
SELECT chashID ToWKT(HexagonBoundary(chashID))count () AS quantity FROM (SELECT HexagonID(x y 10)AS hashID FROM coordinates)c GROUP BY chashID
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 61
Spatial
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of that cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid.
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid.
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always produces a hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
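Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
// Build the maximum join distance (0.5 miles) as a Length.
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The options map adds an outputDistance column holding the calculated distance.
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))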
Location Intelligence Jar for Spatial Operations
To use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table and proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin
Hue notebook | Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
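The manual edit in step 2 b) can also be scripted. The following is a minimal sketch and not part of the product: the function name enable_default_configuration is hypothetical, and it assumes that the existing line is commented out with # or ; characters and that your sed supports the -E and -i options (GNU sed does). Review the resulting file before restarting anything.

```shell
#!/bin/sh
# Sketch only: set use_default_configuration=true in a hue.ini-style file,
# whether the existing line is commented out or not. Leading indentation is
# preserved, and a backup copy is kept next to the file with a .bak suffix.
enable_default_configuration() {
    ini="$1"    # path to the ini file, for example the one named in step 2 a)
    sed -i.bak -E \
        's/^([[:space:]]*)[#;[:space:]]*use_default_configuration[[:space:]]*=.*/\1use_default_configuration=true/' \
        "$ini"
}
```

Call it with the path to the file, for example enable_default_configuration /etc/hue/conf/hue.ini, using an account that is allowed to write to that file.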
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
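When a directory holds several TAB files, the single-file invocation can be wrapped in a loop. The following is a minimal sketch for a Unix-like shell and not part of the product: the function name build_pgd_for_dir and the PGD_BUILDER variable are hypothetical, and the sketch assumes that the PGDBuilder launcher from the utilities directory is on the PATH and that only the documented -f and -p options are used.

```shell
#!/bin/sh
# Sketch only: build a PGD index for every TAB file in one directory.
# PGD_BUILDER is a hypothetical variable naming the launcher command;
# it defaults to the PGDBuilder utility described in this section.
PGD_BUILDER="${PGD_BUILDER:-PGDBuilder}"

build_pgd_for_dir() {
    dir="$1"          # directory that contains the TAB files
    parallel="$2"     # optional value for -p; empty keeps the tool's default
    count=0
    # One glob that matches .tab in any letter case, so each file is seen once.
    for tab in "$dir"/*.[tT][aA][bB]; do
        [ -f "$tab" ] || continue    # skip the pattern itself if nothing matched
        if [ -n "$parallel" ]; then
            "$PGD_BUILDER" -f "$tab" -p "$parallel" || return 1
        else
            "$PGD_BUILDER" -f "$tab" || return 1
        fi
        count=$((count + 1))
    done
    echo "$count"    # number of TAB files processed
}
```

The function stops at the first failure so that a problem with one TAB file is noticed before the remaining files are processed. Remember that a PGD file must be regenerated whenever the data in its TAB file changes.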
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that contains all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to this common group and grant Read, Write, and Execute permissions (775) to the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. For the download directory specified in the pb.download.location property, change the group to the common operating system group and set the permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
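After step 5 it can be useful to confirm that the directory really has the expected group and mode. The following is a minimal sketch and not part of the product: the function name check_download_dir is hypothetical, and it relies on GNU stat (the -c option), which is standard on Linux but differs on BSD and macOS.

```shell
#!/bin/sh
# Sketch only: confirm that a download directory has the expected group and mode.
# Relies on GNU stat (-c); BSD/macOS stat uses different options.
check_download_dir() {
    dir="$1"               # the directory named by the download location setting
    want_group="$2"        # the common operating system group, e.g. pbdownloads
    want_mode="${3:-775}"  # expected permission bits, 775 unless overridden
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    group=$(stat -c '%G' "$dir") || return 1
    mode=$(stat -c '%a' "$dir") || return 1
    if [ "$group" = "$want_group" ] && [ "$mode" = "$want_mode" ]; then
        echo "ok: $dir"
    else
        echo "mismatch: $dir has group=$group mode=$mode"
        return 1
    fi
}
```

Call it with the directory and group that you used in step 5. A non-zero exit status means that the directory is missing or that its group or mode differs from what was requested.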
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or the second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List item and function argument separators
 | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic cannot parse them correctly, for example identifiers that contain spaces or other special characters.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
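When a filter expression is assembled in a script, the doubling rule for embedded single quotes is easy to apply mechanically. The following is a minimal sketch and not part of the product: the function name mi_sql_literal is hypothetical, and the sketch covers only the single-quote rule for string literals described in this section.

```shell
#!/bin/sh
# Sketch only: wrap a value in single quotes for use as an MI SQL string
# literal, doubling any single quote inside it as described in Quote Rules.
mi_sql_literal() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "'%s'" "$escaped"
}
```

For example, mi_sql_literal "O'hara's" prints 'O''hara''s', which can then be placed after an operator such as = in a WHERE clause.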
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)), count(*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It can also return the boundary of the cell that contains the given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter Type Description
UNIQUE_ID String The unique geohash identifier of a cell in a grid
Return Values
Return Type Description
WritableGeometry A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
WritableGeometry The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter Type Description
X Number or String The longitude value of the point
Y Number or String The latitude value of the point
precision Number The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type Description
String The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
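The relationship between precision, key length, and cell size can be illustrated with a generic quadtree key over a Web Mercator grid. This is only a sketch of the general idea: the encoding that SquareHashID actually uses is not documented here, and the function names below (tile_to_key, square_key, cell_width_degrees) are hypothetical and will not reproduce the product's IDs.

```python
import math


def tile_to_key(tile_x, tile_y, precision):
    """Interleave the bits of a tile column and row into a base-4 string key."""
    digits = []
    for level in range(precision, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def square_key(lon, lat, precision):
    """Hypothetical stand-in for a square hash ID: the quadtree key of the
    Web Mercator tile that contains a WGS 84 point (assumed encoding)."""
    n = 1 << precision  # number of tiles along each axis at this precision
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_to_key(tile_x, tile_y, precision)


def cell_width_degrees(precision):
    """Width of one cell in degrees of longitude at the given precision."""
    return 360.0 / (1 << precision)


coarse = square_key(-73.750333, 42.736103, 3)
fine = square_key(-73.750333, 42.736103, 10)
print(coarse, fine, cell_width_degrees(3), cell_width_degrees(10))
```

In this scheme each extra character halves the cell width, and the key of a coarse cell is a prefix of the key of every finer cell inside it, which is why grouping by a key is a convenient way to aggregate points per cell.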
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. The UDTF returns every geometry that the search point lies within; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to set return criteria, in <String, String> format.
Options
shpCharset
  Description: The charset to use when reading a shapefile.
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: The coordinate reference system to use when reading a shapefile.
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup
  Description: The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides. All the geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry type, the respective geometries will be returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which is the table being used to get the points we are searching from. The pipresult.capital and pipresult.state fields are from the STATECAP.TAB file that we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
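Conceptually, a point-in-polygon search tests the input point against each candidate polygon and returns the records of those that contain it. The following is a minimal sketch of that idea using the generic even-odd (ray casting) test on planar coordinates. It is not the product's implementation, and the names point_in_ring and polygons_containing are hypothetical.

```python
def point_in_ring(x, y, ring):
    """Even-odd (ray casting) test: count how many polygon edges a horizontal
    ray starting at (x, y) crosses; an odd count means the point is inside."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons (name -> list of vertices) containing the point."""
    x, y = point
    return [name for name, ring in polygons.items() if point_in_ring(x, y, ring)]


regions = {
    "west": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "east": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
print(polygons_containing((2, 2), regions))
```

Points that fall exactly on a boundary are not treated specially in this sketch, and it ignores holes and multi-part geometries.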
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options Map Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
maxCandidates
  Description: The maximum number of results to return (if not set, the default value is 1).
  Example: 'maxCandidates', '3'
maxDistance
  Description: The maximum distance to search for results (if not set, the default value is no limit).
  Example: 'maxDistance', '25'
distanceUnit
  Description: The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units.
  Example: 'distanceUnit', 'mi'
returnDistanceColumnName
  Description: The name of the column to use for returning the distance.
  Example: 'returnDistanceColumnName', 'Miles'
shpCharset
  Description: The charset to use when reading a shapefile.
  Example: 'shpCharset', 'utf-8'
shpCrs
  Description: The coordinate reference system to use when reading a shapefile.
  Example: 'shpCrs', 'epsg:4326'
remoteDataSourceLocation
  Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation
  Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
  Example: 'downloadLocation', '/pb/downloads'
  Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup
  Description: The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
  Example: 'downloadGroup', 'pbdownloads'
queryFilter
  Description: Search on a subset of the data based on an attribute.
  Example: 'queryFilter', "Name Like 'Park'"
  Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table being used to get the points we are searching from. The nearestresult.capital and nearestresult.state fields are from the STATECAP.TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance; for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options Map Optional. Options that add extra attributes to the result of the join.
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
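To make the join semantics concrete, the following sketch pairs every record in one list with every record in another that lies within a maximum distance, and optionally adds the computed distance as an extra column. It is an illustration only: it compares all pairs by brute force using the haversine formula on a spherical Earth, whereas the product uses a geohash-based primary join whose details are not described here. The function names, column names, and sample records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius; a spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_distance_m, distance_column=None):
    """Brute-force distance join over two lists of dict records.

    rows1 records carry 'longitude'/'latitude'; rows2 records carry 'lon'/'lat'.
    """
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_m(r1["longitude"], r1["latitude"], r2["lon"], r2["lat"])
            if d <= max_distance_m:
                row = {**r1, **r2}
                if distance_column:
                    row[distance_column] = d
                joined.append(row)
    return joined


customers = [{"id": 1, "longitude": -73.75, "latitude": 42.73}]
pois = [
    {"poi": "near", "lon": -73.75, "lat": 42.731},
    {"poi": "far", "lon": -73.75, "lat": 42.75},
]
result = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE, "outputDistance")
print(result)
```

Comparing all pairs is quadratic in the number of records, which is the cost that a coarse grid-based primary join is meant to avoid on large dataframes.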
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which provides the capability to manage downloading reference data programmatically. The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter Required Description
-f <file> yes Path and file name for the TAB file for which you are generating a PGD file
-p <parallel> no The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be the common operating system group, one that has Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
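As a quick check of what the 775 mode grants, the octal value can be expanded into its read/write/execute form. This small sketch uses only the Python standard library; the helper names describe_mode and group_can_write are hypothetical and are not part of the product.

```python
import stat


def describe_mode(octal_mode):
    """Render an octal permission string such as '775' in rwx form for a directory."""
    mode = stat.S_IFDIR | int(octal_mode, 8)
    return stat.filemode(mode)


def group_can_write(octal_mode):
    """True if the group write bit is set in the given octal permission string."""
    return bool(int(octal_mode, 8) & stat.S_IWGRP)


print(describe_mode("775"))    # owner and group: rwx; others: r-x
print(group_can_write("775"))
print(group_can_write("755"))
```

The group write bit is what lets every member of the common group update the downloaded data; a mode such as 755 would leave only the owner able to write.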
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator Definition
Attribute operators =, <>, !=, <, <=, >, >=
Between Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect.
Contains Returns true if the first object contains all of the second object.
Within Returns true if the first object is entirely inside the second object.
ContainsCentroid Returns true if the first object contains the centroid of the second object.
CentroidWithin Returns true if the first object's centroid is within the second object.
Intersects Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) Returns true if the value equals at least one of the values in the literal list or sub query.
Like Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND Returns true if both conditions in the WHERE clause are true.
OR Returns true if either the first or second condition is true.
NOT Reverses the meaning of the logical operator with which it is used.
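The wildcard behavior described for the Like operator can be sketched by translating a pattern into a regular expression, with _ matching exactly one character and % matching zero or more. This is an illustration of the matching rules only: the helper names are hypothetical, and the sketch assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))
print(like("Parking", "%Park"))
print(like("Park", "P_rk"))
```

Because the whole value must match, a pattern without wildcards behaves like an equality test.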
Arithmetic Operators
Operator Definition
+ Addition; also the concatenation operator. Note: String concatenation also uses &.
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
' ' String constant delimiters. See Quote Rules on page 85.
" " Quoted identifier delimiters
% _ Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, List items and function argument separators
@ Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases, where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
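When filter expressions are assembled programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch with hypothetical function names; it covers only the rules stated here (doubling single quotes inside literals, wrapping identifiers in double quotes) and refuses identifiers that themselves contain a double quote, because this guide does not say how those are escaped.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("identifier contains a double quote; escaping rule unknown")
    return '"' + name + '"'


table = quote_identifier("/Samples/NamedTables/USA")
query = "SELECT * FROM " + table + " WHERE Country = " + quote_literal("Canada")
print(query)
print(quote_literal("O'hara's"))
```

The first printed line reproduces the Canada query from the examples above, and the second reproduces the doubled-quote literal.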
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87
pitney bowes 0 3001 Summer Street
Stamford CT 06926-0700
USA
wwwpitneybowescom
copy 2020 Pitney Bowes Software Inc
All rights reserved
- Table of Contents
- Welcome
  - What is Spectrum™ Location Intelligence for Big Data
  - Spectrum™ Location Intelligence for Big Data Architecture
  - System Requirements and Dependencies
- Spatial
  - Installing the SDK
  - Hive User-Defined Spatial Functions
    - Setup
    - WritableGeometry
    - Geometry Functions
      - Constructor Functions
        - FromGeoJSON
        - FromWKT
        - FromWKB
        - FromKML
        - ST_Point
      - Persistence Functions
        - ToGeoJSON
        - ToWKT
        - ToWKB
        - ToKML
      - Predicate Functions
        - Disjoint
        - Intersects
        - Overlaps
        - Within
        - IsNullGeometry
      - Measurement Functions
        - Area
        - ClosestPoints
        - Distance
        - Length
        - Perimeter
      - Processing Functions
        - Buffer
        - ConvexHull
        - Intersection
        - Transform
        - Union
      - Observer Functions
        - ST_X
        - ST_XMax
        - ST_XMin
        - ST_Y
        - ST_YMax
        - ST_YMin
      - Grid Functions
        - GeoHashBoundary
        - GeoHashID
        - HexagonBoundary
        - HexagonID
        - SquareHashBoundary
        - SquareHashID
      - Search Functions
        - LocalPointInPolygon
        - LocalSearchNearest
  - Spark
    - Spark Jobs
      - Hexagon Generator
    - Hexagons
    - Spark API
      - JoinByDistance
    - Location Intelligence Jar for Spatial Operations
      - Installing and setting up the Location Intelligence Jar for a Spark Job
      - Integrating the Location Intelligence Jar with Zeppelin
      - Integrating the Location Intelligence Jar with Hue
- Appendix
  - PGD Builder
    - Building an Index with the PGD Builder
  - Download Permissions
  - Operators and Syntax Delimiters
    - Quote Rules
SquareHashBoundary
Description
The SquareHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid, given the unique ID of the cell. It can also return the boundary of the cell that contains a given point at the specified precision. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
Syntax
SquareHashBoundary(String UNIQUE_ID)
Parameters
Parameter | Type | Description
UNIQUE_ID | String | The unique geohash identifier of a cell in a grid
Return Values
Return Type | Description
WritableGeometry | A representation of the boundary of a cell in a grid
Examples
SELECT SquareHashBoundary(hashStringId) FROM hivetable;
SELECT SquareHashBoundary('03332');
Syntax
SquareHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary('-73.750333', '42.736103', 3);
SquareHashID
Description
The SquareHashID function takes a longitude and latitude (in WGS 84) and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). The function returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point
Y | Number or String | The latitude value of the point
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned by this UDTF; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
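Because the options argument is a flat map of string keys and string values, it can be convenient to generate the map(...) expression rather than type it by hand. The following Python sketch shows one way to do that. The helper name hive_map_literal is hypothetical and not part of the product; the option names come from the table above, and the sketch rejects values containing quotes or backslashes instead of assuming an escaping rule.

```python
def hive_map_literal(options: dict) -> str:
    """Render options as a Hive map('key', 'value', ...) expression.

    Keys and values are emitted as single-quoted strings because the
    options map is described as <String, String>.
    """
    parts = []
    for key, value in options.items():
        for text in (str(key), str(value)):
            # Assumption: rather than guess at escaping rules, refuse
            # text that contains a single quote or a backslash.
            if "'" in text or "\\" in text:
                raise ValueError(f"unsupported character in {text!r}")
            parts.append(f"'{text}'")
    return "map(" + ", ".join(parts) + ")"


if __name__ == "__main__":
    # Illustrative values only.
    print(hive_map_literal({"shpCharset": "utf-8", "shpCrs": "epsg:4326"}))
```

The generated text can then be placed in the third argument position of a LocalPointInPolygon or LocalSearchNearest call.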
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  'STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1) | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit) | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3) | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', "Name Like 'Park'"
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like 'Park'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
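To make the interaction of maxCandidates and maxDistance concrete, here is a small Python sketch of a nearest search over point data. It is an illustration under stated assumptions, not the product's implementation: the real UDTF searches geometries in a TAB or shapefile, whereas this sketch uses plain points, great-circle (haversine) distance in meters, and hypothetical names (search_nearest, haversine_m) and sample coordinates.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    candidates is a list of (name, lon, lat) tuples. When max_distance_m
    is None, no distance limit applies, mirroring the documented defaults.
    """
    lon, lat = point
    scored = [(name, haversine_m(lon, lat, c_lon, c_lat))
              for name, c_lon, c_lat in candidates]
    if max_distance_m is not None:
        scored = [item for item in scored if item[1] <= max_distance_m]
    scored.sort(key=lambda item: item[1])
    return scored[:max_candidates]


if __name__ == "__main__":
    # Hypothetical sample data: approximate city coordinates.
    capitals = [("Boston", -71.0589, 42.3601),
                ("Albany", -73.7562, 42.6526),
                ("Hartford", -72.6734, 41.7658)]
    here = (-73.750333, 42.736103)
    for name, dist in search_nearest(here, capitals, max_candidates=3):
        print(f"{name}: {dist / 1000:.1f} km")
```

With the defaults, only the single nearest candidate is returned; raising max_candidates widens the result, and max_distance_m trims it again.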
Spark
Spark Jobs
To process large data sets, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude and latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to
df1Longitude | Column | The longitude value from the first dataframe
df1Latitude | Column | The latitude value from the first dataframe
df2Longitude | Column | The longitude value from the second dataframe
df2Latitude | Column | The latitude value from the second dataframe
maxDistance | Length | The buffer length around point 1 to search for point 2
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
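As a rough illustration of what a geohash precision between 1 and 12 means, the following Python sketch implements the widely used public geohash encoding and reports the cell size in degrees for a given precision. Whether the product's internal geohash matches this encoding exactly is an assumption, and the function names geohash_encode and cell_size_degrees are hypothetical, not part of the SDK.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lon: float, lat: float, precision: int) -> str:
    """Encode a lon/lat pair with the public geohash algorithm."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def cell_size_degrees(precision: int):
    """Return (width, height) of a geohash cell in degrees."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2
    lat_bits = total_bits // 2
    return 360 / 2 ** lon_bits, 180 / 2 ** lat_bits


if __name__ == "__main__":
    for p in (1, 5, 7, 12):
        print(p, geohash_encode(-73.750333, 42.736103, p), cell_size_degrees(p))
```

Each additional character adds five bits, so cells shrink quickly as precision grows, which is consistent with the note that higher precision may require more memory.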
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
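The semantics of a distance join can be sketched without Spark. The Python example below pairs every record from one list with every record from another whose great-circle distance is within a maximum, and adds a distance column. It is a brute-force illustration only: the product performs a geohash-based primary join, which this sketch does not attempt to reproduce, and the names join_by_distance and haversine_m and the sample records are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, distance_column="outputDistance"):
    """Return merged rows for every left/right pair within max_distance_m.

    left rows carry 'longitude'/'latitude'; right rows carry 'lon'/'lat'.
    This is an O(n * m) comparison, suitable only for small inputs.
    """
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"],
                            r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                joined.append({**l_row, **r_row, distance_column: d})
    return joined


if __name__ == "__main__":
    customers = [{"customer": "A", "longitude": -73.750333, "latitude": 42.736103}]
    pois = [{"poi": "near", "lon": -73.7510, "lat": 42.7370},
            {"poi": "far", "lon": -73.8, "lat": 42.8}]
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row["customer"], row["poi"], round(row["outputDistance"]), "m")
```

A grid or geohash pre-filter avoids comparing every pair, which is why the real method asks for a geohash precision.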
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin
Hue notebook | Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars is the path to the LI jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, open the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be members of a common operating system group. The group of the download directory should be that common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window in which no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
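For readers less familiar with octal modes, this short Python sketch shows what 775 expands to and how to check that a directory is group-writable. It is an illustration only; the helper names are hypothetical, and the sketch runs against a temporary directory rather than a real download location.

```python
import os
import stat
import tempfile


def describe_mode(mode: int) -> str:
    """Render a directory permission mode in ls-style notation."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(path: str) -> bool:
    """Return True if the group-write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


if __name__ == "__main__":
    # 775: owner rwx, group rwx, others r-x
    print(describe_mode(0o775))
    with tempfile.TemporaryDirectory() as scratch:
        os.chmod(scratch, 0o775)
        print("group-writable:", group_can_write(scratch))
```

The group-write bit is what lets every member of the shared group update downloaded data that another service created.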
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
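The Like wildcard rules can be illustrated by translating a pattern into a regular expression. The following Python sketch is an illustration, not the MI SQL implementation: the function names are hypothetical, and matching is case-sensitive here, which is an assumption because the guide does not state how Like treats case.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into a compiled regular expression.

    % matches zero, one, or multiple characters; _ matches exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("Central Park", "Parkway", "Park"):
        print(value, like(value, "%Park"), like(value, "Park%"), like(value, "P_rk"))
```

Combining the two wildcards works the same way; for example, the pattern `_a%` matches any value whose second character is `a`.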
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List item and function argument separators
Parameter names
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
WritableGeometry | The boundary of the grid cell at the given precision that the point falls into.
Examples
SELECT SquareHashBoundary(x, y, precision) FROM hivetable;
SELECT SquareHashBoundary(-73.750333, 42.736103, 3);
SELECT SquareHashBoundary("-73.750333", "42.736103", 3);
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID("-73.750333", "42.736103", 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that the search point lies within are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs), 'STATECAP.TAB', map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath [, map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, there is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type | Description
geometry | The nearest geometry or geometries, contained in the specified TAB or shapefile, to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB', map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
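The way the maxCandidates and maxDistance options combine can be pictured with a small shell sketch. It is not the UDTF's implementation: the helper name nearest_candidates and the tab-separated name/distance rows are made up for illustration, and it assumes that candidates farther away than maxDistance are dropped before the closest maxCandidates are kept.

```shell
#!/usr/bin/env bash
# Illustrative only: nearest_candidates is a hypothetical helper, not a product command.
# Reads tab-separated "name<TAB>distance" rows on stdin.
nearest_candidates() {
  local max_candidates=$1 max_distance=$2
  # Keep rows within the distance limit, sort by distance, keep the closest N.
  awk -F '\t' -v max="$max_distance" '$2 <= max' |
    LC_ALL=C sort -t "$(printf '\t')" -k2,2n |
    head -n "$max_candidates"
}

# Made-up candidates and distances (same unit as the distance limit).
printf 'A\t12.4\nB\t140.0\nC\t85.3\nD\t118.9\nE\t3.1\n' | nearest_candidates 3 125
```

With a limit of 3 candidates and a maximum distance of 125, row B is excluded by distance and row D is excluded because three closer candidates already exist.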
Spark
Spark Jobs
Spark jobs are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data to process large data sets.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
[Figure: Sample Output]
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, which represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceJoinOption.DistanceColumnName -> "outputDistance"))
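To make the maxDistance condition concrete, the following shell sketch checks whether two longitude/latitude pairs fall within a given number of miles of each other. It is only an approximation for illustration: it uses the haversine great-circle formula with a mean earth radius of 3958.8 miles, which is an assumption and not necessarily how joinByDistance measures distance, and the helper names haversine_miles and within_miles are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: great-circle distance in miles between two lon/lat points
# (haversine formula, mean earth radius assumed to be 3958.8 miles).
haversine_miles() {
  awk -v lon1="$1" -v lat1="$2" -v lon2="$3" -v lat2="$4" 'BEGIN {
    pi = atan2(0, -1); rad = pi / 180; r = 3958.8
    dlat = (lat2 - lat1) * rad
    dlon = (lon2 - lon1) * rad
    a = sin(dlat / 2) ^ 2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2) ^ 2
    printf "%.3f\n", 2 * r * atan2(sqrt(a), sqrt(1 - a))
  }'
}

# Hypothetical helper: succeeds when the two points are at most max_miles apart.
within_miles() {
  local d
  d=$(haversine_miles "$1" "$2" "$3" "$4")
  awk -v d="$d" -v max="$5" 'BEGIN { exit !(d <= max) }'
}

# Two points 0.005 degrees of latitude apart, tested against a 0.5 mile limit.
if within_miles -73.750333 42.736103 -73.750333 42.741103 0.5; then
  echo "within 0.5 mi"
fi
```

A record pair that passes this kind of test is the sort of pair the join is meant to keep; pairs farther apart than the limit are dropped.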
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin
Hue notebook | Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
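One way to pick a lower -p value for a seamless TAB is to derive it from the machine's CPU count. The sketch below is a dry run that only prints the command. The helper name pgd_parallelism, the "half the CPUs" policy, and the seamless.tab placeholder path are all assumptions made for illustration; choose a value that suits your own machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a -p value of half the CPUs, but never below 1.
# Pass a CPU count explicitly, or let it default to the number of online CPUs.
pgd_parallelism() {
  local cpus=${1:-$(getconf _NPROCESSORS_ONLN)}
  local half=$(( cpus / 2 ))
  if (( half < 1 )); then half=1; fi
  echo "$half"
}

tab_file="seamless.tab"  # placeholder path, replace with your TAB file

# Dry run: print the command instead of executing it.
echo "PGDBuilder -f $tab_file -p $(pgd_parallelism)"
```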
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
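The effect of the last step can be checked by reading back the directory's mode and group. The sketch below is a rehearsal on a scratch directory so that it runs without sudo: the temporary directory and the use of the caller's own primary group stand in for the real download location and the pbdownloads group, and stat -c assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Scratch directory standing in for the real download location (assumption).
download_dir=$(mktemp -d)

# The caller's primary group ID stands in for the shared group so that
# no sudo is needed; on a cluster this would be the pbdownloads group.
group_id=$(id -g)

chgrp "$group_id" "$download_dir"
chmod 775 "$download_dir"

# GNU stat: read back the octal mode and the owning group ID.
mode=$(stat -c '%a' "$download_dir")
owner_gid=$(stat -c '%g' "$download_dir")
echo "mode=$mode gid=$owner_gid"

rmdir "$download_dir"
```

A mode of 775 means the owner and the group can read, write, and enter the directory, while other users can only read and enter it.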
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
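The wildcard behavior described for the Like operator can be tried out with a small bash sketch that translates a Like pattern into a shell glob (% becomes *, _ becomes ?). The helper names like_to_glob and like_matches are hypothetical, and the sketch matches case-sensitively, which may differ from how MI SQL compares strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a Like pattern into a bash glob.
like_to_glob() {
  local pattern=$1 glob="" ch i
  for (( i = 0; i < ${#pattern}; i++ )); do
    ch=${pattern:i:1}
    case "$ch" in
      %) glob+='*' ;;                                # zero, one, or more characters
      _) glob+='?' ;;                                # exactly one character
      '*'|'?'|'['|']'|'\') glob+="\\$ch" ;;          # keep glob metacharacters literal
      *) glob+="$ch" ;;
    esac
  done
  printf '%s' "$glob"
}

# Hypothetical helper: succeeds when the value matches the Like pattern.
like_matches() {
  local value=$1 glob
  glob=$(like_to_glob "$2")
  [[ $value == $glob ]]
}

like_matches "Central Park" "%Park" && echo "matches %Park"
like_matches "Parks" "_ark" || echo "does not match _ark"
```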
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List item and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic cannot otherwise parse them correctly, for example identifiers that contain spaces or other special characters.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
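When a filter expression is assembled in a script, the doubling rule can be applied mechanically. The following sketch is one possible way to do that in the shell; the helper names escape_mi_sql_literal and quote_mi_sql_literal are hypothetical and are not part of the product.

```shell
#!/usr/bin/env bash
# Hypothetical helper: double every single quote in a value.
escape_mi_sql_literal() {
  printf '%s' "$1" | sed "s/'/''/g"
}

# Hypothetical helper: escape the value and wrap it in single quotes.
quote_mi_sql_literal() {
  printf "'%s'" "$(escape_mi_sql_literal "$1")"
}

business="O'hara's"
echo "SELECT * FROM Streets WHERE Business = $(quote_mi_sql_literal "$business")"
```

The printed statement contains the literal with each embedded single quote doubled, matching the form of the example above.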
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
X
Spatial
SquareHashID
Description
The SquareHashID function takes a longitude, a latitude (in WGS 84), and a precision. The precision determines how large the grid cells are (higher precision means smaller grid cells). It returns the string ID of the grid cell at the specified precision that contains the point. Square hash cells appear square when displayed on a Popular Mercator map.
Function Registration
create function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
Syntax
SquareHashID(Number|String X, Number|String Y, Number precision)
Parameters
Parameter | Type | Description
X | Number or String | The longitude value of the point.
Y | Number or String | The latitude value of the point.
precision | Number | The length of the string key to be returned. The precision determines how large the grid cells are (longer strings mean higher precision and smaller grid cells).
Return Values
Return Type | Description
String | The ID of the grid cell at the specified precision that contains the point.
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
SELECT SquareHashID('-73.750333', '42.736103', 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. All geometries that contain the search point are returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, the respective geometries are returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points LATERAL VIEW
LocalPointInPolygon(FromWKT(pip_points.geometry, pip_points.crs),
'STATECAP.TAB',
map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip', 'downloadLocation', '/pb/pip/download', 'downloadGroup', 'pbdownloads'))
pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the values we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the values we want in the query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
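Because the options argument is a flat sequence of alternating keys and values in <String, String> form, it can be convenient to generate it from a dictionary when queries are assembled programmatically. The Python sketch below is illustrative only and is not part of the product. The helpers hive_string and hive_options_map are hypothetical names, and the escaping shown (a backslash before single quotes inside a single-quoted Hive string) is an assumption about how the query text is submitted.

```python
# Hypothetical helpers for building the options argument; not part of the product.

def hive_string(value: str) -> str:
    """Render a single-quoted Hive string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def hive_options_map(options: dict) -> str:
    """Render a dict as map('key1', 'value1', 'key2', 'value2', ...)."""
    items = []
    for key, value in options.items():
        items.append(hive_string(str(key)))
        # Every option value is passed as a string, including numbers.
        items.append(hive_string(str(value)))
    return "map(" + ", ".join(items) + ")"


# Option names below are taken from the Options table above.
options = {
    "maxCandidates": 3,
    "maxDistance": 25,
    "distanceUnit": "mi",
    "returnDistanceColumnName": "Miles",
}
print(hive_options_map(options))
```

The generated text can then be placed as the third argument of the UDTF call. Building it in one place keeps the keys and values paired correctly as options are added or removed.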
Spark
Spark Jobs
Spark jobs for processing large data sets are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location, and that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the Earth's surface. Level 1 refers to the whole Earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the level number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the API generates each hexagon with a unique and consistent ID for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
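Conceptually, the join pairs each record of the first dataframe with every record of the second dataframe that lies within the maximum distance. The Python sketch below illustrates only that semantics on small in-memory lists. It is a brute-force comparison using the haversine great-circle formula on a spherical Earth, which is an assumption and not how the product computes distances or applies the geohash precision. All function and variable names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two WGS 84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair every left record with each right record within max_distance_m.

    left and right are lists of (id, longitude, latitude) tuples.
    Returns a list of (left_id, right_id, distance_m) tuples.
    """
    result = []
    for lid, llon, llat in left:
        for rid, rlon, rlat in right:
            d = haversine_m(llon, llat, rlon, rlat)
            if d <= max_distance_m:
                result.append((lid, rid, d))
    return result


customers = [("c1", -73.750333, 42.736103)]
pois = [("near", -73.752, 42.737), ("far", -73.60, 42.80)]

# Half a mile, matching the search radius used in the examples above.
matches = join_by_distance(customers, pois, 0.5 * METERS_PER_MILE)
print([(left_id, right_id) for left_id, right_id, _ in matches])
```

A brute-force comparison like this examines every pair of records, which is why a cheaper primary join (here, the geohash precision parameter) matters for large inputs.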
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin
Hue notebook | Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue -> Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download the data and update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to this common group and grant Read, Write, and Execute permissions to the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
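After the steps above, it can be useful to confirm that the download directory really has the expected group and mode before jobs start using it. The Python sketch below is one way to do that on a Linux node. It is illustrative only, the function names are hypothetical, and it checks only the directory itself, not the files inside it.

```python
import grp
import os
import stat


def group_name_of(path: str) -> str:
    """Return the owning group name of path, or the numeric gid if it has no name."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_download_dir(path: str, group_name: str) -> list:
    """Return a list of problems; an empty list means the directory looks correct."""
    problems = []
    info = os.stat(path)
    if not stat.S_ISDIR(info.st_mode):
        problems.append("not a directory")
    # S_IMODE keeps only the permission bits, for example 0o775.
    mode = stat.S_IMODE(info.st_mode)
    if mode != 0o775:
        problems.append("mode is {:o}, expected 775".format(mode))
    actual_group = group_name_of(path)
    if actual_group != group_name:
        problems.append("group is {}, expected {}".format(actual_group, group_name))
    return problems
```

For example, calling check_download_dir with the download directory path and the group name pbdownloads returns an empty list when the group and the 775 mode are both in place, and a list of readable messages otherwise.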
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
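To make the wildcard semantics of the Like operator concrete, the following Python sketch emulates the matching described in the table above. It is an illustration, not the product's implementation. The function name like_match is hypothetical, and case-sensitive matching against the whole value is an assumption.

```python
import re


def like_match(value: str, pattern: str) -> bool:
    """Emulate Like: '_' matches exactly one character, '%' matches zero or more."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Any other character must match literally.
            parts.append(re.escape(ch))
    # fullmatch anchors the pattern at both ends of the value.
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


print(like_match("Central Park", "%Park%"))  # pattern may match anywhere in the value
print(like_match("Parkway", "Park"))         # no wildcard, so the whole value must equal Park
print(like_match("Park", "P_rk"))            # underscore stands for one character
```

The three calls print True, False, and True respectively, which shows that a pattern without wildcards behaves like an equality test.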
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names
Examples
SELECT SquareHashID(x, y, precision) FROM hivetable;
SELECT SquareHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT (SquareHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(SquareHashBoundary(c.hashID)), count(*) AS quantity FROM (SELECT SquareHashID(x, y, 10) AS hashID FROM coordinates) c GROUP BY c.hashID;
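The last query groups points by their grid cell and counts them. The following Python sketch shows the same aggregation idea outside Hive. It is only an illustration: square_cell_id is a hypothetical stand-in that divides the longitude/latitude extent into 2^precision by 2^precision cells, and it does not reproduce the IDs that the product's SquareHashID function returns.

```python
from collections import Counter


def square_cell_id(x, y, precision):
    """Hypothetical cell ID: column and row in a 2**precision square grid over lon/lat."""
    n = 2 ** precision
    # Clamp so that x = 180 or y = 90 still falls in the last column or row.
    col = min(int((x + 180.0) / 360.0 * n), n - 1)
    row = min(int((y + 90.0) / 180.0 * n), n - 1)
    return f"{precision}:{col}:{row}"


# Sample (x, y) = (longitude, latitude) pairs; the first two are a few meters apart.
coordinates = [(-73.750333, 42.736103), (-73.750300, 42.736100), (-0.1276, 51.5072)]

# Equivalent of GROUP BY hashID with count(*): one counter entry per cell.
counts = Counter(square_cell_id(x, y, 10) for x, y in coordinates)
for cell_id, quantity in sorted(counts.items()):
    print(cell_id, quantity)
```

Points that share a cell ID are aggregated together, which is what makes a grid ID usable as a GROUP BY key.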
Search Functions
The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Function Registration
create function LocalPointInPolygon as 'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter: inputPoint
Type: WritableGeometry
Description: A WritableGeometry representing a point.

Parameter: dataSourcePath
Type: String
Description: The path to the data source to be searched. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

Parameter: options
Type: Map
Description: Optional. Options that allow you to set return criteria, in <String, String> format.

Options

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.
Example: 'downloadGroup', 'pbdownloads'
Return Values
Return Type: geometry
Description: The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')
) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points we are searching from. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
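To make the containment test behind a point-in-polygon search concrete, here is a minimal Python sketch using the even-odd ray-casting rule. It is a conceptual illustration, not the product's implementation: the point_in_polygon function and the sample square are hypothetical, and a real TAB or shapefile search also handles holes, multipolygons, boundary cases, and coordinate systems.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) is inside the ring, given as a list of (x, y) vertices.

    Points exactly on the boundary are not treated specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon(2.0, 2.0, square))  # True
print(point_in_polygon(5.0, 2.0, square))  # False
```

An odd number of edge crossings to the right of the point means the point is inside the ring.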
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter: inputPoint
Type: WritableGeometry
Description: A WritableGeometry representing the point to search near.

Parameter: dataSourcePath
Type: String
Description: The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node.
Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.

Parameter: options
Type: Map
Description: Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.

Options

Option: maxCandidates
Description: The maximum number of results to return (if not set, the default value is 1).
Example: 'maxCandidates', '3'

Option: maxDistance
Description: The maximum distance to search for results (if not set, there is no limit).
Example: 'maxDistance', '25'

Option: distanceUnit
Description: The distance unit (if not set, the default value is m for meters). See the Distance function for examples of supported distance units.
Example: 'distanceUnit', 'mi'

Option: returnDistanceColumnName
Description: The name of the column to use for returning the distance.
Example: 'returnDistanceColumnName', 'Miles'

Option: shpCharset
Description: The charset to use when reading a shapefile.
Example: 'shpCharset', 'utf-8'

Option: shpCrs
Description: The coordinate reference system to use when reading a shapefile.
Example: 'shpCrs', 'epsg:4326'

Option: remoteDataSourceLocation
Description: The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3).
Example: 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'

Option: downloadLocation
Description: The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3).
Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well.
Example: 'downloadLocation', '/pb/downloads'

Option: downloadGroup
Description: The operating system group to apply to downloaded data on a local file system. The default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions.
Example: 'downloadGroup', 'pbdownloads'

Option: queryFilter
Description: Search on a subset of the data based on an attribute.
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters.
Example: 'queryFilter', "Name Like 'Park'"
Return Values
Return Type: geometry
Description: The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'")
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder.
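The effect of the maxCandidates and maxDistance options can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the UDTF's implementation: search_nearest, its parameter names, and the sample capital coordinates are hypothetical, and distances are great-circle (haversine) distances to points rather than distances to arbitrary geometries.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def search_nearest(point, candidates, max_candidates=1, max_distance_m=None):
    """Return up to max_candidates (name, distance_m) pairs, nearest first."""
    lon, lat = point
    scored = [(name, haversine_m(lon, lat, clon, clat)) for name, clon, clat in candidates]
    if max_distance_m is not None:
        # Like maxDistance: discard anything beyond the search radius.
        scored = [s for s in scored if s[1] <= max_distance_m]
    scored.sort(key=lambda s: s[1])
    # Like maxCandidates: keep only the closest results.
    return scored[:max_candidates]


# Illustrative (name, longitude, latitude) records.
capitals = [
    ("Albany", -73.7562, 42.6526),
    ("Boston", -71.0589, 42.3601),
    ("Hartford", -72.6734, 41.7658),
]
hits = search_nearest((-73.6918, 42.7284), capitals, max_candidates=2)
print([name for name, _ in hits])  # ['Albany', 'Hartford']
```

With the defaults (one candidate, no distance limit) the sketch returns a single nearest record, which mirrors the documented defaults of the two options.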
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence for Big Data and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower signal strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, together with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes based on distance. It takes one pair of longitude and latitude columns from each dataframe; these represent the locations of the records to be joined. You can use this method to enrich a CSV containing point data with attributes of points within some maximum distance, for example, to find all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.

Parameter: df2
Type: DataFrame
Description: The dataframe to join to.

Parameter: df1Longitude
Type: Column
Description: The longitude value from the first dataframe.

Parameter: df1Latitude
Type: Column
Description: The latitude value from the first dataframe.

Parameter: df2Longitude
Type: Column
Description: The longitude value from the second dataframe.

Parameter: df2Latitude
Type: Column
Description: The latitude value from the second dataframe.

Parameter: maxDistance
Type: Length
Description: The buffer length around point 1 within which to search for point 2.

Parameter: geohashPrecision
Type: Integer
Description: The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.

Parameter: options
Type: Map
Description: Optional. Options that add extra attributes to the result of the join.

Options

Key: DistanceColumnName
Type: String
Description: Adds a column to the result dataframe that contains the calculated distance.

Return Values

Return Type: DataFrame
Description: The dataframe that is the result of the join.
Examples
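This example returns a dataframe that is the result of a join in which points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
// Build the 0.5 mile search distance from a radius and a MapInfo unit code
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The options map adds an outputDistance column holding the calculated distance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))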
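The two-stage idea behind a distance join (a coarse spatial bucket for the primary join, followed by an exact distance check) can be sketched in plain Python. This is not the Spark API and not the product's algorithm: it swaps the geohash for a simple fixed-size longitude/latitude grid, and join_by_distance, cell_deg, and the sample records are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters
MILE_M = 1609.344           # meters per statute mile


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m, cell_deg=0.1):
    """Join (id, lon, lat) records whose distance is at most max_distance_m.

    Assumption: one grid cell (cell_deg degrees) is wider than max_distance_m
    at the latitudes involved; otherwise matches beyond the 8 neighbours are missed.
    """
    # Primary join key: bucket the right-hand records by grid cell.
    grid = defaultdict(list)
    for rid, rlon, rlat in right:
        grid[(math.floor(rlon / cell_deg), math.floor(rlat / cell_deg))].append((rid, rlon, rlat))

    results = []
    for lid, lon, lat in left:
        cx, cy = math.floor(lon / cell_deg), math.floor(lat / cell_deg)
        # Check the record's own cell and its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for rid, rlon, rlat in grid.get((cx + dx, cy + dy), []):
                    d = haversine_m(lon, lat, rlon, rlat)
                    # Secondary filter: exact distance test.
                    if d <= max_distance_m:
                        results.append((lid, rid, d))
    return results


customers = [("c1", -73.6918, 42.7284)]
pois = [("p1", -73.6950, 42.7300), ("p2", -73.7562, 42.6526)]
matches = join_by_distance(customers, pois, 0.5 * MILE_M)
print([(l, r) for l, r, _ in matches])  # [('c1', 'p1')]
```

A finer grid keeps each bucket small but creates more buckets, which is analogous to the memory note on the geohashPrecision parameter.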
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases in which multiple users interact with the data.
For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application    Do this
Spark job    Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook    Integrating the Location Intelligence Jar with Zeppelin
Hue notebook    Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars is the path to the LI jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. You can do this by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample of your data to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have a PGD file created for each sub-TAB.
Also, the system stops using a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters

Parameter: -f <file>
Required: yes
Description: Path and file name of the TAB file for which you are generating a PGD file.

Parameter: -p <parallel>
Required: no
Description: The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no job is running, restart all the services whose operating system users were added to the new group.
4. During a window in which no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
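As a quick check of what mode 775 grants, the following Python sketch decodes the octal value with the standard library. It only illustrates the permission bits; describe_mode is a hypothetical helper name, and nothing here touches the file system or depends on the product.

```python
import stat

DOWNLOAD_DIR_MODE = 0o775  # the mode set by chmod 775


def describe_mode(mode):
    """Return the ls-style permission string for a directory with these mode bits."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe_mode(DOWNLOAD_DIR_MODE))            # drwxrwxr-x
print(bool(DOWNLOAD_DIR_MODE & stat.S_IWGRP))      # True: group members can write
print(bool(DOWNLOAD_DIR_MODE & stat.S_IWOTH))      # False: other users cannot write
```

The owner and the group both get read, write, and execute, which is what lets every service user in the common group update downloaded data, while all other users are limited to read and execute.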
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator    Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains    Returns true if the first object contains all of the second object.
Within    Returns true if the first object is entirely inside the second object.
ContainsCentroid    Returns true if the first object contains the centroid of the second object.
CentroidWithin    Returns true if the first object's centroid is within the second object.
Intersects    Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.
Like    Returns true if the value matches a pattern that can contain wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND    Returns true if both conditions in the WHERE clause are true.
OR    Returns true if either the first or second condition is true.
NOT    Reverses the meaning of the logical operator with which it is used.
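The wildcard behavior described for the Like operator can be sketched in Python by translating a pattern into a regular expression. This is an illustration of the wildcard semantics only, not the MI SQL engine: like_to_regex and like are hypothetical helpers, and the sketch assumes case-sensitive matching, which the source does not specify.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern to a compiled regex: % is any run of characters, _ is one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape everything else so it matches literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Central Park", "%Park"))  # True: % absorbs "Central "
print(like("Parkway", "%Park"))       # False: the value must end in "Park"
print(like("Park", "P_rk"))           # True: _ matches exactly one character
```

Combining the two wildcards works the same way; for example, the pattern "_a%" matches any value whose second character is "a".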
Arithmetic Operators
Operator    Definition
+    Addition; also the concatenation operator. Note: String concatenation also uses &.
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter    Definition
( )    Expression delimiters
' '    String constant delimiters. See Quote Rules.
" "    Quoted identifier delimiters
% _    Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,    List item and function argument separators
@    Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals, or values, are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
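When filter expressions are assembled in code, the quoting rules above can be applied with two small helpers. The Python sketch below is illustrative only: quote_literal and quote_identifier are hypothetical names, and the sketch refuses identifiers that contain a double quote because the source does not describe how to escape one.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Escaping embedded double quotes is not covered by the quoting rules above.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


query = (
    "SELECT * FROM " + quote_identifier("/Samples/NamedTables/USA")
    + " WHERE Country = " + quote_literal("Canada")
)
print(query)                     # SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
print(quote_literal("Ohara's"))  # 'Ohara''s'
```

Doubling the embedded single quote keeps the literal intact instead of ending it early at the apostrophe.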
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Spatial
Search Functions
The following Search functions are available
bull LocalPointInPolygon bull LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF function returns the polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned by this UDTF that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output
Function Registration
create function LocalPointInPolygon as compbbigdataspatialhivesearchLocalPointInPolygon
Syntax
LocalPointInPolygon(WritableGeometry inputPoint String dataSourcePath [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing a point
dataSourcePath String The path to the data source to be searched The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 66
Spatial
Parameter Type Description
options Map Optional Options that allow you to set return criteria in ltString Stringgt format
Options
Option | Description | Example
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB file or shapefile) in which an input point resides. Every geometry that the search point lies within is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, each of those geometries is returned in the output.
Examples
Using HDFS
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
    FromWKT(pip_points.geometry, pip_points.crs),
    'STATECAP.TAB',
    map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
        'downloadLocation', '/pb/pip/download',
        'downloadGroup', 'pbdownloads')
) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points being searched. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in our query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
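The containment test described above can be illustrated outside Hive with a small Python sketch. It uses a textbook ray-casting test on simple rings and is not the product's implementation; the function names and the STATES sample data are hypothetical. Unlike the UDTF, this sketch does not treat points lying exactly on a boundary specially.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_containing(point, polygons):
    """Return the names of all polygons that contain the point (there may be several)."""
    return [name for name, ring in polygons.items()
            if point_in_polygon(point[0], point[1], ring)]


# Hypothetical data: two overlapping squares standing in for rows of a TAB file.
STATES = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(5, 5), (15, 5), (15, 15), (5, 15)],
}

print(polygons_containing((7, 7), STATES))    # point inside both squares
print(polygons_containing((20, 20), STATES))  # point outside both squares
```

As with the LATERAL VIEW in the Hive example, one input point can produce zero, one, or several result rows.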
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB file or shapefile. The path can be either a relative path based on the remote resource, or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location, as described in the Options table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources are downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group to apply to downloaded data on a local file system; the default is the value of the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"
Return Values
Return Type | Description
geometry | The geometry or geometries, contained in the specified TAB file or shapefile, that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
    FromWKT(search_points.geometry, search_points.crs),
    'STATECAP.TAB',
    map('maxCandidates', '3',
        'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
        'downloadLocation', '/pb/search/download',
        'downloadGroup', 'pbdownloads',
        'queryFilter', "Name Like 'Park'")
) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points being searched. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
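How maxCandidates and maxDistance interact can be shown with a minimal Python sketch. It assumes planar coordinates in a single unit and point features only; the function name, parameter names, and sample data are hypothetical and do not reflect how the UDTF is implemented.

```python
import math


def search_nearest(point, features, max_candidates=1, max_distance=None):
    """Return up to max_candidates (name, distance) pairs, nearest first.

    Features farther away than max_distance are dropped; None means no limit.
    """
    scored = []
    for name, (x, y) in features.items():
        d = math.hypot(x - point[0], y - point[1])
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:max_candidates]]


# Hypothetical features standing in for rows of a TAB file.
PARKS = {"Park A": (3, 4), "Park B": (6, 8), "Park C": (30, 40)}

print(search_nearest((0, 0), PARKS))                                      # default: one result
print(search_nearest((0, 0), PARKS, max_candidates=3, max_distance=25))  # Park C is too far
```

With the defaults the sketch returns a single nearest feature, which mirrors the documented default of 1 for maxCandidates and no limit for maxDistance.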
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Each subsequent level divides the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes a longitude and latitude value from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 in which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
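To give a feel for what the geohashPrecision value controls, the following Python sketch encodes a latitude/longitude pair with the standard public geohash scheme (base-32, longitude bit first). It is an illustration only: the guide does not state which geohash variant the SDK uses internally, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision):
    """Encode a WGS84 coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


coarse = geohash_encode(57.64911, 10.40744, 3)
fine = geohash_encode(57.64911, 10.40744, 5)
print(coarse, fine)
```

Each additional character splits a cell into 32 smaller cells, so a higher precision yields many more distinct cells to track, which is consistent with the note above that higher values may require more memory.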
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
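The result of such a join can be pictured with a brute-force Python sketch that pairs every record from one list with every record from another and keeps the pairs within the maximum distance. This is only a model of the output under stated assumptions: it uses the haversine formula on a spherical earth rather than whatever distance calculation the SDK performs, it skips the geohash pre-join entirely, and the function names, column names, and sample records are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.7613  # mean earth radius in miles (spherical approximation)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def join_by_distance(rows1, rows2, max_miles, distance_column="outputDistance"):
    """Pair each row of rows1 with every row of rows2 that lies within max_miles."""
    joined = []
    for r1 in rows1:
        for r2 in rows2:
            d = haversine_miles(r1["longitude"], r1["latitude"], r2["lon"], r2["lat"])
            if d <= max_miles:
                joined.append({**r1, **r2, distance_column: d})
    return joined


customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
pois = [
    {"poi": "close", "lon": -73.9857, "lat": 40.7520},  # roughly a quarter mile north
    {"poi": "far", "lon": -73.9857, "lat": 40.8000},    # several miles north
]

for row in join_by_distance(customers, pois, max_miles=0.5):
    print(row["id"], row["poi"], round(row["outputDistance"], 3))
```

The nested loop compares every pair, which does not scale; a coarse pre-join on geohash cells, as the geohashPrecision parameter suggests, is one way to avoid comparing every pair.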
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section:
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.
Also, the system no longer uses a PGD file if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download data and update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The download directory should belong to that common group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. Using a window where no job is running, restart all the services whose operating system users were added to the new group.
4. Using a window where no job is running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
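As a quick check of what mode 775 grants, the following Python sketch decodes a numeric mode with the standard library stat module. The helper names are hypothetical, and the sketch only interprets mode bits; it does not change anything on disk.

```python
import stat


def describe_dir_mode(mode):
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode):
    """True when members of the owning group may create or replace entries."""
    return bool(mode & stat.S_IWGRP)


print(describe_dir_mode(0o775))  # owner and group: rwx; others: r-x
print(group_can_write(0o775), group_can_write(0o755))
```

With 775, every member of the common group can write into the download directory, which is what allows several services to update the same downloaded data; with 755 only the owner could.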
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
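The wildcard behavior described for the Like operator can be modeled with a short Python sketch that translates a pattern into a regular expression. This is an illustration of the wildcard semantics only; it assumes case-sensitive matching and no escape character, neither of which is stated in this guide, and the function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character matches itself
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("Parkview", "Park%"))   # % also matches the trailing "view"
print(like("Park", "Park_"))       # _ needs exactly one more character
print(like("Car Park", "%Park"))   # leading % allows any prefix
```

A pattern with no wildcard at all matches only the exact value, so a queryFilter that should match a family of names needs at least one % or _.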
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
%, _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List item and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote appears within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
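When filter expressions are assembled programmatically, the quote rules above can be applied with small helpers such as the following Python sketch. The helper names are hypothetical; the sketch covers only the rules stated here (single quotes around literals, a doubled single quote inside them, double quotes around identifiers) and does not handle identifiers that themselves contain a double quote.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (needed for spaces or special characters)."""
    return '"' + name + '"'


query = "SELECT * FROM {} WHERE Business = {}".format(
    quote_identifier("/Samples/NamedTables/USA"),
    quote_literal("O'haras"),
)
print(query)
```

Doubling the embedded quote keeps the literal from ending early, which is the same rule shown in the Streets example.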
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
- Table of Contents
- Welcome
-
- What is Spectrumtrade Location Intelligence for Big Data
- Spectrumtrade Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
-
- Spatial
-
- Installing the SDK
- Hive User-Defined Spatial Functions
-
- Setup
- WritableGeometry
- Geometry Functions
-
- Constructor Functions
-
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
-
- Persistence Functions
-
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
-
- Predicate Functions
-
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
-
- Measurement Functions
-
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
-
- Processing Functions
-
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
-
- Observer Functions
-
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
-
- Grid Functions
-
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
-
- Search Functions
-
- LocalPointInPolygon
- LocalSearchNearest
-
- Spark
-
- Spark Jobs
-
- Hexagon Generator
-
- Hexagons
-
- Spark API
-
- JoinByDistance
-
- Location Intelligence Jar for Spatial Operations
-
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
-
- Appendix
-
- PGD Builder
-
- Building an Index with the PGD Builder
-
- Download Permissions
- Operators and Syntax Delimiters
-
- Quote Rules
-
Spatial
Parameter Type Description
options Map Optional Options that allow you to set return criteria in ltString Stringgt format
Options
Option Description
shpCharset the charset to use when reading a shapefile
shpCrs the coordinate reference system to use when reading a shapefile
remoteDataSourceLocation
downloadLocation
downloadGroup
the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
Example
shpCharset utf-8
shpCrs epsg4326
remoteDataSourceLocationhdfsdatamydatazip
downloadLocationpbdownloads
Note If you are also using Spectrumtrade Geocoding for Big Data and have already set the pbdownloadlocationHive variable then you do not need to set this option here as well
downloadGrouppbdownloads
the operating system group which should be applied to downloaded data on a local file system the default is the value from the Hive strow pbdownloadgroup(required only if you are storing and distributing data remotely on HDFS or S3)
For more information see Download Permissions on page 83
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 67
Spatial
Return Values
Return Type Description
geometry The polygon (contained in the specified TAB or shapefile) in which an input point resides All the geometries where the search point lies within are returned that is if the search point lies on a Line Polyline Point Polygon or MultiPolygon geometry type the respective geometries will be returned in the output
Examples
Using HDFS
SELECT pip_pointsid pipresultcapital pipresultstateFROM pip_points LATERAL VIEWLocalPointInPolygon(FromWKT(pip_pointsgeometry pip_pointscrs)
STATECAPTABmap(remoteDataSourceLocation hdfsdatapipcapitalszipdownloadLocation pbpipdownload downloadGroup pbdownloads))
pipresult
In the above example id is a field from the pip_points table which is the table being used to get the points we are searching from The pipresultcapital and pipresultstate fields are from the STATECAP TAB file that we want in our query result
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 68
Spatial
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the nearest geometry or geometries contained in the specified TAB or shapefile to an input point
Function Registration
create function LocalSearchNearest as compbbigdataspatialhivesearchLocalSearchNearest
Syntax
LocalSearchNearest(WritableGeometry inputPoint String dataSourcePath [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String the location of the input TAB or shapefile The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
options map Optional Options that allow you to return more than one value return additional information or set other return criteria in ltString Stringgt format
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value of the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type | Description
geometry | The geometry or geometries in the specified TAB or shapefile that are nearest to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum™ Location Intelligence and Spectrum™ Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). The hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the Hexagon Generator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
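The command in step 3 can be wrapped in a small shell function so that the job name, jar, output directory, and configuration file are passed as arguments. This is a minimal sketch rather than a script shipped with the product: the function name submit_hexgen, the DRY_RUN switch, and the sample paths are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: wrap the Hexagon Generator spark-submit call in a function.
# Set DRY_RUN=1 to print the command instead of running it.

submit_hexgen() {
  # $1 application name, $2 hexgen jar path, $3 HDFS output dir, $4 config path
  runner="${DRY_RUN:+echo}"   # "echo" when DRY_RUN is set, otherwise empty
  $runner spark-submit \
    --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
    --master yarn --deploy-mode cluster \
    --name "$1" \
    "$2" -output "$3" -conf "$4" -overwrite
}

# Placeholder values (assumptions); replace them with your own paths.
DRY_RUN=1 submit_hexgen hexgen-demo /dir/on/server/hexgen.jar \
  /dir/on/hdfs/output /dir/on/server/hexgen.conf
```

Printing the command first makes it easy to review the arguments before launching a long-running cluster job; call the function without DRY_RUN to submit it.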
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles would. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports Levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. At a given level, the same longitude/latitude always yields the same hexagon with a unique and consistent ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software, as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 within which to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
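Step 9 expects the LI jar to already be present in HDFS. The following is a minimal sketch of staging it there with the standard hdfs dfs commands; the function name stage_li_jar, the DRY_RUN switch, and the sample paths are assumptions, and VERSION in the jar name is a placeholder for your installed version.

```shell
#!/bin/sh
# Sketch: copy the LI jar from the local file system into HDFS.
# Set DRY_RUN=1 to print the commands instead of running them.

stage_li_jar() {
  # $1 local jar path, $2 target directory in HDFS
  runner="${DRY_RUN:+echo}"
  $runner hdfs dfs -mkdir -p "$2" &&
    $runner hdfs dfs -put -f "$1" "$2/"
}

# Placeholder path (assumption): replace VERSION with your installed version.
LI_JAR="/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-VERSION.jar"
DRY_RUN=1 stage_li_jar "$LI_JAR" /pb/li/software/spark2/sdk/lib
```

The -f flag overwrites an existing copy, which is convenient when the jar is upgraded; drop it if you prefer the copy to fail when the file already exists.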
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
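Because the builder handles one TAB file per invocation, indexing a whole directory means calling it once per file. The loop below is a minimal sketch for a Linux shell; the function name build_pgd_for_dir, the PGD_BUILDER and DRY_RUN variables, and the /data/tab directory are assumptions, and the actual launcher name in your utilities directory may differ.

```shell
#!/bin/sh
# Sketch: run the PGD Builder once for every .tab file in a directory.
# PGD_BUILDER overrides the launcher name; DRY_RUN=1 prints the commands only.

build_pgd_for_dir() {
  # $1 directory containing TAB files, $2 optional value for -p
  runner="${DRY_RUN:+echo}"
  builder="${PGD_BUILDER:-PGDBuilder}"
  for tab in "$1"/*.tab; do
    [ -e "$tab" ] || continue   # no match: the glob stays literal, so skip it
    if [ -n "${2:-}" ]; then
      $runner "$builder" -f "$tab" -p "$2"
    else
      $runner "$builder" -f "$tab"
    fi
  done
}

# Placeholder directory (assumption); prints nothing if it has no .tab files.
DRY_RUN=1 build_pgd_for_dir /data/tab 4
```

The glob matches lowercase .tab names only; adjust the pattern if your files use an uppercase extension.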
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. You should have a common operating system group that includes all the service users who need to download the data. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute (775) permissions for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
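After step 5 it can be useful to confirm that the directory really has the expected group and mode. The check below is a minimal sketch that assumes GNU stat (as found on typical Linux cluster nodes); the function name check_download_dir is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a download directory has mode 775 and the expected group.
# Assumes GNU coreutils stat (the -c option).

check_download_dir() {
  # $1 download directory, $2 expected group name
  mode="$(stat -c '%a' "$1")" || return 2
  group="$(stat -c '%G' "$1")" || return 2
  if [ "$mode" = "775" ] && [ "$group" = "$2" ]; then
    echo "OK: $1 has mode $mode and group $group"
  else
    echo "MISMATCH: $1 has mode $mode and group $group (expected 775 and $2)"
    return 1
  fi
}

# Example call (paths from the steps above):
#   check_download_dir /pb/downloads pbdownloads
```

Group membership of an individual user can be listed with id -nG followed by the user name, for example id -nG hive.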
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or subquery.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. NOTE: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse them correctly, for example identifiers that contain spaces or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as "/") are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
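When a filter expression is assembled in a script, the doubling rule can be applied mechanically. The following is a minimal sketch using sed; the function name mi_sql_quote is hypothetical, and it covers only the single-quote rule described above, not identifier quoting.

```shell
#!/bin/sh
# Sketch: turn a raw value into a single-quoted MI SQL string literal
# by doubling every embedded single quote.

mi_sql_quote() {
  # $1 raw string value
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Build the WHERE clause from the example above.
filter="Business = $(mi_sql_quote "O'haras")"
echo "$filter"
```

The same helper could be used to prepare the value passed to a queryFilter option before it is embedded in a Hive query.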
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
LocalSearchNearest(WritableGeometry inputPoint String dataSourcePath [map(String options)])
Parameters
Parameter Type Description
inputPoint WritableGeometry A WritableGeometry representing the point to search near
dataSourcePath String the location of the input TAB or shapefile The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node
Note If you are storing and distributing your data remotely using HDFS or S3 you must set the option for remoteDataSourceLocation and also specify the download location as described in the table below
options map Optional Options that allow you to return more than one value return additional information or set other return criteria in ltString Stringgt format
Options
Option Description Example
maxCandidates maxCandidates 3 the maximum number of results to return (if not set the default value is 1)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 69
Spatial
Description Example
the maximum distance to maxDistance 25 search for results (if not set the default value is no limit)
the distance unit (if not set distanceUnit mi
Option
maxDistance
distanceUnit
returnDistanceColumnName
shpCharset
shpCrs the coordinate reference system to use when reading a shapefile
remoteDataSourceLocation
downloadLocation
downloadGroup
the default value is m for meters)
See the Distance on page 33 function for examples of supported distance units
the name of the column to use for returning the distance
the charset to use when reading a shapefile
the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3)
returnDistanceColumnNameMiles
shpCharset utf-8
shpCrs epsg4326
remoteDataSourceLocationhdfsdatamydatazip
downloadLocationpbdownloads
Note If you are also using Spectrumtrade Geocoding for Big Data and have already set the pbdownloadlocationHive variable then you do not need to set this option here as well
downloadGrouppbdownloads
the operating system group which should be applied to downloaded data on a local file system the default is the value from the Hive strow pbdownloadgroup(required only if you are
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 70
Spatial
Example Description Option
storing and distributing data remotely on HDFS or S3)
For more information see Download Permissions onpage 83
queryFilter search on a subset of the queryFilterName Likedata based on an attribute Park
Note For more information about defining filter expressions see Operators and Syntax Delimiters on page 84
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_pointsid nearestresultcapital nearestresultstateFROM search_points LATERAL VIEW OUTERLocalSearchNearest(FromWKT(search_pointsgeometry search_pointscrs) STATECAPTAB
map(maxCandidates 3remoteDataSourceLocation hdfsdatasearchcapitalszipdownloadLocation pbsearchdownload downloadGroup pbdownloadsqueryFilter Name Like Park)) nearestresult
In the above example id is a field from the search_points table which is the table being used to get the points we are searching from The nearestresultcapital and nearestresultstate fields are from the STATECAP TAB file that we want in our query result In this particular example the maxCandidates option limits the results to 3 records for each search point
Tip To improve performance when searching TAB files consider creating PGD (prepared geometry) index files For more information see PGD Builder on page 81
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 71
Spatial
Spark
Spark Jobs
To process large sets of data the Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data
Hexagon Generator This Spark job generates the hexagons within a bounding box (for example the bounding box of the continental USA) Hexagon output can be used for map display
To create hexagons for a given bounding box
1 Modify the configuration according to the hexagons to be generated Change the bounding box coordinates and hexagon level to suit your needs See Hexagons to learn about hexagon levels
2 Deploy jar and configuration to the Hadoop cluster 3 Start the Spark job using the following command
spark-submit --class compbbigdataspatialhexsparkappHexGenDriver--master yarn --deploy-mode cluster --name ltAPPLICATION_NAMEgtdironserverspectrum-bigdata-li-spark2-hexgen-versionjar -output dironhdfsoutput -conf ltCONFIG_PATHgt -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output
Sample Output
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 72
Spatial
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation such as cell tower strength or noise pollution Closely approximating circles hexagons include edge data better than if rectangles were used Hexagons also fit together well across a space
Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API
The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon
One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location
The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 73
Spatial
Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude
Spark API
The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is
pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
For installation instructions see Location Intelligence Jar for Spatial Operations on page 77
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74
Spatial
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values, one set from each dataframe, representing the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
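To give a feel for what the geohashPrecision values 1 through 12 mean, the following sketch computes the nominal size of a geohash cell at a given precision. It relies only on the standard geohash definition (5 bits per character, with bits alternating between longitude and latitude, starting with longitude). The function name geohash_cell_size_deg is hypothetical and not part of the SDK, and the sketch makes no claim about how joinByDistance uses these cells internally.

```python
def geohash_cell_size_deg(precision):
    """Return (width, height) in degrees of a geohash cell at the given precision."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    total_bits = 5 * precision        # each geohash character encodes 5 bits
    lon_bits = (total_bits + 1) // 2  # bits alternate, starting with longitude
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


if __name__ == "__main__":
    for p in (1, 5, 7, 12):
        width, height = geohash_cell_size_deg(p)
        print(f"precision {p}: {width:.8f} x {height:.8f} degrees")
```

Each additional character divides a cell into 32 smaller cells, so the number of distinct cells covering a given area grows quickly as the precision increases.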
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
// Length and LinearUnit come from the com.mapinfo.midev.unit package
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
// 7 is the geohash precision used for the primary join
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
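To make the semantics of a distance join concrete, here is a minimal brute-force sketch in plain Python, independent of Spark and of the SDK. It pairs every record on the left with each record on the right that lies within a maximum distance and reports that distance. The function names, the sample coordinates, and the use of the haversine formula on a spherical earth are all assumptions made for illustration; this is not how joinByDistance is implemented, and the distances the SDK computes may differ slightly.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean earth radius in meters (spherical approximation)
METERS_PER_MILE = 1609.344


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_distance_m):
    """Pair each left record with every right record within max_distance_m.

    left and right are lists of (name, lon, lat) tuples. Each result row is
    (left_name, right_name, distance_in_meters).
    """
    rows = []
    for l_name, l_lon, l_lat in left:
        for r_name, r_lon, r_lat in right:
            d = haversine_m(l_lon, l_lat, r_lon, r_lat)
            if d <= max_distance_m:
                rows.append((l_name, r_name, d))
    return rows


if __name__ == "__main__":
    customers = [("customer-1", -73.9857, 40.7484)]
    pois = [("poi-near", -73.9851, 40.7489), ("poi-far", -73.9442, 40.6782)]
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row)
```

The nested loop compares every pair of records, which is only practical for small inputs. A distributed join needs some way to avoid comparing every pair, which is presumably the role of the geohash-based primary join mentioned in the parameter table.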
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, and the directory should have Read, Write, and Execute permissions for the owner and group (mode 775).
Your group should contain services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You also should include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
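As a quick check on what mode 775 grants, the following sketch decodes an octal mode with Python's standard stat module. The helper names describe_mode and group_can_write are hypothetical and exist only for this illustration; nothing here is part of the product.

```python
import stat


def describe_mode(octal_mode):
    """Render an octal directory mode such as '775' in ls -l style."""
    mode = int(octal_mode, 8)
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(octal_mode):
    """Return True when the group write bit is set in the given octal mode."""
    return bool(int(octal_mode, 8) & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode("775"))
    print(group_can_write("775"))
    print(group_can_write("755"))
```

Mode 775 gives the owner and the group read, write, and execute access, and gives all other users read and execute access only. The group write bit is what lets every member of the common group update the downloaded data.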
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
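The wildcard behaviour described for the Like operator can be mimicked outside the product to experiment with patterns. The sketch below translates a Like pattern into a Python regular expression, with the percent sign matching zero, one, or multiple characters and the underscore matching exactly one. The function names are hypothetical, and the sketch assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern):
    """Translate a Like pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Central Park", "%Park"))
    print(like("Parkway", "%Park"))
    print(like("Park", "P_rk"))
```

Using fullmatch means the pattern must account for the entire value, so a pattern without wildcards behaves like an equality test.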
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
, | List items and function argument separators
@ | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
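When filter expressions are built programmatically, the quoting rules above can be applied with two small helpers. This is a minimal sketch: the function names and the sample values are hypothetical, and because this guide does not say how a double quote inside an identifier should be handled, the identifier helper simply rejects that case.

```python
def quote_literal(value):
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Wrap an identifier in double quotes (embedded double quotes are rejected)."""
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("My Table")
    business = quote_literal("O'Brien")
    print(f"SELECT * FROM {table} WHERE Business = {business}")
```

Doubling the embedded single quote keeps the literal intact when the expression is parsed, which matters whenever values come from user input or from another data set.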
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
LocalSearchNearest
Description
The LocalSearchNearest UDTF function returns the geometry or geometries in the specified TAB or shapefile that are nearest to an input point.
Function Registration
create function LocalSearchNearest as 'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the option for remoteDataSourceLocation and also specify the download location, as described in the table below.
options | map | Optional. Options, in <String, String> format, that allow you to return more than one value, return additional information, or set other return criteria.
Options
Option | Description | Example
maxCandidates | The maximum number of results to return (if not set, the default value is 1). | 'maxCandidates', '3'
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance function on page 33 for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group that should be applied to downloaded data on a local file system; the default is the value from the Hive property pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like '%Park%'"
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3', 'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip', 'downloadLocation', '/pb/search/download', 'downloadGroup', 'pbdownloads', 'queryFilter', "Name Like '%Park%'")) nearestresult;
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver --master yarn --deploy-mode cluster --name <APPLICATION_NAME> /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Closely approximating circles, hexagons include edge data better than if rectangles were used. Hexagons also fit together well across a space.
Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API
The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon
One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location
The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 73
Spatial
Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude
Spark API
The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is
pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
For installation instructions see Location Intelligence Jar for Spatial Operations on page 77
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74
Spatial
JoinByDistance
Description
joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers
Syntax
import compbbigdatalisparkapiSpatialImplicits_
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame
Parameters
Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75
Spatial
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe
val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))
val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)
Example showing options set
val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))
val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76
Spatial
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs
The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data
bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager
There are several options for deploying the LI jar refer to the following table to proceed according to your use case
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1 Deploy the spectrum-bigdata-li-sdk-spark2-versionjar to the Hadoop cluster 2 Start the Spark job using the following command
Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8
spark-submit --class comexamplesparkappMyDriver--master yarn --deploy-mode cluster--jars pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar pathtocustomdriverjar driverParameter1 driverParameter2
where
bull --jars The path to the jar file
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 77
Spatial
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1 On the navigation bar using the Settings drop-down menu and click Interpreter The Interpreters page displays
2 Go to the Spark2 section click Edit The editing view displays
3 Go to the Dependencies section Enter the full local path to the jar in the artifact field
Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8
pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
4 Click Save This will restart the interpreter with the loaded library
5 To return to your notebook click the Notebook drop-down menu and select your notebook The LI jar is now available for all notebooks
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 78
Spatial
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here:
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment and change the following line from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter | Required | Description
-f <file> | yes | Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel> | no | The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
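When the PGD Builder is driven from a script, for instance to rebuild indexes after TAB data changes, the command line can be assembled programmatically. The following is a minimal sketch, not part of the product: build_pgd_command and capped_parallelism are hypothetical helper names, and the only facts taken from this guide are the -f and -p flags and the CPU-count default.

```python
import os
from typing import List, Optional


def build_pgd_command(tab_path: str, parallel: Optional[int] = None) -> List[str]:
    """Return the argument list for one PGDBuilder run (hypothetical helper)."""
    if parallel is not None and parallel < 1:
        raise ValueError("parallel must be a positive integer")
    command = ["PGDBuilder", "-f", tab_path]
    if parallel is not None:
        # -p caps the number of concurrent processes; omit it to keep the default.
        command += ["-p", str(parallel)]
    return command


def capped_parallelism(requested: int) -> int:
    """Clamp a requested -p value to the CPUs visible to this process."""
    return max(1, min(requested, os.cpu_count() or 1))


if __name__ == "__main__":
    print(build_pgd_command(r"C:\data\uk.tab"))
    print(build_pgd_command(r"C:\seamless.tab", parallel=capped_parallelism(4)))
```

The resulting list can be passed to subprocess.run on a machine where PGDBuilder is on the path. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.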
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should be members of that group. The download directory should be assigned to the common group with 775 permissions: Read, Write, and Execute for the owner and group, and Read and Execute for all other users.
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and update its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
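To see what mode 775 grants, the octal value can be decoded with the Python standard library. This is only an illustrative sketch; describe_mode and group_can_write are hypothetical helper names, and nothing here touches the real download directory.

```python
import stat


def describe_mode(mode: int) -> str:
    """Render permission bits the way `ls -l` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """True when members of the owning group may create or replace files."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode(0o775))    # owner rwx, group rwx, others r-x
    print(group_can_write(0o775))  # group members can update downloaded data
    print(group_can_write(0o755))  # a common default that would block them
```

The group write bit is the part that matters for this procedure: without it, a second service in the common group could read the downloaded data but not refresh it.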
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator | Definition
Attribute operators | =, <>, !=, <, <=, >, >=
Between | Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects | Returns true if the envelopes (MBRs) of the operands intersect.
Contains | Returns true if the first object contains all of the second object.
Within | Returns true if the first object is entirely inside the second object.
ContainsCentroid | Returns true if the first object contains the centroid of the second object.
CentroidWithin | Returns true if the first object's centroid is within the second object.
Intersects | Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List) | Returns true if the value equals at least one of the values in the literal list or sub query.
Like | Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: Underscore (_) and Percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND | Returns true if both conditions in the WHERE clause are true.
OR | Returns true if either the first or second condition is true.
NOT | Reverses the meaning of the logical operator with which it is used.
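The wildcard rules for Like can be illustrated by translating a pattern into a regular expression. This is a sketch of the semantics described in the table, not the MI SQL implementation; like_to_regex and like are hypothetical names, and case-sensitive matching is an assumption that this guide does not confirm.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


if __name__ == "__main__":
    print(like("Central Park", "%Park"))  # True: % absorbs "Central "
    print(like("Park Avenue", "%Park"))   # False: the value must end in Park
    print(like("cat", "c_t"))             # True: _ stands for one character
```

Matching is anchored at both ends, so a pattern without wildcards behaves like an equality test.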
Arithmetic Operators
Operator | Definition
+ | Addition; also the concatenation operator. Note: String concatenation also uses &.
- | Subtraction
* | Multiplication
/ | Division
^ | Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter | Definition
( ) | Expression delimiters
' ' | String constant delimiters. See Quote Rules on page 85.
" " | Quoted identifier delimiters
% _ | Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
, | List items and function argument separators
 | Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
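When filter expressions are generated by a program, the quoting rules above are easy to apply with two small helpers. This is a sketch under stated assumptions: quote_literal and quote_identifier are hypothetical names, and because this guide does not say how a double quote inside an identifier is escaped, the sketch rejects that case instead of guessing.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes so spaces and special characters parse."""
    if '"' in name:
        # Escaping rule for embedded double quotes is not documented here.
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    where = "Business = " + quote_literal("O'hara's")
    print("SELECT * FROM " + quote_identifier("Streets") + " WHERE " + where)
```

Quoting every identifier is harmless under these rules, since quotes are only required when the parser cannot otherwise read the name.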
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Option | Description | Example
maxDistance | The maximum distance to search for results (if not set, the default value is no limit). | 'maxDistance', '25'
distanceUnit | The distance unit (if not set, the default value is m for meters). See the Distance on page 33 function for examples of supported distance units. | 'distanceUnit', 'mi'
returnDistanceColumnName | The name of the column to use for returning the distance. | 'returnDistanceColumnName', 'Miles'
shpCharset | The charset to use when reading a shapefile. | 'shpCharset', 'utf-8'
shpCrs | The coordinate reference system to use when reading a shapefile. | 'shpCrs', 'epsg:4326'
remoteDataSourceLocation | The path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3). | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip'
downloadLocation | The local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum™ Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well. | 'downloadLocation', '/pb/downloads'
downloadGroup | The operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83. | 'downloadGroup', 'pbdownloads'
queryFilter | Search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84. | 'queryFilter', "Name Like 'Park'"
Return Values
Return Type | Description
geometry | The nearest geometry or geometries contained in the specified TAB or shapefile to the input point.
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points LATERAL VIEW OUTER
LocalSearchNearest(FromWKT(search_points.geometry, search_points.crs), 'STATECAP.TAB',
map('maxCandidates', '3',
'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
'downloadLocation', '/pb/search/download',
'downloadGroup', 'pbdownloads',
'queryFilter', "Name Like 'Park'")) nearestresult
In the above example, id is a field from the search_points table, which supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields are the fields from the STATECAP TAB file that we want in our query result. In this example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
To process large sets of data, Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-<version>.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes, taking longitude and latitude values (one set from each dataframe) that represent the location of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter | Type | Description
df2 | DataFrame | The dataframe to join to.
df1Longitude | Column | The longitude value from the first dataframe.
df1Latitude | Column | The latitude value from the first dataframe.
df2Longitude | Column | The longitude value from the second dataframe.
df2Latitude | Column | The latitude value from the second dataframe.
maxDistance | Length | The buffer length around point 1 to search for point 2.
geohashPrecision | Integer | The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options | Map | Optional. Options that add extra attributes to the result of the join.
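To get a feel for the geohashPrecision values, it helps to know how large a cell is at each precision. The sketch below assumes the standard base-32 geohash scheme (5 bits per character, bits alternating between longitude and latitude starting with longitude); whether the SDK uses exactly this scheme internally is an assumption, and geohash_cell_degrees is a hypothetical name.

```python
from typing import Tuple


def geohash_cell_degrees(precision: int) -> Tuple[float, float]:
    """Return (width, height) in degrees of a standard geohash cell."""
    if not 1 <= precision <= 12:
        raise ValueError("precision must be between 1 and 12")
    bits = 5 * precision       # each geohash character encodes 5 bits
    lon_bits = (bits + 1) // 2  # longitude receives the extra bit when odd
    lat_bits = bits // 2
    return 360.0 / (1 << lon_bits), 180.0 / (1 << lat_bits)


if __name__ == "__main__":
    for precision in (1, 5, 7, 9):
        width, height = geohash_cell_degrees(precision)
        print(precision, width, height)
```

Each additional character splits a cell into 32 smaller ones, so the number of distinct cells grows quickly with precision. That growth is consistent with the note in the parameter table that a higher number may require more memory.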
Options
Key | Type | Description
DistanceColumnName | String | Adds a column to the result dataframe that contains the distance calculated.
Return Values
Return Type | Description
DataFrame | The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5 mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
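The result that a distance join is expected to produce can be checked on a small sample with a brute-force comparison. The following sketch is not the SDK's algorithm: it compares every pair of points using a spherical (haversine) distance, whereas joinByDistance uses a geohash-based primary join and its own distance calculation. The function names, the sample coordinates, and the outputDistance column name are illustrative assumptions.

```python
import math
from typing import Dict, List

EARTH_RADIUS_M = 6371008.8   # mean earth radius, spherical approximation
METERS_PER_MILE = 1609.344


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two WGS84 longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def join_by_distance(left: List[Dict], right: List[Dict], max_distance_m: float) -> List[Dict]:
    """Pair each left row with every right row that lies within max_distance_m."""
    joined = []
    for l_row in left:
        for r_row in right:
            d = haversine_m(l_row["longitude"], l_row["latitude"], r_row["lon"], r_row["lat"])
            if d <= max_distance_m:
                # Merge both rows and append the computed distance column.
                joined.append({**l_row, **r_row, "outputDistance": d})
    return joined


if __name__ == "__main__":
    customers = [{"id": 1, "longitude": -73.9857, "latitude": 40.7484}]
    pois = [
        {"name": "near", "lon": -73.9857, "lat": 40.7520},
        {"name": "far", "lon": -73.9857, "lat": 40.8000},
    ]
    for row in join_by_distance(customers, pois, 0.5 * METERS_PER_MILE):
        print(row["id"], row["name"], round(row["outputDistance"]))
```

A check like this compares every pair, so it is only practical on samples. It is useful for confirming that a chosen maxDistance and unit select the rows you expect before running the join on the cluster.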
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application | Do this
Spark job | Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook | Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook | Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
--jars is the path to the LI jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
- Table of Contents
- Welcome
-
- What is Spectrumtrade Location Intelligence for Big Data
- Spectrumtrade Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
-
- Spatial
-
- Installing the SDK
- Hive User-Defined Spatial Functions
-
- Setup
- WritableGeometry
- Geometry Functions
-
- Constructor Functions
-
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
-
- Persistence Functions
-
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
-
- Predicate Functions
-
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
-
- Measurement Functions
-
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
-
- Processing Functions
-
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
-
- Observer Functions
-
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
-
- Grid Functions
-
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
-
- Search Functions
-
- LocalPointInPolygon
- LocalSearchNearest
-
- Spark
-
- Spark Jobs
-
- Hexagon Generator
-
- Hexagons
-
- Spark API
-
- JoinByDistance
-
- Location Intelligence Jar for Spatial Operations
-
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
-
- Appendix
-
- PGD Builder
-
- Building an Index with the PGD Builder
-
- Download Permissions
- Operators and Syntax Delimiters
-
- Quote Rules
-
Option    Description    Example
storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 83.
queryFilter    Search on a subset of the data based on an attribute.    'queryFilter', "Name Like 'Park'"
Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 84.
Return Values
Return Type Description
geometry The nearest geometry or geometries contained in the specified TAB or shapefile to the input point
Examples
Using HDFS
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  'STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/pb/search/download',
      'downloadGroup', 'pbdownloads',
      'queryFilter', "Name Like 'Park'")) nearestresult
In the above example, id is a field from the search_points table, which is the table that supplies the points we are searching from. The nearestresult.capital and nearestresult.state fields come from the STATECAP.TAB file and are the fields we want in our query result. In this particular example, the maxCandidates option limits the results to 3 records for each search point.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 81.
Spark
Spark Jobs
Spark jobs for processing large sets of data are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data.
Hexagon Generator
This Spark job generates the hexagons within a bounding box (for example, the bounding box of the continental USA). Hexagon output can be used for map display.
To create hexagons for a given bounding box:
1. Modify the configuration according to the hexagons to be generated. Change the bounding box coordinates and hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
2. Deploy the jar and configuration to the Hadoop cluster.
3. Start the Spark job using the following command:
spark-submit --class com.pb.bigdata.spatial.hex.spark.app.HexGenDriver \
  --master yarn --deploy-mode cluster --name <APPLICATION_NAME> \
  /dir/on/server/spectrum-bigdata-li-spark2-hexgen-version.jar \
  -output /dir/on/hdfs/output -conf <CONFIG_PATH> -overwrite
The output of the HexGenerator is a list of WKT strings that represent the hexagons. See Consuming Results for how to use the output.
Sample Output
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower strength or noise pollution. Because hexagons closely approximate circles, they include edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and ID to each location; that ID is then used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. The level, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. The API generates each hexagon with a unique and consistent ID at a given level for the same longitude/latitude.
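The "assign an ID, then aggregate by ID" pattern described in this section can be sketched in plain Scala. This is only an illustration under stated assumptions, not the product API: cellId is a hypothetical stand-in that snaps a location to a square grid (the product assigns hexagon IDs at a hexagon level instead), and Reading and averageByCell are names made up for the example.

```scala
object CellAggregationSketch {
  // Hypothetical record type: a location plus a measured value.
  final case class Reading(lon: Double, lat: Double, value: Double)

  // Placeholder cell ID (assumption): snaps a location to a square grid whose
  // cells are cellDegrees wide. This is NOT the product's hexagon ID scheme.
  def cellId(lon: Double, lat: Double, cellDegrees: Double): (Long, Long) =
    (math.floor(lon / cellDegrees).toLong, math.floor(lat / cellDegrees).toLong)

  // Groups the readings by cell ID and averages the values in each cell.
  def averageByCell(readings: Seq[Reading], cellDegrees: Double): Map[(Long, Long), Double] =
    readings
      .groupBy(r => cellId(r.lon, r.lat, cellDegrees))
      .map { case (id, rs) => id -> rs.map(_.value).sum / rs.size }
}
```

Because every location maps to exactly one ID, the grouping step needs no geometry comparisons; the same idea applies when the ID comes from a hexagon grid.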
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes using longitude and latitude values, one set from each dataframe, which represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter    Type    Description
df2    DataFrame    The dataframe to join to.
df1Longitude    Column    The longitude value from the first dataframe.
df1Latitude    Column    The latitude value from the first dataframe.
df2Longitude    Column    The longitude value from the second dataframe.
df2Latitude    Column    The latitude value from the second dataframe.
maxDistance    Length    The buffer length around point 1 to search for point 2.
geohashPrecision    Integer    The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options    Map    Optional. Options that add extra attributes to the result of the join.
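To make the geohashPrecision parameter more concrete, the sketch below implements the standard public geohash encoding in plain Scala. It is not taken from the SDK: GeohashSketch and encode are illustrative names, and how joinByDistance uses geohashes internally is not described in this guide. The point it shows is that each additional character subdivides the cell further, so a higher precision produces smaller cells and more distinct keys.

```scala
object GeohashSketch {
  // Standard geohash base-32 alphabet (omits a, i, l, o).
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  // Encodes a WGS84 longitude/latitude pair as a geohash of `precision` characters.
  def encode(lon: Double, lat: Double, precision: Int): String = {
    require(precision >= 1 && precision <= 12, "precision must be between 1 and 12")
    var lonMin = -180.0
    var lonMax = 180.0
    var latMin = -90.0
    var latMax = 90.0
    val sb = new StringBuilder
    var useLon = true // bits alternate between longitude and latitude, longitude first
    var bitCount = 0
    var current = 0
    while (sb.length < precision) {
      if (useLon) {
        val mid = (lonMin + lonMax) / 2
        if (lon >= mid) { current = (current << 1) | 1; lonMin = mid }
        else { current = current << 1; lonMax = mid }
      } else {
        val mid = (latMin + latMax) / 2
        if (lat >= mid) { current = (current << 1) | 1; latMin = mid }
        else { current = current << 1; latMax = mid }
      }
      useLon = !useLon
      bitCount += 1
      if (bitCount == 5) { // every 5 bits become one base-32 character
        sb.append(Base32.charAt(current))
        bitCount = 0
        current = 0
      }
    }
    sb.toString
  }
}
```

For the point at longitude 10.40744, latitude 57.64911, a precision of 3 gives the cell u4p, and any longer geohash for the same point starts with that prefix.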
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = 0.5
val distanceUnit = "mi"
// Build the search distance from a numeric value and a MapInfo unit code
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
// Assumes Length, LinearUnit, and the DistanceJoinOption values are imported
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
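Conceptually, a distance join pairs each record on one side with every record on the other side that lies within the maximum distance. The sketch below shows that semantics on plain Scala collections. It is not the SDK implementation: Point, haversineMeters, and naiveJoinByDistance are illustrative names, the great-circle (haversine) formula on a spherical earth is an assumption chosen for simplicity, and the brute-force comparison of every pair is exactly the cost that a geohash-based primary join is meant to avoid.

```scala
object DistanceJoinSketch {
  // Hypothetical record type: an identifier plus WGS84 longitude/latitude in degrees.
  final case class Point(id: String, lon: Double, lat: Double)

  private val EarthRadiusMeters = 6371008.8 // mean earth radius (spherical approximation)

  // Great-circle (haversine) distance in meters between two points.
  def haversineMeters(a: Point, b: Point): Double = {
    val dLat = math.toRadians(b.lat - a.lat)
    val dLon = math.toRadians(b.lon - a.lon)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.lat)) * math.cos(math.toRadians(b.lat)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusMeters * math.asin(math.sqrt(h))
  }

  // Pairs each left point with every right point within maxMeters, keeping the distance.
  def naiveJoinByDistance(left: Seq[Point], right: Seq[Point], maxMeters: Double): Seq[(Point, Point, Double)] =
    for {
      l <- left
      r <- right
      d = haversineMeters(l, r)
      if d <= maxMeters
    } yield (l, r, d)
}
```

The third element of each result tuple plays the same role as the optional distance column in the result dataframe: it carries the computed distance alongside the joined records.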
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
- For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application    Do this
Spark job    Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook    Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook    Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
- --jars: The path to the jar file.
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, use the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry, file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter    Required    Description
-f <file>    yes    Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>    no    The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be this common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the session of all the operating system users that were added to the new group (for example, pbuser).
5. Update the group to the common operating system group and update permissions to 775 for the download directory specified in the pb.download.location property:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
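As a reading aid for the 775 mode used above, the following sketch expands a three-digit octal mode into the familiar rwx notation. It is a small illustration only; PermissionSketch and octalToSymbolic are names made up for this example and are not part of the product.

```scala
object PermissionSketch {
  // Expands one octal digit (0-7) into its read/write/execute flags.
  private def triplet(bits: Int): String = {
    val r = if ((bits & 4) != 0) "r" else "-"
    val w = if ((bits & 2) != 0) "w" else "-"
    val x = if ((bits & 1) != 0) "x" else "-"
    r + w + x
  }

  // Renders a three-digit octal mode such as "775" as owner/group/other flags.
  def octalToSymbolic(mode: String): String = {
    require(mode.matches("[0-7]{3}"), "expected three octal digits")
    mode.toList.map(digit => triplet(digit.asDigit)).mkString
  }
}
```

Mode 775 expands to rwxrwxr-x: the owner and the common group can read, write, and traverse the download directory, while all other users can only read and traverse it.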
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator    Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains    Returns true if the first object contains all of the second object.
Within    Returns true if the first object is entirely inside the second object.
ContainsCentroid    Returns true if the first object contains the centroid of the second object.
CentroidWithin    Returns true if the first object's centroid is within the second object.
Intersects    Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.
Like    Returns true if the value matches a pattern that uses wildcard characters. There are two wildcards used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND    Returns true if both conditions in the WHERE clause are true.
OR    Returns true if either the first or second condition is true.
NOT    Reverses the meaning of the logical operator with which it is used.
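The wildcard behavior of the Like operator can be illustrated by translating a pattern into a regular expression. This is a minimal sketch under stated assumptions, not the MI SQL implementation: LikeSketch, likeToRegex, and like are illustrative names, matching here is case-sensitive (the guide does not say whether Like is), and escaping of literal % or _ characters is not handled.

```scala
import java.util.regex.Pattern

object LikeSketch {
  // Translates a Like pattern into an anchored regular expression:
  // % matches zero or more characters, _ matches exactly one character,
  // and every other character is matched literally.
  def likeToRegex(pattern: String): String =
    pattern.toList.map {
      case '%' => ".*"
      case '_' => "."
      case c   => Pattern.quote(c.toString)
    }.mkString("^", "", "$")

  // Returns true if the whole value matches the Like pattern.
  def like(value: String, pattern: String): Boolean =
    value.matches(likeToRegex(pattern))
}
```

With this sketch, like("Central Park", "%Park") is true because % absorbs the leading text, while like("Spark", "_ark") is false because _ stands for exactly one character.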
Arithmetic Operators
Operator    Definition
+    Addition; also the concatenation operator. NOTE: String concatenation also uses &.
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter    Definition
( )    Expression delimiters
' '    String constant delimiters. See Quote Rules on page 85.
" "    Quoted identifier delimiters
% _    Wildcard symbols. % represents zero, one, or more characters; _ (underscore) represents a single character.
,    List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single quote is within a string literal or value, use a double single quote (two ' characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
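When filter expressions are assembled in code, the quoting rules above can be applied with two small helpers. The sketch below is illustrative only: QuoteSketch, quoteLiteral, and quoteIdentifier are names made up for this example, and identifiers that themselves contain a double quote are not handled because the guide does not describe that case.

```scala
object QuoteSketch {
  // Wraps a value in single quotes and doubles any embedded single quote.
  def quoteLiteral(value: String): String =
    "'" + value.replace("'", "''") + "'"

  // Wraps an identifier in double quotes (assumes it contains no double quote).
  def quoteIdentifier(name: String): String =
    "\"" + name + "\""
}
```

For example, combining quoteIdentifier("Business") with quoteLiteral("O'hara's") in a WHERE clause produces "Business" = 'O''hara''s', which follows the doubling rule described above.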
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes, 3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
Spatial
Spark
Spark Jobs
To process large sets of data the Spark jobs are provided with Spectrum Location Intelligence and Spectrum Geocoding for Big Data
Hexagon Generator This Spark job generates the hexagons within a bounding box (for example the bounding box of the continental USA) Hexagon output can be used for map display
To create hexagons for a given bounding box
1 Modify the configuration according to the hexagons to be generated Change the bounding box coordinates and hexagon level to suit your needs See Hexagons to learn about hexagon levels
2 Deploy jar and configuration to the Hadoop cluster 3 Start the Spark job using the following command
spark-submit --class compbbigdataspatialhexsparkappHexGenDriver--master yarn --deploy-mode cluster --name ltAPPLICATION_NAMEgtdironserverspectrum-bigdata-li-spark2-hexgen-versionjar -output dironhdfsoutput -conf ltCONFIG_PATHgt -overwrite
The output of the HexGenerator is a list of WKT that represents the hexagons See Consuming Results for how to use the output
Sample Output
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 72
Spatial
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation such as cell tower strength or noise pollution Closely approximating circles hexagons include edge data better than if rectangles were used Hexagons also fit together well across a space
Spectrumtrade Location Intelligence for Big Data provides API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis The compbhadoopcorehex package contains classes for working with hexagons and retrieving information about them Refer to Geohash Aggregation for details about using the API
The API provides an interface that assigns a hexagon and ID to each location and that ID is used to aggregate the data associated with the hexagon
One important hexagon parameter is the hexagon level This along with the longitude and latitude of a record is used to get the hexagon or its ID for the location
The hexagon level refers to a hierarchy of hexagons that divide the earths surface Level 1 refers to the whole earth Subsequent levels divide the previous level evenly into smaller units The smaller the number the higher the level and the larger the hexagon size These hexagons form a fixed network with each hexagon having a specific unique identifier (ID)
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 73
Spatial
Spectrumtrade Location Intelligence for Big Data supports Levels 1 through 11 with level 9 as the default Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator It will generate each hexagon with a unique and consistent ID at a given level for the same longitudelatitude
Spark API
The Spark2 jar contains both the Spark API and LI SDK API Assuming that the product was installed to pblisoftware as described in Installing the SDK on page 8 the jar filename and path is
pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
For installation instructions see Location Intelligence Jar for Spatial Operations on page 77
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 74
Spatial
JoinByDistance
Description
joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers
Syntax
import compbbigdatalisparkapiSpatialImplicits_
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame
Parameters
Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75
Spatial
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe
val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))
val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)
Example showing options set
val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))
val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76
Spatial
Location Intelligence Jar for Spatial Operations
To use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application    Do this
Spark job    See Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook    See Integrating the Location Intelligence Jar with Zeppelin
Hue notebook    See Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the LI jar file.
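The spark-submit call above can be wrapped in a small script so that the LI jar path is defined in one place. This is a minimal sketch, not part of the product: the submit function and the LI_HOME, LI_VERSION, and RUNNER variables are hypothetical names, "version" is a placeholder for the release you deployed, and the driver class and driver jar are the placeholders from the example above.

```shell
#!/bin/sh
set -eu

# Assumed install root, following the note above; adjust to your environment.
LI_HOME="${LI_HOME:-/pb/li/software}"
# "version" is a placeholder for the release you actually deployed.
LI_VERSION="${LI_VERSION:-version}"
LI_JAR="$LI_HOME/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-$LI_VERSION.jar"

# RUNNER=echo prints the command instead of running it (dry run).
# Set RUNNER to an empty string to actually submit the job.
RUNNER="${RUNNER-echo}"

# submit DRIVER_CLASS DRIVER_JAR [DRIVER_ARGS...]
submit() {
  driver_class=$1
  driver_jar=$2
  shift 2
  # $RUNNER is left unquoted on purpose: when empty it disappears entirely.
  $RUNNER spark-submit \
    --class "$driver_class" \
    --master yarn \
    --deploy-mode cluster \
    --jars "$LI_JAR" \
    "$driver_jar" "$@"
}

submit com.example.spark.app.MyDriver /path/to/custom/driver.jar driverParameter1 driverParameter2
```

Run as written, the script only prints the assembled command for review. Running it with RUNNER set to an empty string submits the job, which requires spark-submit on the PATH.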
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you must use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform.
1. Cloudera: start here.
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following three lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR: start here.
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment the following line and change it from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. You can do this by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
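The EMR edit in step 2 can also be scripted. The following is a minimal sketch under stated assumptions: the enable_default_config function is a hypothetical helper, and it assumes the setting appears on its own line, optionally commented out with one or more # characters. It writes to a new file rather than editing in place, so the result can be reviewed before it replaces /etc/hue/conf/hue.ini.

```shell
#!/bin/sh
set -eu

# enable_default_config IN OUT
# Copies IN to OUT, turning a commented or uncommented
# "use_default_configuration=false" line into "use_default_configuration=true".
# Leading indentation is preserved; all other lines are copied unchanged.
enable_default_config() {
  sed -E 's/^([[:space:]]*)[#[:space:]]*use_default_configuration[[:space:]]*=[[:space:]]*false[[:space:]]*$/\1use_default_configuration=true/' "$1" > "$2"
}

# Demonstrate on a scratch file rather than the live configuration.
work=$(mktemp -d)
printf '%s\n' '[desktop]' '## use_default_configuration=false' > "$work/hue.ini"
enable_default_config "$work/hue.ini" "$work/hue.ini.new"
cat "$work/hue.ini.new"
```

On a real Master Node you would pass /etc/hue/conf/hue.ini as the input, compare the two files, and then copy the new file into place with suitable privileges before restarting Hue.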
3 - Appendix
In this section
• PGD Builder
• Download Permissions
• Operators and Syntax Delimiters
• Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which has a PGD file created for each sub-TAB.
Also, a PGD file is no longer used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter    Required    Description
-f <file>    yes    Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>    no    The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request generates a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request generates a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
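When choosing a lower -p value for a seamless TAB, one option is to derive it from the machine's CPU count. The sketch below is illustrative only: the pgd_parallelism helper, the halving policy, and the /data/seamless.tab path are assumptions, and the PGDBuilder command is printed rather than executed because how the utility is launched depends on your installation.

```shell
#!/bin/sh
set -eu

# pgd_parallelism CPUS: print half the CPU count, but never less than 1.
# Halving is an arbitrary illustrative policy, not a product recommendation.
pgd_parallelism() {
  half=$(( $1 / 2 ))
  if [ "$half" -lt 1 ]; then
    half=1
  fi
  echo "$half"
}

# getconf reports the number of online CPUs on most Unix-like systems;
# fall back to 1 if the query is not supported.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the command for review instead of running it.
echo "PGDBuilder -f /data/seamless.tab -p $(pgd_parallelism "$cpus")"
```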
Download Permissions
Setting the download permissions allows multiple services to download the data and to update the downloaded data when required. Create a common operating system group that includes every service user that needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should be members of the common group. The download directory should belong to this common group and have read, write, and execute permissions (775) for the owner and group.
Your group should contain the services and users that will run jobs in your cluster. You may skip services that you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
sudo groupadd pbdownloads
2. Add users to the group.
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window in which no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window in which no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Set the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
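After step 5 it can be useful to confirm that the directory really carries the expected mode. The following is a minimal sketch: share_dir and dir_mode are hypothetical helper names, and the demonstration runs against a scratch directory with the caller's own primary group so that it needs no sudo. On a cluster you would use the shared group and the real download directory, with sudo as in step 5.

```shell
#!/bin/sh
set -eu

# share_dir GROUP DIR: set DIR's group to GROUP and its mode to 775.
share_dir() {
  chgrp "$1" "$2"
  chmod 775 "$2"
}

# dir_mode DIR: print the ten-character type and permission string
# that ls reports for DIR.
dir_mode() {
  ls -ld "$1" | cut -c1-10
}

# Demonstrate on a scratch directory owned by the current user.
demo=$(mktemp -d)
share_dir "$(id -gn)" "$demo"
dir_mode "$demo"
```

A mode of 775 appears in the ls output as rwxrwxr-x after the leading d, meaning the owner and group can read, write, and traverse the directory while other users can only read and traverse it.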
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator    Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains    Returns true if the first object contains all of the second object.
Within    Returns true if the first object is entirely inside the second object.
ContainsCentroid    Returns true if the first object contains the centroid of the second object.
CentroidWithin    Returns true if the first object's centroid is within the second object.
Intersects    Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.
Like    Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND    Returns true if both conditions in the WHERE clause are true.
OR    Returns true if either the first or second condition is true.
NOT    Reverses the meaning of the logical operator with which it is used.
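The wildcard behavior described for Like in the table above can be illustrated outside of MI SQL by translating a Like pattern into a shell glob, where ? matches one character and * matches zero or more. This is only a sketch of the wildcard semantics: like_to_glob and like_match are hypothetical helpers, and the sketch makes no claim about MI SQL's case sensitivity or escape rules.

```shell
#!/bin/sh
set -eu

# like_to_glob PATTERN: translate Like wildcards into shell glob wildcards.
#   _ (a single character)      becomes ?
#   % (zero or more characters) becomes *
# Characters that are special in globs (\ * ? [) are escaped first so that
# they match literally.
like_to_glob() {
  printf '%s' "$1" | sed -e 's/[\\*?[]/\\&/g' -e 's/_/?/g' -e 's/%/*/g'
}

# like_match VALUE PATTERN: succeed if the whole VALUE matches the Like PATTERN.
like_match() {
  glob=$(like_to_glob "$2")
  # $glob is deliberately unquoted so that case treats it as a pattern.
  case "$1" in
    $glob) return 0 ;;
    *) return 1 ;;
  esac
}

if like_match "Stamford" "Stam%"; then echo "Stamford matches Stam%"; fi
if like_match "Troy" "T_oy"; then echo "Troy matches T_oy"; fi
if ! like_match "Troy" "T_y"; then echo "Troy does not match T_y"; fi
```

The last case fails to match because each underscore stands for exactly one character, so T_y can only match three-character values.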
Arithmetic Operators
Operator    Definition
+    Addition; also the string concatenation operator. Note: String concatenation also uses &.
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter    Definition
( )    Expression delimiters
' '    String constant delimiters. See Quote Rules.
" "    Quoted identifier delimiters
% _    Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,    List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain characters that would not normally be allowed (such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two single-quote characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
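The doubling rule above is easy to apply mechanically when a query string is assembled in a script. The following is a minimal sketch: mi_sql_literal is a hypothetical helper name, and it covers only the single-quote rule described in this section.

```shell
#!/bin/sh
set -eu

# mi_sql_literal VALUE: print VALUE as a single-quoted string literal,
# doubling every embedded single quote.
mi_sql_literal() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

name="O'hara's"
echo "SELECT * FROM Streets WHERE Business = $(mi_sql_literal "$name")"
```

Where the calling API supports bound parameters, those are generally preferable to building literals by hand.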
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
- Table of Contents
- Welcome
  - What is Spectrum™ Location Intelligence for Big Data
  - Spectrum™ Location Intelligence for Big Data Architecture
  - System Requirements and Dependencies
- Spatial
  - Installing the SDK
  - Hive User-Defined Spatial Functions
    - Setup
    - WritableGeometry
    - Geometry Functions
      - Constructor Functions
        - FromGeoJSON
        - FromWKT
        - FromWKB
        - FromKML
        - ST_Point
      - Persistence Functions
        - ToGeoJSON
        - ToWKT
        - ToWKB
        - ToKML
      - Predicate Functions
        - Disjoint
        - Intersects
        - Overlaps
        - Within
        - IsNullGeometry
      - Measurement Functions
        - Area
        - ClosestPoints
        - Distance
        - Length
        - Perimeter
      - Processing Functions
        - Buffer
        - ConvexHull
        - Intersection
        - Transform
        - Union
      - Observer Functions
        - ST_X
        - ST_XMax
        - ST_XMin
        - ST_Y
        - ST_YMax
        - ST_YMin
    - Grid Functions
      - GeoHashBoundary
      - GeoHashID
      - HexagonBoundary
      - HexagonID
      - SquareHashBoundary
      - SquareHashID
    - Search Functions
      - LocalPointInPolygon
      - LocalSearchNearest
  - Spark
    - Spark Jobs
      - Hexagon Generator
        - Hexagons
    - Spark API
      - JoinByDistance
    - Location Intelligence Jar for Spatial Operations
      - Installing and setting up the Location Intelligence Jar for a Spark Job
      - Integrating the Location Intelligence Jar with Zeppelin
      - Integrating the Location Intelligence Jar with Hue
- Appendix
  - PGD Builder
    - Building an Index with the PGD Builder
  - Download Permissions
  - Operators and Syntax Delimiters
    - Quote Rules
Hexagons
A hexagon is an effective way to represent data related to circular wave propagation, such as cell tower signal strength or noise pollution. Because hexagons closely approximate circles, they capture edge data better than rectangles do. Hexagons also fit together well across a space.
Spectrum™ Location Intelligence for Big Data provides an API for assigning locations to hexagons and aggregating the data in the hexagons for further analysis. The com.pb.hadoop.core.hex package contains classes for working with hexagons and retrieving information about them. Refer to Geohash Aggregation for details about using the API.
The API provides an interface that assigns a hexagon and an ID to each location; that ID is used to aggregate the data associated with the hexagon.
One important hexagon parameter is the hexagon level. This, along with the longitude and latitude of a record, is used to get the hexagon or its ID for the location.
The hexagon level refers to a hierarchy of hexagons that divide the earth's surface. Level 1 refers to the whole earth. Subsequent levels divide the previous level evenly into smaller units. The smaller the number, the higher the level and the larger the hexagon size. These hexagons form a fixed network, with each hexagon having a specific, unique identifier (ID).
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. For the same longitude/latitude at a given level, the API always generates the same hexagon with the same unique ID.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example, finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter    Type    Description
df2    DataFrame    The dataframe to join to.
df1Longitude    Column    The longitude value from the first dataframe.
df1Latitude    Column    The latitude value from the first dataframe.
df2Longitude    Column    The longitude value from the second dataframe.
df2Latitude    Column    The latitude value from the second dataframe.
maxDistance    Length    The buffer length around point 1 within which to search for point 2.
geohashPrecision    Integer    The geohash precision to be used for the primary join. The value must be between 1 and 12. The higher the number, the more memory may be required.
options    Map    Optional. Options that add extra attributes to the result of the join.
Options
Key    Type    Description
DistanceColumnName    String    Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type    Description
DataFrame    The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
import org.apache.spark.sql.functions.col
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))
Spectrum™ Location Intelligence for Big Data supports levels 1 through 11, with level 9 as the default. Level 9 consists of hexagons with an edge distance of approximately 56 meters at the equator. Each hexagon is generated with a unique ID that is consistent at a given level for the same longitude/latitude.
Spark API
The Spark2 jar contains both the Spark API and the LI SDK API. Assuming that the product was installed to /pb/li/software as described in Installing the SDK on page 8, the jar filename and path is:
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
For installation instructions, see Location Intelligence Jar for Spatial Operations on page 77.
Javadocs and Scaladocs are provided in the Location Intelligence SDK distribution and are also available on the Spectrum Spatial for Big Data documentation landing page.
JoinByDistance
Description
joinByDistance is an implicit method that joins two dataframes by proximity. It takes longitude and latitude values, one set from each dataframe, that represent the locations of the records to be joined. This method can be used to enrich a CSV containing point data with attributes associated with points within some maximum distance, for example finding all the POIs within half a mile of each of your customers.
Syntax
import com.pb.bigdata.li.spark.api.SpatialImplicits._
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int): DataFrame
joinByDistance(df2: DataFrame, df1Longitude: Column, df1Latitude: Column, df2Longitude: Column, df2Latitude: Column, maxDistance: Length, geohashPrecision: Int, options: Map[DistanceJoinOption.DistanceJoinOption, Any]): DataFrame
Parameters
Note: The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system.
Parameter         Type       Description
df2               DataFrame  The dataframe to join to.
df1Longitude      Column     The longitude value from the first dataframe.
df1Latitude       Column     The latitude value from the first dataframe.
df2Longitude      Column     The longitude value from the second dataframe.
df2Latitude       Column     The latitude value from the second dataframe.
maxDistance       Length     The buffer length around point 1 to search for point 2.
geohashPrecision  Integer    The geohash precision to be used for the primary join. Value must be between 1 and 12. The higher the number, the more memory may be required.
options           Map        Optional. Options that add extra attributes to the result of the join.
Options
Key                 Type    Description
DistanceColumnName  String  Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type  Description
DataFrame    The dataframe that is the result of the join.
Examples
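This example returns a dataframe that is the result of a join in which points from the second dataframe are located within a 0.5 mile buffer around each point in the first dataframe.
import org.apache.spark.sql.functions.col
import com.pb.bigdata.li.spark.api.SpatialImplicits._
// Build the search distance: 0.5 miles ("mi" is the MapInfo unit code for miles)
val searchRadius = 0.5
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
// Join each baseDF point to the joinDF points within the search distance, using geohash precision 7
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
import com.mapinfo.midev.unit.{Length, LinearUnit}
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
// The options map adds an outputDistance column that holds the calculated distance
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map(DistanceColumnName -> "outputDistance"))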
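To make the semantics of the join concrete, the following Python sketch pairs each record of one list with every record of another list that lies within a maximum distance, optionally adding the distance as an extra column. It is a brute-force illustration under stated assumptions, not the SDK's implementation: the function names, the haversine (spherical) distance, the inclusive comparison, and the sample coordinates are all assumptions, and the geohash-based primary join that geohashPrecision controls is omitted.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def join_by_distance(left, right, max_miles, distance_column=None):
    """Pair every left row with each right row that lies within max_miles of it."""
    result = []
    for l_row in left:
        for r_row in right:
            d = haversine_miles(l_row["longitude"], l_row["latitude"],
                                r_row["lon"], r_row["lat"])
            if d <= max_miles:  # assumption: the limit itself counts as a match
                row = {**l_row, **r_row}
                if distance_column is not None:
                    row[distance_column] = d
                result.append(row)
    return result


# Hypothetical sample data: one customer and two points of interest.
customers = [{"customer": "A", "longitude": -73.6918, "latitude": 42.7284}]
pois = [
    {"poi": "near", "lon": -73.6900, "lat": 42.7300},
    {"poi": "far", "lon": -73.9000, "lat": 42.9000},
]

matches = join_by_distance(customers, pois, 0.5, distance_column="outputDistance")
print([m["poi"] for m in matches])
```

A nested loop compares every pair of records, which is why a distributed join needs a coarse pre-filter such as a geohash before computing exact distances.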
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available that supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the downloading of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions for cases where multiple users interact with the data.
• For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar. Refer to the following table to proceed according to your use case.
Application        Do this
Spark job          Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook  Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook       Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-<version>.jar to the Hadoop cluster.
2. Start the Spark job using the following command.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
spark-submit --class com.example.spark.app.MyDriver --master yarn --deploy-mode cluster --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
• --jars: The path to the jar file.
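When the job is launched from a script, it can help to build the command as a list of arguments so that each flag and its value stay separate. The following Python sketch mirrors the spark-submit example; the driver class, the driver jar path, and the driver parameters are placeholders, and <version> must be replaced with the real jar version before the command is run.

```python
import shlex

# Placeholder values; replace <version> and the driver path with real values.
li_jar = "/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar"
driver_jar = "/path/to/custom/driver.jar"

command = [
    "spark-submit",
    "--class", "com.example.spark.app.MyDriver",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "--jars", li_jar,          # makes the LI jar available to the job
    driver_jar,                # the application jar comes after all options
    "driverParameter1",
    "driverParameter2",
]

# shlex.join quotes each argument for safe display or logging (Python 3.8+).
print(shlex.join(command))
```

To launch the job rather than print the command, the same list could be passed to subprocess.run(command, check=True).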
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK on page 8.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue > Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hue.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-<version>.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder 81
Download Permissions 83
Operators and Syntax Delimiters 84
Copyright 87
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend that you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter      Required  Description
-f <file>      yes       Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>  no        The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required. You should have a common operating system group to which all the service users who need to download the data belong. For example, if Hive and YARN jobs are required to download data and use the same download location, then both the Hive and YARN operating system users should be part of a common operating system group. The group of the download directory should be that common operating system group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no job is running, restart all the services whose operating system users were added to the new group.
4. During a window when no job is running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
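The meaning of mode 775 can be checked with a short script. The following Python sketch applies the mode to a temporary directory (so that no real download location is touched) and decodes the resulting permission bits. It assumes a POSIX file system; the variable names are illustrative only.

```python
import os
import stat
import tempfile

# Use a throwaway directory so the sketch does not touch a real download location.
with tempfile.TemporaryDirectory() as download_dir:
    os.chmod(download_dir, 0o775)  # same effect as: chmod 775 <directory>
    mode = os.stat(download_dir).st_mode

    permissions = stat.filemode(mode)             # symbolic form of the mode
    group_can_write = bool(mode & stat.S_IWGRP)   # group members may update data
    others_can_write = bool(mode & stat.S_IWOTH)  # everyone else is read-only

print(permissions, group_can_write, others_can_write)
```

Owner and group get read, write, and execute (7), while all other users get read and execute only (5), which is why every service user that must update the downloaded data has to belong to the common group.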
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if the value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
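The wildcard and range semantics described for Like and Between can be sketched in a few lines of Python. This is an illustration of the documented behavior only, not the product's evaluator: the function names are hypothetical, and the sketch assumes case-sensitive matching, which the product may or may not use.

```python
import re


def like_to_regex(pattern: str):
    """Translate a Like pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high) -> bool:
    """Between is inclusive: both end points count as inside the range."""
    return low <= value <= high


print(like("Canada", "Can%"), like("Canada", "C_nada"), like("Canada", "Can_"))
print(between(10, 1, 10), between(11, 1, 10))
```

The pattern Can_ fails for Canada because the underscore stands for exactly one character, while Can% succeeds because the percent sign absorbs any number of trailing characters.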
Arithmetic Operators
Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules on page 85.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List items and function argument separators
@          Parameter names
Spatial
JoinByDistance
Description
joinByDistance is an implicit method which joins two dataframes taking longitude and latitude values one set from each dataframe representing the location of the records to be joined This method can be used to enrich a CSV containing point data with attributes associated with points within some max distance for example finding all the POIs within half of mile of each of your customers
Syntax
import compbbigdatalisparkapiSpatialImplicits_
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int) DataFrame
joinByDistance(df2 DataFrame df1Longitude Column df1Latitude Column df2Longitude Column df2Latitude Column maxDistance Length geohashPrecision Int options Map[DistanceJoinOptionDistanceJoinOption Any]) DataFrame
Parameters
Note The coordinate values must be in the CoordSysConstantslongLatWGS84 coordinate system
Parameter Type Description
df2 DataFrame The dataframe to join to
df1Longitude Column The longitude value from the first dataframe
df1Latitude Column The latitude value from the first dataframe
df2Longitude Column The longitude value from the second dataframe
df2Latitude Column The latitude value from the second dataframe
maxDistance Length The buffer length around point 1 to search for point 2
geohashPrecision Integer The geohash precision to be used for the primary join Value must be between 1 and 12 The higher the number the more memory may be required
options Map Optional Options that add extra attributes to the result of the join
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 75
Spatial
Options
Key Type Description
DistanceColumnName String Adds a column to the result dataframe that contains the distance calculated
Return Values
Return Type Description
DataFrame The dataframe that is the result of the join
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 5 mile buffer around each point in the first dataframe
val searchRadius = 05 val distanceUnit = mi val distance = new commapinfomidevunitLength(searchRadiustoDoublecommapinfomidevunitLinearUnitgetFromMapInfoCode(distanceUnit))
val resultDF = baseDFjoinByDistance(joinDF col(longitude) col(latitude) col(lon)col(lat) distance 7)
Example showing options set
val distance = new Length(05 LinearUnitgetFromMapInfoCode(mi))
val resultDF = baseDF(joinDF baseDF(Longitude) baseDF(Latitude) joinDF(Lon)joinDF(Lat) distance 7 Map(DistanceColumnName -gt outputDistance))
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 76
Spatial
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark a Location Intelligence jar file (LI jar) is available which supports compiling and running location intelligence programs
The LI jar also includes the Download Manager API which provides the capability to manage downloading reference data programmatically The reference data is typically kept in S3 or on HDFS and will get downloaded by the API when you call download() In addition the Download Manager can set permissions in the case where you have multiple users interacting with the data
bull For more information about the Download Manager API see the Javadocs located in pblisoftwarespark2sdkjavadocsdownloadmanager
There are several options for deploying the LI jar refer to the following table to proceed according to your use case
Application Do this
Spark job Installing and setting up the Location Intelligence Jar for a Spark Job on page 77
Zeppelin notebook Integrating the Location Intelligence Jar with Zeppelin on page 78
Hue notebook Integrating the Location Intelligence Jar with Hue on page 79
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps
1 Deploy the spectrum-bigdata-li-sdk-spark2-versionjar to the Hadoop cluster 2 Start the Spark job using the following command
Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8
spark-submit --class comexamplesparkappMyDriver--master yarn --deploy-mode cluster--jars pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar pathtocustomdriverjar driverParameter1 driverParameter2
where
bull --jars The path to the jar file
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 77
Spatial
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library
Perform the following steps to use the LI jar in Zeppelin
1 On the navigation bar using the Settings drop-down menu and click Interpreter The Interpreters page displays
2 Go to the Spark2 section click Edit The editing view displays
3 Go to the Dependencies section Enter the full local path to the jar in the artifact field
Note This example assumes the product is installed to pblisoftware as described in Installing the SDK on page 8
pblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
4 Click Save This will restart the interpreter with the loaded library
5 To return to your notebook click the Notebook drop-down menu and select your notebook The LI jar is now available for all notebooks
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 78
Spatial
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library
Prerequisite To use the LI jar in an interactive manner you will need to use the Scala or Java interpreter which requires Livy
Proceed according to your platform
1 Cloudera - start here a) In Cloudera Manager navigate to Hue -gt Configuration b) Search for this property
Hue Service Advanced Configuration Snippet (Safety Valve) forhue_safety_valveini
c) Use the following 3 lines to set the propertys value
[desktop]app_blacklist=
use_default_configuration=true
d) Click Save Continue to step 3
2 EMR - start here a) On the Master Node open the file etchueconfhueini b) Uncomment and change the following line from
use_default_configuration=false
to
use_default_configuration=true
c) Save the file Continue to step 3
3 Restart Hive 4 Open the Hue editor 5 Select one of the query editors (Scala or Java) 6 Establish a session This can be accomplished by running any operation in the editor executing
1+1 will suffice 7 When the query completes click the gears icon button (in some versions of Hue this button
becomes visible after you click the vertical ellipsis button) 8 In the session dialog use the Add Property drop-down menu and select Jars 9 In the text box specify the location in HDFS where your Routing jar is located For example
hdfspblisoftwarespark2sdklibspectrum-bigdata-li-sdk-spark2-versionjar
10 Click Recreate This will recreate the session 11 To return to your notebook click any of the arrow icons or the gears button again This will close
any open dialogs and add the LI jar to the session that was just created
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 79
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance
Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB
Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f ltfilegt [-p ltparallelgt]
Parameters
Parameter Required Description
-f ltfilegt yes Path and file name for the TAB file for which you are generating a PGD file
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81
Appendix
Parameter Required Description
-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine
When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine
Examples
This request will generate a PGD file for the uktab file
PGDBuilder -f Cdatauktab
This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads
PGDBuilder -f Cseamlesstab -p 4
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82
Appendix
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property
sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83
Appendix
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator    Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between    Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains    Returns true if the first object contains all of the second object.
Within    Returns true if the first object is entirely inside the second object.
ContainsCentroid    Returns true if the first object contains the centroid of the second object.
CentroidWithin    Returns true if the first object's centroid is within the second object.
Intersects    Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)    Returns true if the value equals at least one of the values in the literal list or subquery.
Like    Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used in conjunction with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND    Returns true if both conditions in the WHERE clause are true.
OR    Returns true if either the first or second condition is true.
NOT    Reverses the meaning of the logical operator with which it is used.
Arithmetic Operators
Operator    Definition
+    Addition; also the string concatenation operator. Note: String concatenation also uses &.
-    Subtraction
*    Multiplication
/    Division
^    Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter    Definition
( )    Expression delimiters
' '    String constant delimiters. See Quote Rules.
" "    Quoted identifier delimiters
% _    Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character.
,    List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This would include identifiers that have spaces in their names or other special characters.
Examples
Identifiers or illegal characters (where they normally would not be allowed, such as /) are surrounded by double-quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single-quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
In certain cases where a single-quote is within a string literal or value, use a double-single-quote (two ' characters). In the following example, the string literal Ohara's is defined:
SELECT * FROM Streets WHERE Business = 'Ohara''s'
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Options
Key    Type    Description
DistanceColumnName    String    Adds a column to the result dataframe that contains the calculated distance.
Return Values
Return Type    Description
DataFrame    The dataframe that is the result of the join.
Examples
This example returns a dataframe that is the result of a join where points from a second dataframe are located within a 0.5-mile buffer around each point in the first dataframe:
val searchRadius = "0.5"
val distanceUnit = "mi"
val distance = new com.mapinfo.midev.unit.Length(searchRadius.toDouble, com.mapinfo.midev.unit.LinearUnit.getFromMapInfoCode(distanceUnit))
val resultDF = baseDF.joinByDistance(joinDF, col("longitude"), col("latitude"), col("lon"), col("lat"), distance, 7)
Example showing options set:
val distance = new Length(0.5, LinearUnit.getFromMapInfoCode("mi"))
val resultDF = baseDF.joinByDistance(joinDF, baseDF("Longitude"), baseDF("Latitude"), joinDF("Lon"), joinDF("Lat"), distance, 7, Map("DistanceColumnName" -> "outputDistance"))
Location Intelligence Jar for Spatial Operations
If you want to use spatial operations in Spark, a Location Intelligence jar file (LI jar) is available, which supports compiling and running location intelligence programs.
The LI jar also includes the Download Manager API, which lets you manage the download of reference data programmatically. The reference data is typically kept in S3 or on HDFS and is downloaded by the API when you call download(). In addition, the Download Manager can set permissions in the case where you have multiple users interacting with the data.
For more information about the Download Manager API, see the Javadocs located in /pb/li/software/spark2/sdk/javadocs/downloadmanager.
There are several options for deploying the LI jar; refer to the following table to proceed according to your use case.
Application    Do this
Spark job    Installing and setting up the Location Intelligence Jar for a Spark Job
Zeppelin notebook    Integrating the Location Intelligence Jar with Zeppelin
Hue notebook    Integrating the Location Intelligence Jar with Hue
Installing and setting up the Location Intelligence Jar for a Spark Job
Perform the following steps:
1. Deploy the spectrum-bigdata-li-sdk-spark2-version.jar to the Hadoop cluster.
2. Start the Spark job using the following command:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
spark-submit --class com.example.sparkapp.MyDriver \
  --master yarn --deploy-mode cluster \
  --jars /pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar \
  /path/to/custom/driver.jar driverParameter1 driverParameter2
where:
--jars    The path to the jar file.
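Because the job depends on the file passed to --jars, it can help to confirm that the jar is readable on the submitting host before calling spark-submit. The following is a minimal sketch of such a pre-flight check; check_jar is a hypothetical helper name, and the scratch file stands in for the LI jar so the sketch can be run anywhere.

```shell
#!/bin/sh
# Hypothetical pre-flight helper: succeed only if the path is a readable regular file.
check_jar() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Demo against a scratch file standing in for the LI jar (assumption).
tmp_jar=$(mktemp)
check_jar "$tmp_jar" && status_present=ok

# After the file is removed, the same check fails.
rm -f "$tmp_jar"
check_jar "$tmp_jar" 2>/dev/null || status_missing=ok
```

In a submit script, you might call check_jar with the same path that is passed to --jars and stop if it returns a non-zero status.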
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field:
Note: This example assumes the product is installed to /pb/li/software, as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This will restart the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar in an interactive manner, you will need to use the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here:
a) In Cloudera Manager, navigate to Hue -> Configuration.
b) Search for this property:
Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
c) Use the following 3 lines to set the property's value:
[desktop]
app_blacklist=
use_default_configuration=true
d) Click Save. Continue to step 3.
2. EMR - start here:
a) On the Master Node, open the file /etc/hue/conf/hue.ini.
b) Uncomment and change the following line from:
use_default_configuration=false
to:
use_default_configuration=true
c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This will recreate the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This will close any open dialogs and add the LI jar to the session that was just created.
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter    Required    Description
-f <file>    yes    Path and file name for the TAB file for which you are generating a PGD file.
-p <parallel>    no    The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads are run concurrently. In this case, consider setting a lower number to prevent performance issues with your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
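When several TAB files need an index, the calls can be scripted. The sketch below only prints the PGDBuilder command for each .tab file in a directory rather than running it. The scratch directory, the placeholder file names, and the build_pgd_commands helper are hypothetical; only the -f and -p options described above are used.

```shell
#!/bin/sh
# Print (not run) one PGDBuilder command per TAB file in a directory.
# build_pgd_commands is a hypothetical helper name.
build_pgd_commands() {
    data_dir=$1
    parallel=$2
    for tab in "$data_dir"/*.tab; do
        # Skip the unexpanded pattern when the directory has no .tab files.
        [ -e "$tab" ] || continue
        echo "PGDBuilder -f $tab -p $parallel"
    done
}

# Demo with empty placeholder files in a scratch directory (assumption).
demo_dir=$(mktemp -d)
touch "$demo_dir/uk.tab" "$demo_dir/usa.tab"
commands=$(build_pgd_commands "$demo_dir" 4)
echo "$commands"
rm -r "$demo_dir"
```

Printing the commands first makes it easy to review the file list and the -p value before any index is built, which matters because each PGD file is several times larger than the MAP file it supplements.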
Stamford CT 06926-0700
USA
wwwpitneybowescom
copy 2020 Pitney Bowes Software Inc
All rights reserved
Integrating the Location Intelligence Jar with Zeppelin
The Location Intelligence jar can be loaded and referenced in a Zeppelin notebook as an external library.
Perform the following steps to use the LI jar in Zeppelin:
1. On the navigation bar, open the Settings drop-down menu and click Interpreter. The Interpreters page displays.
2. Go to the Spark2 section and click Edit. The editing view displays.
3. Go to the Dependencies section. Enter the full local path to the jar in the artifact field.
Note: This example assumes the product is installed to /pb/li/software as described in Installing the SDK.
/pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
4. Click Save. This restarts the interpreter with the loaded library.
5. To return to your notebook, click the Notebook drop-down menu and select your notebook. The LI jar is now available for all notebooks.
Integrating the Location Intelligence Jar with Hue
The Location Intelligence jar can be loaded and referenced in a Hue notebook as an external library.
Prerequisite: To use the LI jar interactively, you need the Scala or Java interpreter, which requires Livy.
Proceed according to your platform:
1. Cloudera - start here.
   a) In Cloudera Manager, navigate to Hue > Configuration.
   b) Search for this property:
      Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini
   c) Use the following 3 lines to set the property's value:
      [desktop]
      app_blacklist=
      use_default_configuration=true
   d) Click Save. Continue to step 3.
2. EMR - start here.
   a) On the Master Node, open the file /etc/hue/conf/hue.ini.
   b) Uncomment the following line and change it from
      use_default_configuration=false
      to
      use_default_configuration=true
   c) Save the file. Continue to step 3.
3. Restart Hive.
4. Open the Hue editor.
5. Select one of the query editors (Scala or Java).
6. Establish a session. This can be accomplished by running any operation in the editor; executing 1+1 will suffice.
7. When the query completes, click the gears icon button (in some versions of Hue, this button becomes visible after you click the vertical ellipsis button).
8. In the session dialog, use the Add Property drop-down menu and select Jars.
9. In the text box, specify the location in HDFS where your LI jar is located. For example:
   hdfs:///pb/li/software/spark2/sdk/lib/spectrum-bigdata-li-sdk-spark2-version.jar
10. Click Recreate. This recreates the session.
11. To return to your notebook, click any of the arrow icons or the gears button again. This closes any open dialogs and adds the LI jar to the session that was just created.
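Before pasting the three-line value from the Cloudera step, it can help to confirm that it parses as a well-formed INI fragment. The following is a minimal sketch that uses Python's standard configparser module rather than Hue's own configuration loader, so it only checks the shape of the snippet; the function name parse_safety_valve is hypothetical and not part of the product.

```python
import configparser

# The three lines from the Cloudera step, exactly as they would be pasted.
SNIPPET = """\
[desktop]
app_blacklist=
use_default_configuration=true
"""


def parse_safety_valve(text: str) -> dict:
    """Parse the snippet and return the two [desktop] settings it defines."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    desktop = parser["desktop"]
    return {
        # An empty value means no Hue apps are blacklisted.
        "app_blacklist": desktop.get("app_blacklist"),
        "use_default_configuration": desktop.getboolean("use_default_configuration"),
    }


if __name__ == "__main__":
    print(parse_safety_valve(SNIPPET))
```

If the snippet is mistyped (for example, the section header is missing), configparser raises an error instead of returning the settings.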
3 - Appendix
In this section
PGD Builder
Download Permissions
Operators and Syntax Delimiters
Copyright
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files. A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons. This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-<version>.zip distribution file. Because environments and use cases vary widely, we recommend you evaluate the PGD index against a sample to verify that it improves performance.
Note: A PGD index file is a supplemental file to the TAB file set. It is 5-6 times larger than the MAP file for the TAB. One PGD file is generated per TAB file, except in the case of a seamless TAB, which will have PGD files created for each sub-TAB.
Also, a PGD file will no longer be used by the system if you change the data in the TAB (that is, if rows have been added or deleted, or a geometry has been changed in the MAP portion of the TAB). In this case, you must regenerate the PGD for the updated TAB file.
Building an Index with the PGD Builder
Description
This process generates an index (a PGD, or prepared geometry file) for a single TAB file. It creates one PGD file for each TAB file, except in the case of a seamless TAB, for which it creates a PGD file for each sub-TAB in the seamless TAB.
Usage
PGDBuilder -f <file> [-p <parallel>]
Parameters
Parameter       Required   Description
-f <file>       yes        Path and file name of the TAB file for which you are generating a PGD file.
-p <parallel>   no         The maximum number of concurrent processes to run. The default is the number of CPUs available on the machine. When generating PGD files for a seamless TAB, one thread is used for each sub-TAB, and these threads run concurrently. In this case, consider setting a lower number to prevent performance issues on your machine.
Examples
This request will generate a PGD file for the uk.tab file:
PGDBuilder -f C:\data\uk.tab
This request will generate a PGD file for a seamless TAB file, limited to 4 concurrent threads:
PGDBuilder -f C:\seamless.tab -p 4
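When PGD files are built for many TAB files from a script, the command line can be assembled programmatically. The following is a minimal sketch, assuming only the -f and -p options described above; the helper names build_pgd_command and capped_parallel are hypothetical, and the sketch builds the argument list without launching the tool.

```python
import os
from typing import List, Optional


def build_pgd_command(tab_file: str, parallel: Optional[int] = None) -> List[str]:
    """Assemble the argument list for one PGDBuilder run."""
    if not tab_file.lower().endswith(".tab"):
        raise ValueError("expected a path to a .tab file")
    command = ["PGDBuilder", "-f", tab_file]
    if parallel is not None:
        if parallel < 1:
            raise ValueError("parallel must be at least 1")
        # -p is optional; when omitted, the tool applies its own default.
        command += ["-p", str(parallel)]
    return command


def capped_parallel(requested: int) -> int:
    """Cap a requested process count at the number of CPUs on this machine."""
    return max(1, min(requested, os.cpu_count() or 1))


if __name__ == "__main__":
    print(build_pgd_command("uk.tab"))
    print(build_pgd_command("seamless.tab", parallel=capped_parallel(4)))
```

The resulting list can be passed to subprocess.run once the PGDBuilder executable is on the PATH. For a seamless TAB, choosing a value below the CPU count follows the guidance in the -p description.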
Download Permissions
Setting the download permissions allows multiple services to download data and to update the downloaded data when required. Create a common operating system group that includes every service user who needs to download the data. For example, if Hive and YARN jobs must download data and use the same download location, then both the Hive and YARN operating system users should belong to a common operating system group. The download directory should be assigned to that common group, with Read, Write, and Execute permissions for the owner and group (775).
Your group should contain the services and users that will run jobs in your cluster. You may skip services you will not use or do not have installed. Services include YARN, Hive, Zeppelin, and Hue.
You should also include all operating system users who will run jobs, such as pbuser and <myOtherUser>.
1. Add the group.
   sudo groupadd pbdownloads
2. Add users to the group.
   sudo usermod -a -G pbdownloads hive
   sudo usermod -a -G pbdownloads yarn
   sudo usermod -a -G pbdownloads zeppelin
   sudo usermod -a -G pbdownloads hue
   sudo usermod -a -G pbdownloads pbuser
   sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the session of every operating system user that was added to the new group (for example, pbuser).
5. Change the group of the download directory specified in the pb.download.location property to the common operating system group, and set its permissions to 775.
   sudo chgrp pbdownloads /pb/downloads
   sudo chmod 775 /pb/downloads
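To see what the numeric mode 775 grants, the octal value can be decoded into the familiar permission string. The following is a small sketch using Python's standard stat module; the helper names describe_mode and group_can_write are hypothetical and used only for illustration.

```python
import stat


def describe_mode(mode: int) -> str:
    """Render a numeric directory mode as the string that ls -l would show."""
    return stat.filemode(stat.S_IFDIR | mode)


def group_can_write(mode: int) -> bool:
    """Return True if members of the directory's group may write to it."""
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    print(describe_mode(0o775))    # drwxrwxr-x
    print(group_can_write(0o775))  # True
    print(group_can_write(0o755))  # False
```

With 775, the owner and the group both have read, write, and execute access, while other users can only read and traverse the directory. With 755, group members could not write, so only the owning service could update downloaded data.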
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator              Definition
Attribute operators   =, <>, !=, <, <=, >, >=
Between               Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects   Returns true if the envelopes (MBRs) of the operands intersect.
Contains              Returns true if the first object contains all of the second object.
Within                Returns true if the first object is entirely inside the second object.
ContainsCentroid      Returns true if the first object contains the centroid of the second object.
CentroidWithin        Returns true if the first object's centroid is within the second object.
Intersects            Returns true if the two objects intersect at some point, if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)             Returns true if the value equals at least one of the values in the literal list or subquery.
Like                  Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                   Returns true if both conditions in the WHERE clause are true.
OR                    Returns true if either the first or the second condition is true.
NOT                   Reverses the meaning of the logical operator with which it is used.
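The wildcard and range semantics described for Like and Between can be illustrated outside the SQL engine. The following is a minimal sketch that translates a Like pattern into a regular expression, assuming case-sensitive matching (the guide does not state how case is handled); the function names like_to_regex, like, and between are hypothetical, and this is not the product's implementation.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like pattern into an equivalent compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high) -> bool:
    """Inclusive range test, mirroring the Between operator."""
    return low <= value <= high


if __name__ == "__main__":
    print(like("Troy", "T%"))    # True
    print(like("Troy", "T_oy"))  # True
    print(like("Troy", "T_y"))   # False
    print(between(5, 1, 5))      # True
```

Because Between is inclusive, a value equal to either bound satisfies the condition.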
Arithmetic Operators
Operator   Definition
+          Addition; also the string concatenation operator. Note: String concatenation also uses &.
-          Subtraction
*          Multiplication
/          Division
^          Exponentiation
Note: Arithmetic on Time or DateTime values is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter   Definition
( )         Expression delimiters
' '         String constant delimiters. See Quote Rules.
" "         Quoted identifier delimiters
% _         Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,           List items and function argument separators
@           Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers need to be quoted only if the parsing logic is unable to parse the identifier correctly. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers that contain illegal characters (characters that normally would not be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote (two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
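When query text is generated by a program, the quoting rules above can be applied with two small helpers. The following is a sketch under stated assumptions: the function names quote_literal and quote_identifier are hypothetical, the table and column names are illustrative, and because the guide does not say how a double quote inside an identifier is escaped, the sketch rejects such identifiers rather than guess.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes."""
    if '"' in name:
        # Assumption: escaping rules for embedded double quotes are not documented.
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    table = quote_identifier("/Samples/NamedTables/USA")
    business = quote_literal("O'hara's")
    print(f"SELECT * FROM {table} WHERE Business = {business}")
```

Doubling the single quote keeps the literal intact when the parser reads it back, so a value with two apostrophes is written with four.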
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved
3 - Appendix
In this section
PGD Builder 81 Download Permissions 83 Operators and Syntax Delimiters 84 Copyright 87
Appendix
PGD Builder
The PGD Builder is a command-line utility that builds a specialized prepared geometry index for TAB files A PGD index built for a TAB file improves the performance of the LocalSearchNearest UDTF when searching data that is based on polygons This tool is included in the utilities directory of the spectrum-bigdata-locationintelligence-ltversiongtzip distribution file Because environments and use cases vary widely we recommend you evaluate the PGD index against a sample to verify that it improves performance
Note A PGD index file is a supplemental file to the TAB file set It is 5-6 times larger than the MAP file for the TAB One PGD file is generated per TAB file except in the case of a seamless TAB which will have PGD files created for each sub-TAB
Also a PGD file will no longer be used by the system if you change the data in the TAB (that is if rows have been added or deleted or a geometry has been changed in the MAP portion of the TAB) In this case you must you must then regenerate the PGD for the updated TAB file
Building an Index with the PGD Builder
Description
This process generates an index (a PGD or prepared geometry file) for a single TAB file It creates one PGD file for each TAB file except in the case of a seamless TAB for which it creates a PGD file for each sub-TAB in the seamless TAB
Usage
PGDBuilder -f ltfilegt [-p ltparallelgt]
Parameters
Parameter Required Description
-f ltfilegt yes Path and file name for the TAB file for which you are generating a PGD file
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81
Appendix
Parameter Required Description
-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine
When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine
Examples
This request will generate a PGD file for the uktab file
PGDBuilder -f Cdatauktab
This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads
PGDBuilder -f Cseamlesstab -p 4
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82
Appendix
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1. Add the group:
sudo groupadd pbdownloads
2. Add users to the group:
sudo usermod -a -G pbdownloads hive
sudo usermod -a -G pbdownloads yarn
sudo usermod -a -G pbdownloads zeppelin
sudo usermod -a -G pbdownloads hue
sudo usermod -a -G pbdownloads pbuser
sudo usermod -a -G pbdownloads <myOtherUser>
3. During a window when no jobs are running, restart all the services whose operating system users were added to the new group.
4. During a window when no jobs are running, restart the sessions of all the operating system users that were added to the new group (for example, pbuser).
5. For the download directory specified in the pb.download.location property, change the group to the common operating system group and set the permissions to 775:
sudo chgrp pbdownloads /pb/downloads
sudo chmod 775 /pb/downloads
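After completing the steps above, you may want to confirm that the download directory ended up with the expected group and permissions. The following Python sketch is one way to do that on a Linux host. The function names are hypothetical and it is not part of the product. It compares only the owner, group, and other permission bits, so a directory that also has the setgid bit set still passes.

```python
import grp
import os
import stat
from typing import List


def group_name_of(path: str) -> str:
    """Return the group name of a path, or the numeric gid if it has no name."""
    gid = os.stat(path).st_gid
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_download_dir(path: str, group_name: str) -> List[str]:
    """Return a list of problems with the download directory (empty if none)."""
    problems = []
    # Keep only the rwx bits for owner, group, and others.
    mode = stat.S_IMODE(os.stat(path).st_mode) & 0o777
    if mode != 0o775:
        problems.append(f"mode is {mode:o}, expected 775")
    actual_group = group_name_of(path)
    if actual_group != group_name:
        problems.append(f"group is {actual_group}, expected {group_name}")
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_dir.py <download directory> <group>
    for problem in check_download_dir(sys.argv[1], sys.argv[2]):
        print(problem)
```

An empty result means the directory matches the group and the 775 permissions described in step 5. It does not verify that each service user is a member of the group.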
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator               Definition
Attribute operators    =, <>, !=, <, <=, >, >=
Between                Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects    Returns true if the envelopes (MBRs) of the operands intersect.
Contains               Returns true if the first object contains all of the second object.
Within                 Returns true if the first object is entirely inside the second object.
ContainsCentroid       Returns true if the first object contains the centroid of the second object.
CentroidWithin         Returns true if the first object's centroid is within the second object.
Intersects             Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)              Returns true if the value equals at least one of the values in the literal list or subquery.
Like                   Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                    Returns true if both conditions in the WHERE clause are true.
OR                     Returns true if either the first or the second condition is true.
NOT                    Reverses the meaning of the logical operator with which it is used.
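The wildcard and range semantics described for Like and Between can be illustrated with a short Python sketch. This is not the MI SQL implementation: the function names are hypothetical, and the sketch assumes case-sensitive matching with no escape character, neither of which is stated in this guide.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Like pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # percent: zero, one, or multiple characters
        elif ch == "_":
            parts.append(".")    # underscore: exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole value matches the Like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


def between(value, low, high) -> bool:
    """Inclusive range test: both end points count as inside the range."""
    return low <= value <= high


if __name__ == "__main__":
    print(like("Canada", "Can%"), like("Canada", "C_nada"), between(5, 1, 5))
```

Because the match must cover the whole value, a pattern such as C_da does not match Canada: each underscore stands for exactly one character.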
Arithmetic Operators
Operator    Definition
+           Addition; also the concatenation operator. Note: String concatenation also uses &.
-           Subtraction
*           Multiplication
/           Division
^           Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter    Definition
( )          Expression delimiters
' '          String constant delimiters. See Quote Rules.
" "          Quoted identifier delimiters
% _          Wildcard symbols: % represents zero, one, or more characters; the _ (underscore) represents a single character
,            List items and function argument separators
@            Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers, or illegal characters (where they normally would not be allowed, such as "/"), are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a double single quote (two characters). In the following example, the string literal O'haras is defined:
SELECT * FROM Streets WHERE Business = 'O''haras'
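When queries are assembled in code, the quoting rules above can be applied with two small helpers. The following Python sketch is illustrative only: the function names, the table name My Table, and the sample value are hypothetical. This guide does not say how a double quote inside an identifier should be handled, so the sketch rejects that case rather than guess.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes.

    Assumption: identifiers that themselves contain a double quote are
    rejected, because the escaping rule for that case is not documented here.
    """
    if '"' in name:
        raise ValueError("identifier contains a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    # Hypothetical table and value, for illustration only.
    query = "SELECT * FROM {} WHERE Business = {}".format(
        quote_identifier("My Table"), quote_literal("Joe's Diner")
    )
    print(query)
```

Where the query interface supports bound parameters, passing values as parameters is generally preferable to building literals by string concatenation.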
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford, CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved.
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 81
Appendix
Parameter Required Description
-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine
When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine
Examples
This request will generate a PGD file for the uktab file
PGDBuilder -f Cdatauktab
This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads
PGDBuilder -f Cseamlesstab -p 4
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82
Appendix
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property
sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83
Appendix
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators = lt gt = lt lt= gt gt=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value can be compared to similar values using wildcard characters There are two wildcards used in conjunction with the Like operator Underscore _ and Percent The underscore represents a single number or character The percent sign represents zero one or multiple characters The symbols can be used in combination
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 84
Appendix
Arithmetic Operators
Operator Definition
+ Addition also concatenation operator NOTE String concatenation also uses amp
- Subtraction
Multiplication
Division
^ Exponentiation
Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
String constant delimiters See Quote Rules on page 85
Quoted identifier delimiters
_ Wildcard symbols represents zero one or more characters the_ (underscore) represents a single character
List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules String literals (values) must be enclosed in single quotation marks (example) while identifiers (column names table names aliases and so on) should be enclosed in double quotation marks (example identifier) if necessary Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier This would include identifiers that have spaces in their names or other special characters
Examples
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 85
Appendix
Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes
SELECT FROM SamplesNamedTablesUSA
String literals or values are surrounded by single-quotes
SELECT FROM SamplesNamedTablesUSA WHERE Country = Canada
In certain cases where a single-quote is within a string literal or value use a double-single-quote (two characters) In the following example the string literal Oharas is defined
SELECT FROM Streets WHERE Business = Oharas
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86
Appendix
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87
pitney bowes 0 3001 Summer Street
Stamford CT 06926-0700
USA
wwwpitneybowescom
copy 2020 Pitney Bowes Software Inc
All rights reserved
- Table of Contents
- Welcome
-
- What is Spectrumtrade Location Intelligence for Big Data
- Spectrumtrade Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
-
- Spatial
-
- Installing the SDK
- Hive User-Defined Spatial Functions
-
- Setup
- WritableGeometry
- Geometry Functions
-
- Constructor Functions
-
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
-
- Persistence Functions
-
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
-
- Predicate Functions
-
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
-
- Measurement Functions
-
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
-
- Processing Functions
-
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
-
- Observer Functions
-
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
-
- Grid Functions
-
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
-
- Search Functions
-
- LocalPointInPolygon
- LocalSearchNearest
-
- Spark
-
- Spark Jobs
-
- Hexagon Generator
-
- Hexagons
-
- Spark API
-
- JoinByDistance
-
- Location Intelligence Jar for Spatial Operations
-
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
-
- Appendix
-
- PGD Builder
-
- Building an Index with the PGD Builder
-
- Download Permissions
- Operators and Syntax Delimiters
-
- Quote Rules
-
Appendix
Parameter Required Description
-p ltparallelgt no The maximum number of concurrent processes to run The default is the number of CPUs available on the machine
When generating PGD files for a seamless TAB one thread is used for each sub-TAB and these threads are run concurrently In this case consider setting a lower number to prevent performance issues with your machine
Examples
This request will generate a PGD file for the uktab file
PGDBuilder -f Cdatauktab
This request will generate a PGD file for a seamless TAB file limited to 4 concurrent threads
PGDBuilder -f Cseamlesstab -p 4
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 82
Appendix
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property
sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83
Appendix
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators = lt gt = lt lt= gt gt=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value can be compared to similar values using wildcard characters There are two wildcards used in conjunction with the Like operator Underscore _ and Percent The underscore represents a single number or character The percent sign represents zero one or multiple characters The symbols can be used in combination
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 84
Appendix
Arithmetic Operators
Operator Definition
+ Addition also concatenation operator NOTE String concatenation also uses amp
- Subtraction
Multiplication
Division
^ Exponentiation
Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
String constant delimiters See Quote Rules on page 85
Quoted identifier delimiters
_ Wildcard symbols represents zero one or more characters the_ (underscore) represents a single character
List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules String literals (values) must be enclosed in single quotation marks (example) while identifiers (column names table names aliases and so on) should be enclosed in double quotation marks (example identifier) if necessary Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier This would include identifiers that have spaces in their names or other special characters
Examples
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 85
Appendix
Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes
SELECT FROM SamplesNamedTablesUSA
String literals or values are surrounded by single-quotes
SELECT FROM SamplesNamedTablesUSA WHERE Country = Canada
In certain cases where a single-quote is within a string literal or value use a double-single-quote (two characters) In the following example the string literal Oharas is defined
SELECT FROM Streets WHERE Business = Oharas
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86
Appendix
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87
pitney bowes 0 3001 Summer Street
Stamford CT 06926-0700
USA
wwwpitneybowescom
copy 2020 Pitney Bowes Software Inc
All rights reserved
- Table of Contents
- Welcome
-
- What is Spectrumtrade Location Intelligence for Big Data
- Spectrumtrade Location Intelligence for Big Data Architecture
- System Requirements and Dependencies
-
- Spatial
-
- Installing the SDK
- Hive User-Defined Spatial Functions
-
- Setup
- WritableGeometry
- Geometry Functions
-
- Constructor Functions
-
- FromGeoJSON
- FromWKT
- FromWKB
- FromKML
- ST_Point
-
- Persistence Functions
-
- ToGeoJSON
- ToWKT
- ToWKB
- ToKML
-
- Predicate Functions
-
- Disjoint
- Intersects
- Overlaps
- Within
- IsNullGeometry
-
- Measurement Functions
-
- Area
- ClosestPoints
- Distance
- Length
- Perimeter
-
- Processing Functions
-
- Buffer
- ConvexHull
- Intersection
- Transform
- Union
-
- Observer Functions
-
- ST_X
- ST_XMax
- ST_XMin
- ST_Y
- ST_YMax
- ST_YMin
-
- Grid Functions
-
- GeoHashBoundary
- GeoHashID
- HexagonBoundary
- HexagonID
- SquareHashBoundary
- SquareHashID
-
- Search Functions
-
- LocalPointInPolygon
- LocalSearchNearest
-
- Spark
-
- Spark Jobs
-
- Hexagon Generator
-
- Hexagons
-
- Spark API
-
- JoinByDistance
-
- Location Intelligence Jar for Spatial Operations
-
- Installing and setting up the Location Intelligence Jar for a Spark Job
- Integrating the Location Intelligence Jar with Zeppelin
- Integrating the Location Intelligence Jar with Hue
-
- Appendix
-
- PGD Builder
-
- Building an Index with the PGD Builder
-
- Download Permissions
- Operators and Syntax Delimiters
-
- Quote Rules
-
Appendix
Download Permissions
Setting the download permissions allows multiple services to download and update the downloaded data when required You should have a common operating system group of which all the service users who need to download the data are part of For example if Hive and YARN jobs are required to download data and use the same download location then both the Hive and YARN operating system users should be part of a common operating system group The group of the download directory should be the common operating system group one that has Read Write and Execute (775) permissions for the owner and group
Your group should contain services and users that will run jobs in your cluster You may skip services you will not use or do not have installed Services include YARN Hive Zeppelin and Hue
You also should include all operating system users who will run jobs such as pbuser and ltmyOtherUsergt
1 Add the group
sudo groupadd pbdownloads
2 Add users to the group
sudo usermod -a -G pbdownloads hive sudo usermod -a -G pbdownloads yarn sudo usermod -a -G pbdownloads zeppelin sudo usermod -a -G pbdownloads hue sudo usermod -a -G pbdownloads pbuser sudo usermod -a -G pbdownloads ltmyOtherUsergt
3 Using a window where no job is running restart all the services whose operating system users were added to the new group
4 Using a window where no job is running restart the session of all the operating system users that were added to new group (for example pbuser)
5 Update the group to the common operating system group and update permissions to 775 for the download directory specified in pbdownloadlocation property
sudo chgrp pbdownloads pbdownloadssudo chmod 775 pbdownloads
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 83
Appendix
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below
Boolean Operators
Operator Definition
Attribute operators = lt gt = lt lt= gt gt=
Between Returns true if numeric or date values fall within a range Between is an inclusive operator
EnvelopesIntersects Returns true if the envelopes (MBRs) of the operands intersect
Contains Returns true if the first object contains all of the second object
Within Returns true if the first object is entirely inside the second object
ContainsCentroid Returns true if the first object contains the centroid of the second object
CentroidWithin Returns true if the first objects centroid is within the second object
Intersects Returns true if the two objects intersect at some point or if part of the first object is within the second object or if the first object contains part of the second object
In (List) Returns true if equals at least one of the values in the literal list or sub query
Like Returns true if the value can be compared to similar values using wildcard characters There are two wildcards used in conjunction with the Like operator Underscore _ and Percent The underscore represents a single number or character The percent sign represents zero one or multiple characters The symbols can be used in combination
AND Returns true if both conditions in the WHERE clause are true
OR Returns true if either the first or second condition is true
NOT Reverses the meaning of the logical operator with which it is used
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 84
Appendix
Arithmetic Operators
Operator Definition
+ Addition also concatenation operator NOTE String concatenation also uses amp
- Subtraction
Multiplication
Division
^ Exponentiation
Note Operator math on Time or DateTime is not supported You can add a number to a Date but not to a Time or DateTime
Syntax Delimiters
Delimiter Definition
( ) Expression delimiters
String constant delimiters See Quote Rules on page 85
Quoted identifier delimiters
_ Wildcard symbols represents zero one or more characters the_ (underscore) represents a single character
List items and function argument separators
Parameter names
Quote Rules
The MI SQL language uses standard quoting rules String literals (values) must be enclosed in single quotation marks (example) while identifiers (column names table names aliases and so on) should be enclosed in double quotation marks (example identifier) if necessary Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier This would include identifiers that have spaces in their names or other special characters
Examples
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 85
Appendix
Identifiers or illegal characters (where they normally would not be allowed such as ) are surrounded by double-quotes
SELECT FROM SamplesNamedTablesUSA
String literals or values are surrounded by single-quotes
SELECT FROM SamplesNamedTablesUSA WHERE Country = Canada
In certain cases where a single-quote is within a string literal or value use a double-single-quote (two characters) In the following example the string literal Oharas is defined
SELECT FROM Streets WHERE Business = Oharas
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 86
Appendix
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives No part of this document may be reproduced or transmitted in any form or by any means electronic or mechanical including photocopying without the written permission of Pitney Bowes Software Inc 350 Jordan Road Troy New York 12180 copy 2020 Pitney Bowes Software Inc All rights reserved Location Intelligence APIs are trademarks of Pitney Bowes Software Inc All other marks and trademarks are property of their respective holders
Spectrumtrade Location Intelligence for Big Data 40 Spectrumtrade Location Intelligence for Big Data User Guide 87
pitney bowes 0 3001 Summer Street
Stamford CT 06926-0700
USA
wwwpitneybowescom
copy 2020 Pitney Bowes Software Inc
All rights reserved
Appendix
Operators and Syntax Delimiters
The supported operators and syntax delimiters in the MI SQL language are outlined below.
Boolean Operators
Operator             Definition
Attribute operators  =, <>, !=, <, <=, >, >=
Between              Returns true if numeric or date values fall within a range. Between is an inclusive operator.
EnvelopesIntersects  Returns true if the envelopes (MBRs) of the operands intersect.
Contains             Returns true if the first object contains all of the second object.
Within               Returns true if the first object is entirely inside the second object.
ContainsCentroid     Returns true if the first object contains the centroid of the second object.
CentroidWithin       Returns true if the first object's centroid is within the second object.
Intersects           Returns true if the two objects intersect at some point, or if part of the first object is within the second object, or if the first object contains part of the second object.
In (List)            Returns true if a value equals at least one of the values in the literal list or subquery.
Like                 Returns true if the value matches a pattern that uses wildcard characters. Two wildcards are used with the Like operator: underscore (_) and percent (%). The underscore represents a single number or character. The percent sign represents zero, one, or multiple characters. The symbols can be used in combination.
AND                  Returns true if both conditions in the WHERE clause are true.
OR                   Returns true if either the first or second condition is true.
NOT                  Reverses the meaning of the logical operator with which it is used.
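The semantics of Like, Between, and EnvelopesIntersects can be sketched in a few lines of Python. This is only an illustration of the definitions in the table, not the product's implementation. The function names and the (min_x, min_y, max_x, max_y) envelope tuple are hypothetical, matching is assumed to be case-sensitive, and envelopes that merely touch are assumed to count as intersecting; the guide does not state either behavior.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Return True if value matches a Like pattern.

    '%' stands for zero, one, or multiple characters and '_' for exactly
    one character; every other character must match literally.
    Case-sensitive matching is an assumption of this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # fullmatch requires the pattern to cover the whole value
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


def between(value, low, high) -> bool:
    """Inclusive range test: both end points count as inside the range."""
    return low <= value <= high


def envelopes_intersect(a, b) -> bool:
    """Test two minimum bounding rectangles (MBRs) for intersection.

    Each envelope is a hypothetical (min_x, min_y, max_x, max_y) tuple.
    Touching edges are treated as intersecting (an assumption).
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


if __name__ == "__main__":
    print(like("Troy", "T_oy"))      # one character between T and oy
    print(like("Stamford", "Stam%"))  # any suffix after Stam
    print(between(5, 1, 5))           # upper bound is included
    print(envelopes_intersect((0, 0, 2, 2), (1, 1, 3, 3)))
```

An envelope test of this kind is cheap compared with an exact geometry test, which is why it is commonly used as a first filter before predicates such as Intersects or Within.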
Arithmetic Operators
Operator  Definition
+         Addition; also the concatenation operator. Note: String concatenation also uses &.
-         Subtraction
*         Multiplication
/         Division
^         Exponentiation
Note: Operator math on Time or DateTime is not supported. You can add a number to a Date, but not to a Time or DateTime.
Syntax Delimiters
Delimiter  Definition
( )        Expression delimiters
' '        String constant delimiters. See Quote Rules.
" "        Quoted identifier delimiters
% _        Wildcard symbols: % represents zero, one, or more characters; _ (underscore) represents a single character
,          List items and function argument separators
@          Parameter names
Quote Rules
The MI SQL language uses standard quoting rules. String literals (values) must be enclosed in single quotation marks ('example'), while identifiers (column names, table names, aliases, and so on) should be enclosed in double quotation marks ("example identifier") if necessary. Identifiers only need to be quoted if the parsing logic is unable to correctly parse the identifier. This includes identifiers that have spaces or other special characters in their names.
Examples
Identifiers containing illegal characters (characters that would not normally be allowed, such as /) are surrounded by double quotes:
SELECT * FROM "/Samples/NamedTables/USA"
String literals or values are surrounded by single quotes:
SELECT * FROM "/Samples/NamedTables/USA" WHERE Country = 'Canada'
Where a single quote occurs within a string literal or value, use a doubled single quote ('', two characters). In the following example, the string literal O'hara's is defined:
SELECT * FROM Streets WHERE Business = 'O''hara''s'
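When query text is assembled in client code, the quoting rules can be applied with two small helpers. The following Python sketch is one way to do that under stated assumptions: the helper names quote_literal and quote_identifier are hypothetical and not part of the product, and because the guide does not say how a double quote inside an identifier should be handled, the sketch simply rejects such names.

```python
def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap an identifier (table, column, alias) in double quotes.

    Embedded double quotes are rejected because the guide does not
    describe an escape rule for them (an assumption of this sketch).
    """
    if '"' in name:
        raise ValueError("identifier must not contain a double quote")
    return '"' + name + '"'


if __name__ == "__main__":
    query = "SELECT * FROM {} WHERE Business = {}".format(
        quote_identifier("Streets"), quote_literal("O'hara's")
    )
    print(query)  # SELECT * FROM "Streets" WHERE Business = 'O''hara''s'
```

Quoting every identifier is harmless even where it is not strictly required, so a helper like this does not need to decide whether a given name would parse unquoted.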
Copyright
Information in this document is subject to change without notice and does not represent a commitment on the part of the vendor or its representatives. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, without the written permission of Pitney Bowes Software Inc., 350 Jordan Road, Troy, New York 12180. © 2020 Pitney Bowes Software Inc. All rights reserved. Location Intelligence APIs are trademarks of Pitney Bowes Software Inc. All other marks and trademarks are property of their respective holders.
Pitney Bowes
3001 Summer Street
Stamford CT 06926-0700
USA
www.pitneybowes.com
© 2020 Pitney Bowes Software Inc.
All rights reserved